MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

AttributeError: 'NoneType' object has no attribute 'strip' #1921

Closed ytpf23 closed 6 months ago

ytpf23 commented 6 months ago

I am getting the following error when using the OpenAI representation model with BERTopic. In the logs, the same cluster appears twice: here, cluster number 143 passes the first time and then raises the error.

def setup_openai_client():
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(
        client,
        model=Params.openai_deployment_gpt3,
        chat=True,
        prompt=prompt,
        exponential_backoff=True,
    )

2024-04-10 10:55:23 - httpx - INFO - HTTP Request: POST https://openaigpt4access.openai.azure.com//openai/deployments/gpt3_5azure/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
 73%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 143/195 [01:02<00:22, 2.34it/s]
2024-04-10 10:55:23 - httpx - INFO - HTTP Request: POST https://openaigpt4access.openai.azure.com//openai/deployments/gpt3_5azure/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
 73%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 143/195 [01:02<00:22, 2.29it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3637, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3938, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 222, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'

MaartenGr commented 6 months ago

Can you try installing BERTopic from its main branch? I believe a fix for this can be found there.

ytpf23 commented 6 months ago

The error is still there. I have cloned the master branch.

MaartenGr commented 6 months ago

Could you share the full code and error message after cloning and installing the branch?

ytpf23 commented 6 months ago

from openai import AzureOpenAI
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, OpenAI

# Params and bert_topic_label_prompt are defined elsewhere in the project.


def initialize_representation_models():
    keybert_model = KeyBERTInspired()
    openai_model = setup_openai_client()
    return {
        "KeyBERT": keybert_model,
        "OpenAI": openai_model,
    }


def setup_openai_client():
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(
        client,
        model=Params.openai_deployment_gpt3,
        chat=True,
        prompt=prompt,
        delay_in_seconds=0.3,
        diversity=0.2,
        # exponential_backoff=True,
    )


def fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_model):
    topic_model = BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
        representation_model=representation_model,
        top_n_words=20,
        verbose=True,
    )

    topics, probs = topic_model.fit_transform(sentences, embeddings)
    topics_df = topic_model.get_topic_info()
    print(f"Number of unique topics found: {len(set(topics))}")
    return topics_df


2024-04-10 12:19:42,781 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-04-10 12:20:36,721 - BERTopic - Dimensionality - Completed ✓
2024-04-10 12:20:36,721 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-10 12:20:39,694 - BERTopic - Cluster - Completed ✓
2024-04-10 12:20:39,700 - BERTopic - Representation - Extracting topics from clusters using representation models.
 77%|██████████████████████████████████████████████████████████████████████████████████████████████████▎                            | 151/195 [01:16<00:22,  1.97it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
    auto_topics = final_auto_topics(project_id=project_id, n_reviews=n_reviews)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3782, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 4083, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 223, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'

ytpf23 commented 6 months ago

Do you have any suggestions?

MaartenGr commented 6 months ago

I'm actually not sure what is happening here. I believe OpenAI should give back at least some value, especially when you check for it. It might be that OpenAI has some additional filters and does not accept certain input/output if it doesn't adhere to their guidelines.

One other thing that I can think of is that their API changed a while ago. Are you using the latest version of their package?
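
One quick way to answer the version question is to print what is installed in the active environment. This is only a minimal sketch using the standard library; `installed_version` is a hypothetical helper name, not something provided by BERTopic or the OpenAI package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions of the two packages involved in this issue.
for name in ("openai", "bertopic"):
    print(name, installed_version(name))
```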

ytpf23 commented 6 months ago

Yes. I have implemented a custom solution, and the problem is a policy violation.

ERROR MESSAGE: Value Error 'Azure has not provided the response due to a content filter being triggered'

However, in BERTopic I don't get the error message; it returns None and execution stops.

I will use my custom implementation to catch these errors, but I think many people may run into this issue in the future.

Thank you for your reply!
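
For readers who hit the same traceback, the kind of guard described above might look roughly like the sketch below. It is not the poster's actual implementation and not BERTopic's code: `extract_label` and `fake_response` are hypothetical names, and the fake objects only imitate the `choices[0].message.content` shape seen in the traceback. It assumes that a filtered completion arrives with `content` set to `None`, typically with a `finish_reason` of `"content_filter"`.

```python
from types import SimpleNamespace


def extract_label(response, fallback="No label returned"):
    """Return a cleaned topic label, or `fallback` when the reply has no text."""
    if not getattr(response, "choices", None):
        return fallback
    choice = response.choices[0]
    content = getattr(choice.message, "content", None)
    if content is None:
        # A finish_reason of "content_filter" suggests the completion was withheld.
        reason = getattr(choice, "finish_reason", None)
        print(f"Empty completion (finish_reason={reason!r}); using fallback label.")
        return fallback
    # Same clean-up as the line shown in the traceback.
    return content.strip().replace("topic: ", "")


def fake_response(content, finish_reason="stop"):
    """Hypothetical stand-in that mimics the shape of a chat completion response."""
    message = SimpleNamespace(content=content)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# A filtered reply falls back instead of raising AttributeError.
print(extract_label(fake_response(None, finish_reason="content_filter")))
# A normal reply is cleaned as before.
print(extract_label(fake_response("topic: battery life\n")))
```

Returning a fallback label keeps the remaining topics flowing through `fit_transform`; raising a descriptive exception instead would be the better choice if silently skipped topics are not acceptable.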