ytpf23 closed this issue 6 months ago.
Can you try installing BERTopic from its main branch? I believe a fix for this can be found there.
The error is still there. I have cloned the master branch.
Could you share the full code and error message after cloning and installing the branch?
```python
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, OpenAI
from openai import AzureOpenAI

# Params and bert_topic_label_prompt are defined elsewhere in the project.


def initialize_representation_models():
    keybert_model = KeyBERTInspired()
    openai_model = setup_openai_client()
    # Each key becomes a separate topic aspect in BERTopic.
    return {
        "KeyBERT": keybert_model,
        "OpenAI": openai_model,
    }


def setup_openai_client():
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(client, model=Params.openai_deployment_gpt3, chat=True, prompt=prompt,
                  delay_in_seconds=0.3, diversity=0.2)  # exponential_backoff=True,


def fit_bertopic_model(sentences, embeddings, embedding_model, umap_model,
                       hdbscan_model, vectorizer_model, representation_model):
    topic_model = BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
        representation_model=representation_model,
        top_n_words=20,
        verbose=True,
    )
    topics, probs = topic_model.fit_transform(sentences, embeddings)
    topics_df = topic_model.get_topic_info()
    print(f"Number of unique topics found: {len(set(topics))}")
    return topics_df
```
```
2024-04-10 12:19:42,781 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-04-10 12:20:36,721 - BERTopic - Dimensionality - Completed ✓
2024-04-10 12:20:36,721 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-10 12:20:39,694 - BERTopic - Cluster - Completed ✓
2024-04-10 12:20:39,700 - BERTopic - Representation - Extracting topics from clusters using representation models.
 77%|██████████████████████████████████████████████████████████████████████████████████████████████████▎ | 151/195 [01:16<00:22, 1.97it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
    auto_topics = final_auto_topics(project_id=project_id, n_reviews=n_reviews)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3782, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 4083, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 223, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'
```
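The last frame of the traceback calls `.strip()` directly on `message.content`. If the API hands back a choice whose `content` is `None`, that expression fails exactly as shown. A minimal reproduction, assuming a `types.SimpleNamespace` mock as a stand-in for the real response object returned by the `openai` client:

```python
from types import SimpleNamespace

# Hypothetical stand-in for a chat completion whose message content is None.
# The real object comes from the openai client; this mock only copies its shape.
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content=None))]
)

error_message = None
try:
    # Same expression as in the last traceback frame (bertopic/representation/_openai.py)
    label = response.choices[0].message.content.strip().replace("topic: ", "")
except AttributeError as err:
    error_message = str(err)

print(error_message)  # 'NoneType' object has no attribute 'strip'
```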
Do you have any suggestions?
I'm actually not sure what is happening here. I would expect OpenAI to return at least some value, especially since the response is checked for one. It might be that OpenAI applies additional filters and rejects certain input or output that doesn't adhere to their guidelines.
One other thing that I can think of is that their API changed a while ago. Are you using the latest version of their package?
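One way to answer the version question is to read the installed version of the `openai` package from its metadata. The sketch below only checks whether it is at least 1.x, the release line that introduced client classes such as `AzureOpenAI` (used in the code above); it does not check for the very latest release, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_openai_version() -> Optional[str]:
    """Return the installed version of the `openai` package, or None if it is missing."""
    try:
        return version("openai")
    except PackageNotFoundError:
        return None


def uses_v1_client_api(version_string: Optional[str]) -> bool:
    """True if the version string is 1.x or later (client-object API such as AzureOpenAI)."""
    if version_string is None:
        return False
    major = version_string.split(".")[0]
    return major.isdigit() and int(major) >= 1


if __name__ == "__main__":
    v = installed_openai_version()
    print(f"openai version: {v}, 1.x+ client API: {uses_v1_client_api(v)}")
```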
Yes. I implemented a custom solution, and it showed that the problem is a content-policy violation.
Error message: `ValueError: Azure has not provided the response due to a content filter being triggered`
In BERTopic, however, I don't get this error message: the response content is simply `None`, and execution stops with the `AttributeError` above.
I will use my custom implementation to catch these errors, but I think many people may run into this issue in the future.
Thank you for your reply!
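The custom handling described above can be sketched as a guarded version of the label extraction: check for `None` before calling `.strip()` and fall back to a placeholder label instead of crashing. This assumes Azure reports withheld completions via `finish_reason == "content_filter"` (used here only for the warning text); the function name and fallback label are hypothetical, and wiring this into BERTopic (for example by patching or subclassing its OpenAI representation) is not shown and depends on the installed version.

```python
import warnings
from types import SimpleNamespace
from typing import Any

FALLBACK_LABEL = "Unlabeled (content filtered)"  # hypothetical placeholder label


def safe_topic_label(response: Any, fallback: str = FALLBACK_LABEL) -> str:
    """Extract a topic label from a chat completion, tolerating empty content."""
    choice = response.choices[0]
    content = choice.message.content
    if content is None:
        # Azure can return HTTP 200 with no content when a content filter is triggered.
        reason = getattr(choice, "finish_reason", None)
        warnings.warn(f"Empty completion (finish_reason={reason!r}); using fallback label.")
        return fallback
    # Same cleanup as BERTopic's expression, applied only to non-empty content.
    return content.strip().replace("topic: ", "")


# Mock responses (hypothetical) mimicking the shape of openai chat completions.
ok = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="stop", message=SimpleNamespace(content="topic: Delivery delays "))])
filtered = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="content_filter", message=SimpleNamespace(content=None))])

print(safe_topic_label(ok))        # Delivery delays
print(safe_topic_label(filtered))  # Unlabeled (content filtered)
```

Returning a placeholder keeps the remaining clusters labeled; raising a descriptive exception instead would surface the policy problem earlier, at the cost of stopping the fit.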
I'm getting the following error when using the OpenAI representation model with BERTopic. It happens when the same cluster appears twice in the logs: here, cluster 143 passes the first time and then raises the error the second time.
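```python
from bertopic.representation import OpenAI
from openai import AzureOpenAI

# Params and bert_topic_label_prompt are defined elsewhere in the project.


def setup_openai_client():
    client = AzureOpenAI(
        api_key=Params.openai_key,
        api_version=Params.openai_version,
        azure_endpoint=Params.openai_endpoint,
    )
    prompt = bert_topic_label_prompt
    return OpenAI(client, model=Params.openai_deployment_gpt3, chat=True,
                  prompt=prompt, exponential_backoff=True)
```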
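`exponential_backoff=True` tells BERTopic's OpenAI representation to retry requests with increasing delays; according to its docstring this is aimed at rate-limit errors. The generic sketch below (not BERTopic's own code; `TransientError` and `call_with_backoff` are hypothetical) shows the idea, and also why it cannot help here: the filtered request in the log still returns `HTTP/1.1 200 OK`, so nothing is raised that a retry loop would catch, and resending the same input would likely trigger the filter again.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for errors worth retrying (e.g. rate limits)."""


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on `retry_on` with exponentially growing, jittered delays."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: propagate the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...) with random jitter.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("unreachable")


attempts = []


def flaky() -> str:
    """Fails twice with a retryable error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("rate limited")
    return "topic: Shipping"


result = call_with_backoff(flaky, sleep=lambda _: None)  # no real sleeping in the demo
print(result)  # topic: Shipping
```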
```
024-04-10 10:55:23 - httpx - INFO - HTTP Request: POST https://openaigpt4access.openai.azure.com//openai/deployments/gpt3_5azure/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
 73%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 143/195 [01:02<00:22, 2.34it/s]
2024-04-10 10:55:23 - httpx - INFO - HTTP Request: POST https://openaigpt4access.openai.azure.com//openai/deployments/gpt3_5azure/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
 73%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 143/195 [01:02<00:22, 2.29it/s]
Traceback (most recent call last):
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 94, in <module>
    main(args.project_id, args.n_reviews, args.category)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\main.py", line 23, in main
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 101, in final_auto_topics
    topics_df = fit_bertopic_model(sentences, embeddings, embedding_model, umap_model, hdbscan_model, vectorizer_model, representation_models)
  File "C:\Users\y.tautkevychius\Documents\insight-ai-review-dictionary\code\analysis\keywords_modeling.py", line 72, in fit_bertopic_model
    topics, probs = topic_model.fit_transform(sentences, embeddings)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 433, in fit_transform
    self._extract_topics(documents, embeddings=embeddings, verbose=self.verbose)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3637, in _extract_topics
    self.topic_representations_ = self._extract_words_per_topic(words, documents)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\_bertopic.py", line 3938, in _extract_words_per_topic
    self.topic_aspects_[aspect] = aspect_model.extract_topics(self, documents, c_tf_idf, aspects)
  File "C:\Users\y.tautkevychius\Anconda_new\envs\dictionary\lib\site-packages\bertopic\representation\_openai.py", line 222, in extract_topics
    label = response.choices[0].message.content.strip().replace("topic: ", "")
AttributeError: 'NoneType' object has no attribute 'strip'
```