MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.19k stars, 765 forks

AttributeError: 'OpenAIBackend' object has no attribute 'encode' #2136

Open mahmawad opened 2 months ago

mahmawad commented 2 months ago

Hi, I still get this error message even though I am using the latest commit and version of BERTopic. Could you please help?

Code:

import openai
from bertopic.backend import OpenAIBackend
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-09-15-preview",
    api_key=os.getenv("OPENAI_API_KEY").strip(),
    azure_endpoint="https://x-.openai.azure.com/",
)
embedding_model = OpenAIBackend(client, "text-embedding-3-large")

embeddings = embedding_model.encode(df['PreprocessedText'].tolist(), show_progress_bar=True)

from bertopic import BERTopic

# Initialize and train BERTopic model
topic_model = BERTopic(
    embedding_model=embedding_model,
    vectorizer_model=vectorizer_model,
    umap_model=umap_model,
    calculate_probabilities=True,
    hdbscan_model=hdbscan_model,
    representation_model=representation_model,
    verbose=True,
    nr_topics=10
)

# Fit the topic model and transform the data
topics, probs = topic_model.fit_transform(df['PreprocessedText'].values)

MaartenGr commented 2 months ago

The encode method that you refer to is specific to the sentence-transformers package. With OpenAIBackend you would have to use embed instead. I would advise checking the source code to see which methods are available: https://github.com/MaartenGr/BERTopic/blob/master/bertopic/backend/_openai.py
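
To illustrate the difference between the two method names, here is a minimal sketch of a helper that calls embed when the model provides it and falls back to encode otherwise. The helper name compute_embeddings and the FakeBackend class are hypothetical and exist only so the example runs without an API key. The embed(documents, verbose=...) signature is an assumption based on the backend source linked above, so check it against your installed version.

```python
from typing import Any, Sequence


def compute_embeddings(model: Any, documents: Sequence[str], verbose: bool = False):
    """Embed documents with either a BERTopic backend or a sentence-transformers model.

    Hypothetical helper, not part of BERTopic.
    """
    docs = list(documents)
    if hasattr(model, "embed"):
        # Assumption: BERTopic backends such as OpenAIBackend expose
        # embed(documents, verbose=...), as the reply above indicates.
        return model.embed(docs, verbose=verbose)
    if hasattr(model, "encode"):
        # sentence-transformers models expose encode(sentences, show_progress_bar=...).
        return model.encode(docs, show_progress_bar=verbose)
    raise AttributeError(
        f"{type(model).__name__!r} object has neither 'embed' nor 'encode'"
    )


class FakeBackend:
    """Hypothetical offline stand-in for OpenAIBackend; returns toy 1-d vectors."""

    def embed(self, documents, verbose=False):
        # One "embedding" per document: its character length as a float.
        return [[float(len(doc))] for doc in documents]


if __name__ == "__main__":
    print(compute_embeddings(FakeBackend(), ["first document", "second"]))
```

For the code in the question, the failing line would presumably become embedding_model.embed(df['PreprocessedText'].tolist(), verbose=True). The resulting array can then be passed to fit_transform through its embeddings argument so the documents are not embedded a second time.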