MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.19k stars, 765 forks

Zero-shot predefined Topics #2164

Open mahmawad opened 1 month ago

mahmawad commented 1 month ago

Hi, thanks again for your great tool.

I have a question about predefined topics. Whenever I pass a zeroshot_topic_list, the generated topics differ from the ones I supplied. Is there a way to restrict topic modeling to only the topics in zeroshot_topic_list?

Code:

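from bertopic import BERTopic

# embedding_model, vectorizer_model, umap_model, representation_model and
# zeroshot_topic_list are defined elsewhere (not shown).
# Initialize the BERTopic model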
topic_model = BERTopic(
    embedding_model=embedding_model,
    vectorizer_model=vectorizer_model,
    umap_model=umap_model,
    calculate_probabilities=True,
    #hdbscan_model=hdbscan_model,
    representation_model=representation_model,
    verbose=True,
    nr_topics=15,
    min_topic_size=25,
    zeroshot_topic_list=zeroshot_topic_list,
    zeroshot_min_similarity=0.85,
)

# Fit the topic model and transform the data
topics, probs = topic_model.fit_transform(df['PreprocessedText'].values)

MaartenGr commented 1 month ago

Yes. Set zeroshot_min_similarity to 0 and the model will select topics only from zeroshot_topic_list.
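
To illustrate why the threshold matters, here is a minimal pure-Python sketch of threshold-based zero-shot assignment. It is not BERTopic's actual implementation: the helper names (cosine, assign_zeroshot) and the toy 2-D vectors are hypothetical, and it only assumes that each document is compared against the predefined topics by cosine similarity and keeps its best match when that similarity reaches the minimum.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def assign_zeroshot(doc_vectors, topic_vectors, min_similarity):
    """Hypothetical helper: for each document, return the index of the most
    similar predefined topic, or None if that similarity is below the threshold
    (such documents would be left for regular clustering)."""
    assignments = []
    for doc in doc_vectors:
        sims = [cosine(doc, topic) for topic in topic_vectors]
        best = max(range(len(sims)), key=sims.__getitem__)
        assignments.append(best if sims[best] >= min_similarity else None)
    return assignments

# Toy embeddings (made up for illustration): two predefined topics, three documents.
topic_vectors = [[1.0, 0.0], [0.0, 1.0]]
doc_vectors = [[0.9, 0.1], [0.6, 0.5], [0.2, 0.8]]

strict = assign_zeroshot(doc_vectors, topic_vectors, 0.85)
loose = assign_zeroshot(doc_vectors, topic_vectors, 0.0)
print(strict)  # [0, None, 1]
print(loose)   # [0, 0, 1]
```

With the 0.85 threshold, the middle document matches no predefined topic closely enough and would be left for clustering, which is where extra generated topics come from. With a threshold of 0, it falls back to its nearest predefined topic. In the snippet from the question, this corresponds to passing zeroshot_min_similarity=0 instead of 0.85.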