MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.03k stars 757 forks

Is there an option to add a few topic words to topics? #1649

Open ShyamGanesh13 opened 10 months ago

ShyamGanesh13 commented 10 months ago

Hi @MaartenGr, is there an option to add my own set of words to a topic generated by a BERTopic model?

For example, assume I have two topics with topic_labels 1_cat_cats_paws and 2_dog_dogs_puppy generated from my dataset. Can I add some extra words to these topics, so that they become 1_cat_cats_paws_kitten_cute and 2_dog_dogs_puppy_bark?

Note: words like kitten, cute, and bark are my own words (not generated by the model) that I need to add to the topics already created by the BERTopic model.

MaartenGr commented 10 months ago

If you want to add words, then it might be worthwhile to do so by adding those words to topic_model.topic_representations_. That variable contains the core representations. If, however, you know these words to appear in the data, you can also seed them such that they are likely to appear in the resulting topic representations.
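As a rough illustration of the first suggestion, here is a minimal sketch that appends custom words to one entry of topic_model.topic_representations_. It assumes that attribute is a dict mapping a topic id to a list of (word, score) pairs; the helper name add_words_to_topic and the placeholder score of 0.0 are made up for this example and are not part of BERTopic.

```python
def add_words_to_topic(topic_model, topic, extra_words, score=0.0):
    """Append custom words to one topic's representation (hypothetical helper).

    Assumes topic_model.topic_representations_ maps topic id -> list of
    (word, score) pairs. `score` is only a placeholder, not a c-TF-IDF value.
    """
    current = list(topic_model.topic_representations_[topic])
    existing = {word for word, _ in current}
    for word in extra_words:
        # Skip words that are already part of the representation.
        if word not in existing:
            current.append((word, score))
            existing.add(word)
    topic_model.topic_representations_[topic] = current
    return current
```

For instance, add_words_to_topic(topic_model, 1, ["kitten", "cute"]) would extend topic 1 in place. This only touches the stored representation; labels that were already generated, such as 1_cat_cats_paws, are stored separately and would likely need to be updated as well.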

ShyamGanesh13 commented 10 months ago

If you want to add words, then it might be worthwhile to do so by adding those words to topic_model.topic_representations_. That variable contains the core representations.

If we add those words to topic_model.topic_representations_, is there any way to calculate the scores associated with them? For example: 0: [['cat', 0.40245479345321655], ['cats', 0.3927491307258606], ['paws', 0.3698537349700928], ['kitten', ?], ['cute', ?]]

MaartenGr commented 10 months ago

You can extract the values from topic_model.c_tf_idf_ and use topic_model.vectorizer_model.get_feature_names_out() to find the right indices of the values.
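A minimal sketch of that lookup could look like the following. The helper name lookup_ctfidf_scores is hypothetical. It also assumes that the rows of c_tf_idf_ follow the sorted topic ids found in topic_representations_ (so the outlier topic -1, if present, is row 0), which is worth verifying on your own fitted model. Words that are not in the vectorizer's vocabulary have no column, so they are returned as None.

```python
def lookup_ctfidf_scores(topic_model, topic, words):
    """Look up c-TF-IDF scores for `words` in one topic (hypothetical helper).

    Assumption: rows of topic_model.c_tf_idf_ are ordered like the sorted
    topic ids in topic_model.topic_representations_.
    """
    # Map each vocabulary term to its column index in the c-TF-IDF matrix.
    feature_names = topic_model.vectorizer_model.get_feature_names_out()
    vocab = {term: idx for idx, term in enumerate(feature_names)}

    # Row of the requested topic under the ordering assumption above.
    row = sorted(topic_model.topic_representations_).index(topic)

    scores = {}
    for word in words:
        col = vocab.get(word)
        # Out-of-vocabulary words have no c-TF-IDF value.
        scores[word] = None if col is None else float(topic_model.c_tf_idf_[row, col])
    return scores
```

Calling lookup_ctfidf_scores(topic_model, 0, ["kitten", "cute"]) would then give the values to place next to the manually added words. A word that never occurs in the training documents will come back as None, since it has no c-TF-IDF score to extract.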