MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.03k stars, 756 forks

Reducing Outliers of Loaded Model #1944

Open: alegallo1511 opened this issue 5 months ago

alegallo1511 commented 5 months ago

Hi!

A month ago I created a topic model and saved it with `topic_model.save(outpath, serialization="safetensors")`.

I then reduced the outliers with `new_topics = topic_model.reduce_outliers(docs, topics)` and used the resulting topics in an empirical analysis, but I did not save the model with the updated topics.
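Because `reduce_outliers` returns a new list of topic assignments rather than changing the saved model, one way to keep the exact topics used in an analysis is to write them out next to the documents. A minimal stdlib-only sketch, assuming `docs` is a list of strings and `new_topics` a same-length list of integer topic IDs; the helper names and the JSON layout are hypothetical:

```python
import json
from pathlib import Path
from typing import List, Tuple


def save_assignments(path: str, docs: List[str], topics: List[int]) -> None:
    """Write each document with its (outlier-reduced) topic ID to a JSON file."""
    if len(docs) != len(topics):
        raise ValueError("docs and topics must have the same length")
    # int() also converts NumPy integer types, which json cannot serialize.
    records = [{"doc": d, "topic": int(t)} for d, t in zip(docs, topics)]
    Path(path).write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")


def load_assignments(path: str) -> Tuple[List[str], List[int]]:
    """Read back the documents and topic IDs written by save_assignments."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    docs = [r["doc"] for r in records]
    topics = [r["topic"] for r in records]
    return docs, topics
```

For example, `save_assignments("reduced_topics.json", docs, new_topics)` right after reducing outliers lets a later session reload the same assignments without re-running anything. Within BERTopic itself, the documented route is to pass the new assignments back with `topic_model.update_topics(docs, topics=new_topics)` before saving; check the documentation for your installed version.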

I now want to produce visualizations of the topics used in the analysis, so I loaded my dataframe (and defined `docs` again), loaded the model, and tried to reduce the outliers again. However, I get an error that I am not sure how to fix. The code and error are below:

from bertopic import BERTopic
# docs was rebuilt from the reloaded dataframe
loaded_model = BERTopic.load("Only-English-BERT-topic-meaning-min-size-50")
topics = loaded_model.topics_
new_topics = loaded_model.reduce_outliers(docs, topics)

sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided

I have also tried using `topics, probs = loaded_model.transform(docs)`, but I got the same error.
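The message itself comes from scikit-learn: a `CountVectorizer` raises `NotFittedError` with this text when `transform` is called before it has learned or been given a vocabulary. One plausible reading, not confirmed in this thread, is that the loaded model's vectorizer was restored without its fitted vocabulary (in recent BERTopic versions, `save` has a `save_ctfidf` option that controls whether that state is stored with safetensors). The stdlib-only toy below is hypothetical and is not scikit-learn's code; it only reproduces the failure mode:

```python
from typing import Dict, List, Optional


class NotFittedError(Exception):
    """Hypothetical stand-in for sklearn.exceptions.NotFittedError."""


class ToyCountVectorizer:
    """Minimal bag-of-words counter mimicking a fit/transform contract."""

    def __init__(self, vocabulary: Optional[Dict[str, int]] = None) -> None:
        # A vocabulary may be passed in explicitly, as if restored from disk.
        self.vocabulary_ = vocabulary

    def fit(self, docs: List[str]) -> "ToyCountVectorizer":
        # Learn a sorted word -> column index mapping from the documents.
        words = sorted({w for d in docs for w in d.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs: List[str]) -> List[List[int]]:
        if self.vocabulary_ is None:
            # Same failure as a model restored without its fitted vocabulary.
            raise NotFittedError("Vocabulary not fitted or provided")
        rows = []
        for d in docs:
            row = [0] * len(self.vocabulary_)
            for w in d.lower().split():
                idx = self.vocabulary_.get(w)
                if idx is not None:  # unseen words are ignored
                    row[idx] += 1
            rows.append(row)
        return rows
```

With `fitted = ToyCountVectorizer().fit(["cats purr", "dogs bark"])`, both `fitted.transform(["cats bark"])` and `ToyCountVectorizer(fitted.vocabulary_).transform(["cats bark"])` return `[[1, 1, 0, 0]]`, while a fresh `ToyCountVectorizer().transform(...)` raises the error above: the configuration alone is not enough, the learned vocabulary has to travel with the model.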

Any help on how to fix this would be greatly appreciated.

Thanks in advance for your time!

MaartenGr commented 5 months ago

Which version of BERTopic are you using? Was it the same as when you saved the model?

Also, could you provide the full error message? It's not clear to me what the error is referencing.
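To answer the version question, it helps to report the installed versions from the same environment that loads the model. A minimal stdlib sketch using `importlib.metadata` (Python 3.8+); the helper name and the package list are assumptions, chosen as the libraries most likely involved here:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed list of relevant distributions; adjust as needed.
    for name, ver in installed_versions(["bertopic", "scikit-learn", "safetensors"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this both where the model was saved (if still available) and where it is loaded makes a version mismatch easy to spot.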