Open sdave-connexion opened 4 months ago
Updated the model with new documents:
That's the thing, you didn't update the model. When you use `.transform`, you are merely predicting the topics of the documents that you passed to it. `.transform`, as it is used in scikit-learn, is not meant to update the underlying model. Instead, if you want to update the model, I would advise using either online topic modeling or the `.merge_models` technique.
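To make the `.fit` / `.transform` / `.partial_fit` distinction concrete, here is a minimal pure-Python sketch. `ToyTopicModel` is a hypothetical keyword-matching stand-in, not BERTopic, and its `topic_sizes_` and `n_training_docs` attributes are assumptions made for illustration. It shows that `transform` only reads the fitted state, so any document count derived from that state stays the same, while `partial_fit` folds a new batch into the existing state.

```python
from collections import Counter


class ToyTopicModel:
    """Hypothetical topic model following the scikit-learn fit/transform/partial_fit contract."""

    def __init__(self, topics):
        # topics: mapping of topic id -> set of keywords (illustrative assumption)
        self.topics = topics
        self.topic_sizes_ = Counter()  # fitted state: number of documents per topic

    def _assign(self, doc):
        # Pick the topic sharing the most keywords with the document; -1 if none match
        words = set(doc.lower().split())
        best, best_overlap = -1, 0
        for topic_id, keywords in self.topics.items():
            overlap = len(words & keywords)
            if overlap > best_overlap:
                best, best_overlap = topic_id, overlap
        return best

    def fit(self, docs):
        # Re-estimate the state from scratch on the given documents
        self.topic_sizes_ = Counter(self._assign(d) for d in docs)
        return self

    def partial_fit(self, docs):
        # Update the existing state with a new batch instead of replacing it
        self.topic_sizes_.update(self._assign(d) for d in docs)
        return self

    def transform(self, docs):
        # Prediction only: the fitted state is read but never modified
        return [self._assign(d) for d in docs]

    @property
    def n_training_docs(self):
        return sum(self.topic_sizes_.values())


model = ToyTopicModel({0: {"cat", "dog"}, 1: {"stock", "market"}})
model.fit(["the cat sat", "dog barks", "stock prices"])
print(model.n_training_docs)                          # 3
print(model.transform(["market crash", "cat food"]))  # [1, 0]
print(model.n_training_docs)                          # still 3: transform changed nothing
model.partial_fit(["market rally"])
print(model.n_training_docs)                          # 4: partial_fit updated the state
```

In BERTopic itself, `.partial_fit` is only available when the model was built for online topic modeling from the start, which is why a model trained with `.fit` cannot simply be updated this way.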
@MaartenGr In my case, new data comes in every two days. So in this case I am planning to:
Is this approach correct? Or is there an easier way? Thanks in advance.
You can only do this if step 1 was also done with online topic modeling. You cannot use `.partial_fit` after `.fit` at the moment. Instead, I would advise using the `.merge_models` technique to iteratively combine new models.
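In BERTopic the merge is a call along the lines of `BERTopic.merge_models([base_model, new_model])`, repeated each time a model trained on a new batch is available. As we understand the documented behavior, a topic from a later model is added only when its topic embedding has a cosine similarity below `min_similarity` (default 0.7) to every topic already in the merged model; otherwise it is treated as a duplicate. The library-free sketch below illustrates that rule on plain `(label, embedding)` lists. `merge_topic_sets` and the toy embeddings are hypothetical, and this is not BERTopic's actual implementation.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_topic_sets(models, min_similarity=0.7):
    """Hypothetical sketch of similarity-based merging of topic sets.

    Each model is a list of (label, embedding) pairs. Topics from later models are
    appended only if no already-merged topic reaches `min_similarity`.
    """
    merged = list(models[0])
    for model in models[1:]:
        new_topics = []
        for label, emb in model:
            # Compare against the merged set as it was before this model's additions
            best = max((cosine(emb, e) for _, e in merged), default=0.0)
            if best < min_similarity:
                new_topics.append((label, emb))
        merged.extend(new_topics)
    return merged


base = [("cats_dogs", [1.0, 0.0, 0.0]), ("stocks", [0.0, 1.0, 0.0])]
batch = [("pets", [0.9, 0.1, 0.0]), ("weather", [0.0, 0.0, 1.0])]
print([label for label, _ in merge_topic_sets([base, batch])])
# ['cats_dogs', 'stocks', 'weather']: "pets" is near-duplicate of "cats_dogs" and is dropped
```

For data arriving every two days, the same idea applies iteratively: train a small model on each new batch and merge it into the running model (`merged = merge_topic_sets([merged, new_batch])` in this sketch), so earlier topics are preserved while genuinely new ones are appended.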
Have you searched existing issues? 🔎
Describe the bug
I have been using BERTopic for topic modelling and recently needed to update my existing BERTopic model with new documents. I want to push the updated model to the Hugging Face Hub, ensuring that it reflects the new number of documents and topics.
Here’s what I’ve done so far:
Despite following these steps, I still see the old number of training documents in the repository on the Hugging Face Hub. How can I ensure that the updated model reflects the new number of training documents and topics?
Any help or guidance on this would be greatly appreciated!
Reproduction
BERTopic Version
pip install -U bertopic