sdave-connexion opened this issue 1 year ago
By "run the analysis" you mean calling the `fit` method? By "run it through the model" you mean calling the `transform` method? Unless I misunderstand something, isn't that an obvious solution right there?
@clstaudt Hi, I've used topic modeling to categorize a dataset of 100,000 feedback entries into 20 main topics. As new data arrives (around 5,000 feedback entries each week), I want to efficiently categorize this fresh feedback into those 20 pre-established topics. I'm looking for an approach where I can apply the already trained model to these new entries without having to re-run the model on the entire, ever-growing dataset.
What's the best way to achieve this incremental categorization?
For example, what I'd like to do is more or less akin to how, in supervised learning, we can use a trained model to predict labels for new, unseen data.
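To put rough numbers on "ever-growing", here is a back-of-the-envelope sketch. It assumes that a weekly refit re-processes the whole corpus (100,000 initial entries plus 5,000 per week so far) and that applying the trained model touches only that week's new batch. The initial fit is not counted, since both approaches share it. The function name `documents_processed` is hypothetical.

```python
def documents_processed(initial, weekly, weeks):
    """Return (refit_total, transform_total) documents processed over `weeks` weeks.

    Assumption (hypothetical cost model): a refit re-processes the whole,
    ever-growing corpus every week, while transform-only touches just the
    new weekly batch. The one-time initial fit is excluded from both totals.
    """
    # At week n the corpus holds the initial entries plus n weekly batches.
    refit_total = sum(initial + weekly * week for week in range(1, weeks + 1))
    # Transform-only processes exactly one batch per week.
    transform_total = weekly * weeks
    return refit_total, transform_total


refit, transform_only = documents_processed(initial=100_000, weekly=5_000, weeks=52)
print(refit, transform_only)  # → 12090000 260000
```

Under these assumptions a year of weekly refits processes about 46 times as many documents as transform-only. Real compute cost also depends on the embedding and clustering steps, which this count does not model.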
I believe BERTopic works out of the box just like other ML models that you are familiar with:
1. `fit` the model and validate the topics
2. call the `transform` method on the new batch -> the result is an assignment of each new feedback entry to one (or several) of the pre-established topics (or the outlier topic)

Indeed! After having created your model with `fit_transform` or `fit`, you can simply run `transform` to do this:
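The code that originally followed this reply is not preserved here. As a hedged illustration of the same fit-once / transform-later contract, here is a minimal standard-library toy. It is not BERTopic's algorithm. The class `ToyTopicModel`, its bag-of-words cosine similarity, and the `min_similarity` threshold for the outlier topic `-1` are all hypothetical. To keep the sketch short, its `fit` takes the initial topic assignments as input instead of discovering them. With BERTopic itself, the corresponding step is calling `transform` on the new batch with the already fitted model.

```python
import math
import pickle
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyTopicModel:
    """Hypothetical stand-in for a fitted topic model (NOT BERTopic's algorithm)."""

    def __init__(self, min_similarity=0.2):
        # Documents less similar than this to every topic go to outlier topic -1.
        self.min_similarity = min_similarity
        self.centroids = {}

    def fit(self, docs, topics):
        """Sum term counts per topic; `topics` are given, not discovered, in this toy."""
        for doc, topic in zip(docs, topics):
            self.centroids.setdefault(topic, Counter()).update(vectorize(doc))
        return self

    def transform(self, docs):
        """Assign each new document to its most similar existing topic, or -1."""
        assigned = []
        for doc in docs:
            vec = vectorize(doc)
            best_topic, best_sim = -1, 0.0
            for topic, centroid in self.centroids.items():
                sim = cosine(vec, centroid)
                if sim > best_sim:
                    best_topic, best_sim = topic, sim
            # Centroids are only read here, so the existing topics stay fixed.
            assigned.append(best_topic if best_sim >= self.min_similarity else -1)
        return assigned


# Fit once on the historical corpus (toy data).
initial_docs = [
    "the app crashes when I open settings",
    "app crashes on startup every time",
    "delivery was late and the package arrived damaged",
    "late delivery again, package was damaged",
]
initial_topics = [0, 0, 1, 1]
model = ToyTopicModel().fit(initial_docs, initial_topics)

# Persist the fitted model, then reload it for each weekly batch.
weekly_model = pickle.loads(pickle.dumps(model))
new_batch = ["crashes when opening settings", "package damaged on delivery", "great coffee"]
print(weekly_model.transform(new_batch))  # → [0, 1, -1]
```

The fitted model is serialized once and reloaded for each weekly batch. Only the new documents are processed, and the existing topics stay fixed. Online topic modelling differs here, because each batch is meant to update the model.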
Hello, I have created my model, but I don't want to run the whole analysis again every week. I would like to take only the new feedback I receive, run it through the model, and have it assigned directly to my current clusters.
I looked at online topic modelling, but it doesn't work after the `.fit` step.
Thank you in advance
Best, Shantanu