Closed CharlesOkwuagwu closed 11 months ago
To start off, you should pass the documents as a list of strings, not as a pandas Series. Second, are you absolutely sure that the documents are all strings? BERTopic expects strings, since we are doing topic modeling on textual data. Any non-string values, such as floats, should be removed.
Lastly, I highly recommend taking a look at the best practices and the guide on running with large datasets.
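To illustrate the first two points, here is a minimal sketch of turning a column of mixed values into a plain list of strings. The helper name `to_clean_docs` and the sample values are hypothetical, not taken from your data; the function accepts any iterable, so a pandas Series works as input too.

```python
def to_clean_docs(values):
    """Return a plain list of the non-empty strings found in `values`.

    `values` can be any iterable, for example a pandas Series.
    Floats (including NaN), None, and other non-string entries are dropped,
    as are strings that contain only whitespace.
    """
    docs = []
    for value in values:
        if isinstance(value, str) and value.strip():
            docs.append(value)
    return docs


# Hypothetical sample with the kinds of values that tend to cause errors.
raw = ["great service", float("nan"), None, 42, "  ", "slow delivery"]
docs = to_clean_docs(raw)
print(docs)  # ['great service', 'slow delivery']
```

The resulting `docs` list is what you would then pass to `fit_transform`. This only drops entries that are not usable text; it does not alter the content of the documents themselves.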
Hi, I have high hopes for topic modeling with BERTopic.
However, my trial with live data has resulted in errors. I did not apply any filtering to the data, as you recommended.
The data sample is as follows:
data:
My code:
Errors:
Do you still recommend not cleaning the data (all text-based entries from customers)?
Also, I have an NVIDIA 3800Ti and the run is taking over 2 hours. I have no idea whether my GPU is being used. I'm running this locally.
Thanks!