I have an issue when I try to `fit_transform` a list of 100,000 documents with `CountVectorizer`. When I use `ngram_range=(1, 3)`, no memory error shows, but when I use `ngram_range=(1, 2)`, I get this error:
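For context on why the n-gram range matters for memory: with no other limits set, a (1, 3) range yields every n-gram that a (1, 2) range yields plus the trigrams, so its vocabulary can only be the same size or larger. The toy sketch below illustrates that growth in plain Python. The helper name `ngram_vocabulary_size`, the sample documents, and the whitespace tokenization are assumptions made for illustration, not what `CountVectorizer` does internally.

```python
def ngram_vocabulary_size(docs, ngram_range):
    """Count distinct word n-grams across docs for an inclusive (low, high) range."""
    low, high = ngram_range
    vocabulary = set()
    for doc in docs:
        # Simplified tokenization: lowercase and split on whitespace.
        tokens = doc.lower().split()
        for n in range(low, high + 1):
            for start in range(len(tokens) - n + 1):
                vocabulary.add(" ".join(tokens[start:start + n]))
    return len(vocabulary)


docs = [
    "the printer will not print",
    "printing with the new printer",
    "the new printer prints fast",
]

# The vocabulary grows (or at least never shrinks) as the upper bound increases.
for ngram_range in [(1, 1), (1, 2), (1, 3)]:
    print(ngram_range, ngram_vocabulary_size(docs, ngram_range))
```

On a 100,000-document corpus, the same effect is what usually drives memory use, which is why capping the vocabulary (for example with `min_df` or `max_features` on `CountVectorizer`) is a common first thing to try.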
Hi there!
One of the topics BERTopic extracted for me is ```2_printer_print_printing_printers```, and I was wondering, does BERTopic do some sort of lemmatization (I think that's what would help me…
We need to pick at least four pieces of feedback that we can use to improve our project. Let's discuss!
For anyone else searching the issues for lemmatization, this code seems to work:
# Lemmatization
from sklearn.feature_extraction.text import CountVectorizer
import nltk
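The snippet above is cut off, so the following is only a sketch of the general idea it seems to be setting up: a tokenizer that maps each word to its lemma before counting, so that variants such as "printers" and "printing" collapse into one term. The `LemmaTokenizer` class and the `TOY_LEMMAS` lookup table are hypothetical stand-ins; a real setup would call an actual lemmatizer such as NLTK's `WordNetLemmatizer` instead of a dictionary.

```python
import re
from collections import Counter

# Hypothetical lookup table standing in for a real lemmatizer.
TOY_LEMMAS = {
    "printers": "printer",
    "printing": "print",
    "prints": "print",
    "printed": "print",
}


class LemmaTokenizer:
    """Callable that splits a document into lowercase words and lemmatizes each."""

    def __init__(self, lemmas):
        self.lemmas = lemmas

    def __call__(self, doc):
        tokens = re.findall(r"[a-z]+", doc.lower())
        # Words missing from the table are kept unchanged.
        return [self.lemmas.get(token, token) for token in tokens]


tokenizer = LemmaTokenizer(TOY_LEMMAS)
counts = Counter(tokenizer("Printers keep printing; the printer printed twice."))
print(counts.most_common(2))
```

Because `CountVectorizer` accepts a callable through its `tokenizer` parameter, an object like this could be passed as `CountVectorizer(tokenizer=tokenizer)`, and that vectorizer could then be handed to BERTopic through `vectorizer_model`.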
1. Should stop words be removed from the corpus beforehand? My topic_model generates clusters whose most frequent words are "the", "and", "to", etc.
2. Is there any model to process long text withou…
Opening this issue to discuss which plots are necessary or what should be changed to show that our data is appropriate for our analysis/prediction.
We should also discuss whether the quantiles used…
Can someone please let me know how I can get rid of this error? I tried installing torch==1.9.0 and torch==1.8.0, but neither of them works.
ImportError: cannot import name 'SAVE_STATE_WARNING' from 'to…
I'm getting this error while trying to minimize the -1 topic by fiddling with the hdbscan parameters:
101099it [17:03:36, 1.65it/s]
2021-10-10 09:25:15,138 - BERTopic - Transformed documents …
I cloned the repository, created a virtualenv environment for it, and ran pip install -r requirements. Now, when starting the server on Windows 10 Pro, I get the following error:
#### Description
When performing FastICA with the whiten=True parameter, the resulting unmixed signals have a variance of 1/len(data). This can be handled by multiplying the unmixed signals by …
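As a generic illustration of the scaling issue (and not necessarily the factor the report goes on to name), a signal whose variance is about 1/len(data) can be brought back to unit variance by dividing it by its own standard deviation. The sketch below uses only the standard library; the helper `to_unit_variance` and the synthetic Gaussian signal are assumptions made for the example and do not involve FastICA itself.

```python
import math
import random
import statistics


def to_unit_variance(signal):
    """Rescale a sequence so that its population variance becomes 1."""
    std = statistics.pstdev(signal)
    if std == 0:
        raise ValueError("cannot rescale a constant signal")
    return [x / std for x in signal]


random.seed(0)
n = 1000
# Synthetic stand-in for an unmixed signal with variance close to 1 / n.
signal = [random.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
rescaled = to_unit_variance(signal)

print(statistics.pvariance(signal))    # on the order of 1 / n
print(statistics.pvariance(rescaled))  # approximately 1
```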