Open levrone1987 opened 1 year ago
Could this issue be caused by top2vec finding 0 topics?
@Lotfi-AL I cannot check the number of topics, because the error already occurs in the line where the Top2Vec object is created. This is the full error message:
INFO:top2vec:Pre-processing documents for training
/home/oem/anaconda3/envs/news-env/lib/python3.8/site-packages/sklearn/feature_extraction/text.py:528: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'
warnings.warn(
Traceback (most recent call last):
File "/home/oem/news_recos/main.py", line 179, in <module>
top2vec_model = Top2Vec(corpus, speed="learn", workers=8, embedding_model='distiluse-base-multilingual-cased')
File "/home/oem/anaconda3/envs/news-env/lib/python3.8/site-packages/top2vec/Top2Vec.py", line 587, in __init__
raise ValueError(f"A min_count of {min_count} results in "
ValueError: A min_count of 50 results in all words being ignored, choose a lower value.
If the above cannot be resolved, I would appreciate sample code for processing a corpus of text written in German.
I would really appreciate it if someone could answer the question I stated above. @Lotfi-AL @ddangelov
This is likely because your dataset is too small. Set min_count=0 and also try using a larger dataset.
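To see why a small corpus triggers this error, here is a rough sketch of the idea behind min_count: words that occur too rarely across the whole corpus are dropped from the vocabulary. The whitespace tokenization, the toy corpus, and the helper name words_surviving_min_count are all made up for illustration, and the "at least min_count" comparison is an assumption; Top2Vec's own preprocessing and threshold check may differ.

```python
from collections import Counter


def words_surviving_min_count(documents, min_count):
    """Return the words that occur at least min_count times across all documents.

    Illustrative only: uses naive lowercase whitespace tokenization,
    which is an assumption and not Top2Vec's actual tokenizer.
    """
    counts = Counter(
        token
        for doc in documents
        for token in doc.lower().split()
    )
    return {word for word, n in counts.items() if n >= min_count}


# Hypothetical toy corpus; a real corpus would contain far more documents.
toy_corpus = [
    "Die Regierung plant neue Gesetze",
    "Die Opposition kritisiert die Regierung",
    "Neue Gesetze sorgen für Diskussionen",
]

# With a threshold of 50, no word in this tiny corpus is frequent enough.
print(len(words_surviving_min_count(toy_corpus, 50)))

# A lower threshold keeps the words that repeat across documents.
print(sorted(words_surviving_min_count(toy_corpus, 2)))
```

With Top2Vec itself, the equivalent step would be passing a smaller min_count to the constructor, or training on enough documents that common words clear the threshold.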
Given a text corpus (German language), I get the following error with the code shown below:
raise ValueError(f"A min_count of {min_count} results in "
ValueError: A min_count of 50 results in all words being ignored, choose a lower value.
The code:
I had to change
vectorizer.get_feature_names()
to
vectorizer.get_feature_names_out()
in Top2Vec.py in order to avoid the error associated with the missing get_feature_names
method, but now I experience the above error.
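As an alternative to editing Top2Vec.py by hand, a small compatibility helper can pick whichever accessor the installed scikit-learn provides (newer releases offer get_feature_names_out and have dropped get_feature_names). This is only a sketch: the helper name feature_names and the two stand-in classes are hypothetical and exist just to make the example runnable without scikit-learn.

```python
def feature_names(vectorizer):
    """Return the vocabulary terms as a list, using whichever accessor exists."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # Accessor available in newer scikit-learn releases.
        return list(vectorizer.get_feature_names_out())
    # Fallback for older releases that only have the legacy accessor.
    return list(vectorizer.get_feature_names())


# Hypothetical stand-ins that mimic the old and new vectorizer interfaces.
class OldStyleVectorizer:
    def get_feature_names(self):
        return ["gesetze", "regierung"]


class NewStyleVectorizer:
    def get_feature_names_out(self):
        return ["gesetze", "regierung"]


print(feature_names(OldStyleVectorizer()))
print(feature_names(NewStyleVectorizer()))
```

Upgrading top2vec to a release that already targets the newer scikit-learn API, if one is available for your environment, would avoid patching the installed package at all.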