Closed ColinFerguson closed 3 years ago
You are correct! Stupid oversight on my part, not actually using `n_gram_range`. Same with the stop words.
Master has the most up-to-date version. PyPI was updated to 0.2.3 to include the changes you proposed. Let me know if you find any other issues!
Great, thank you so much @MaartenGr!
Hi, really nice work with this package, it's very useful.
Model initialization takes the argument `n_gram_range`, but I think that it doesn't get used. Should line 241, referenced here, be `count = CountVectorizer(ngram_range=n_gram_range, stop_words="english").fit(documents)`?
https://github.com/MaartenGr/BERTopic/blob/9f7dca1103e1935f7a2779d1fa9e89db072c0c8a/bertopic/model.py#L241
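To illustrate what forwarding the argument would change, here is a minimal sketch assuming scikit-learn's `CountVectorizer` and a made-up two-document corpus. Apart from `n_gram_range`, `count`-style usage, and `documents`, the variable names are illustrative and not taken from the BERTopic source.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration only.
documents = [
    "topic models group similar documents",
    "similar documents share a topic",
]

n_gram_range = (1, 2)  # unigrams and bigrams

# Argument not forwarded: CountVectorizer uses its default ngram_range=(1, 1).
unigrams_only = CountVectorizer(stop_words="english").fit(documents)

# Argument forwarded, as proposed above.
with_bigrams = CountVectorizer(
    ngram_range=n_gram_range, stop_words="english"
).fit(documents)

# Multi-word terms contain a space, so this checks whether any bigram was learned.
has_bigram_default = any(" " in term for term in unigrams_only.vocabulary_)
has_bigram_forwarded = any(" " in term for term in with_bigrams.vocabulary_)
print(has_bigram_default, has_bigram_forwarded)
```

Only the second vectorizer learns bigrams, which is the behaviour a user would expect after setting `n_gram_range=(1, 2)`.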
It might be nice to have the `stop_words` argument be configurable at initialization as well, so that the user could pass a corpus-specific set of stop words.
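A rough sketch of what that could look like, assuming both settings are simply stored at construction time and forwarded to scikit-learn's `CountVectorizer`. The class `TopicModelSketch`, its method `fit_vectorizer`, and the example corpus are hypothetical and are not the BERTopic implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer


class TopicModelSketch:
    """Hypothetical stand-in for the model class; not the BERTopic code."""

    def __init__(self, n_gram_range=(1, 1), stop_words="english"):
        # Store both settings so they can be forwarded later.
        self.n_gram_range = n_gram_range
        self.stop_words = stop_words

    def fit_vectorizer(self, documents):
        # Forward the user's settings instead of hard-coding them.
        return CountVectorizer(
            ngram_range=self.n_gram_range,
            stop_words=self.stop_words,
        ).fit(documents)


# Corpus-specific stop words: "patient" is treated as noise here (invented example).
model = TopicModelSketch(n_gram_range=(1, 2), stop_words=["the", "patient"])
count = model.fit_vectorizer([
    "the patient reported mild pain",
    "the patient reported no pain",
])
print(sorted(count.vocabulary_))
```

`CountVectorizer` accepts `stop_words` as the string `"english"`, a list of words, or `None`, so a single constructor argument can cover the built-in list and a custom one.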