-
In some cases, the stop_words parameter of CountVectorizer is not enough to keep unwanted words out of the vocabulary. For example, one may want to filter out non-verbs like…
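One way to filter beyond stop_words is to pass a callable as `analyzer`. The sketch below is only an illustration: `KEEP` is a hypothetical allow-list standing in for a real part-of-speech check (which would need a tagger such as one from NLTK or spaCy), and `filtered_analyzer` is a name made up for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical allow-list; a real setup would use a POS tagger to decide
# which tokens count as verbs.
KEEP = {"run", "jump", "swim"}

# Reuse the default preprocessing, tokenization and English stop-word removal.
default_analyzer = CountVectorizer(stop_words="english").build_analyzer()

def filtered_analyzer(doc):
    """Return only the tokens from the default analyzer that are in KEEP."""
    return [tok for tok in default_analyzer(doc) if tok in KEEP]

cv = CountVectorizer(analyzer=filtered_analyzer)
X = cv.fit_transform(["Dogs run and jump quickly", "Fish swim; dogs run"])

# Sorted vocabulary keys match the column order of X.
features = sorted(cv.vocabulary_)
print(features)
print(X.toarray())
```

Because the callable replaces the whole analysis step, options such as `stop_words` or `ngram_range` on the outer vectorizer are not applied; they have to be handled inside the callable, as done here by wrapping the default analyzer.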
-
Getting an error while vectorizing the string
I use:
```
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')
```
error I go…
-
After running the following code:
# Compute topics of chunks
chunk_topics, chunk_embeds, df_topics = chunker.get_chunk_topic(chunks=chunks)
I got this error:
InvalidParameterError: The 'ngram_…
-
`from TextFeatureSelection import TextFeatureSelection
#Binary classification
input_doc_list=new_df_4['txt'].values.tolist()
target=new_df_4['target'].values.tolist()
fsOBJ=TextFeatureSelection(ta…
-
Hi,
I am facing an issue. If I train BERTopic on the same dataset multiple times, I get a different number of topics each time.
As per the discussion in this thread: https://github.com/MaartenGr/BERTop…
-
When running run_pydistinto_beginners.py, I get another error saying that the attribute "get_feature_names" of "CountVectorizer" is not found. I have Python 3.10.6 on Ubuntu 22.04.2 LTS.
![grafik](http…
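
For context, `get_feature_names` was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of `get_feature_names_out`. Assuming that is what triggers the error here, a small compatibility helper could look like the sketch below; `feature_names` is a hypothetical helper name, not part of the script mentioned above.

```python
from sklearn.feature_extraction.text import CountVectorizer

def feature_names(vectorizer):
    """Return the fitted vocabulary terms as a list of plain strings.

    Prefers get_feature_names_out (scikit-learn >= 1.0) and falls back to
    the older get_feature_names on releases that still provide it.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return [str(name) for name in vectorizer.get_feature_names_out()]
    return [str(name) for name in vectorizer.get_feature_names()]

cv = CountVectorizer()
cv.fit(["the cat sat", "the dog sat"])
names = feature_names(cv)
print(names)
```

If only recent scikit-learn versions need to be supported, replacing the call with `get_feature_names_out()` directly is simpler.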
-
I tried, but this error occurred:
`NotImplementedError: CountVectorizer cannot be converted, only tokenizer='word' is supported. You may raise an issue at https://github.com/onnx/sklearn-onnx/is…
-
```
import jieba
def tokenize_zh(text):
    # Segment with jieba and keep only tokens longer than one character.
    words = jieba.lcut(text)
    words = [w for w in words if len(w) > 1]
    return words
import numpy as np
from umap import UMAP
from skle…
-
Here is an issue I came across when using CountVectorizer with ColumnTransformer and Pipelines:
https://stackoverflow.com/questions/54541490/sklearn-text-and-numeric-features-with-colum…
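
The linked question is truncated here, so the exact problem is not known. One common stumbling block in this setup, shown as a hedged sketch, is that CountVectorizer expects a 1D sequence of strings, so its column should be selected with a single string rather than a list. The column names and data below are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Toy data with one text column and one numeric column (hypothetical names).
df = pd.DataFrame(
    {
        "text": ["good movie", "bad movie", "good plot"],
        "length": [10.0, 9.0, 9.0],
    }
)

pre = ColumnTransformer(
    [
        # A scalar column name passes a 1D Series, which CountVectorizer needs.
        ("text", CountVectorizer(), "text"),
        # A list of names passes a 2D frame, which StandardScaler needs.
        ("num", StandardScaler(), ["length"]),
    ]
)

Xt = pre.fit_transform(df)
print(Xt.shape)  # 4 vocabulary columns plus 1 scaled numeric column
```

Writing `["text"]` instead of `"text"` would hand the vectorizer a one-column DataFrame, which it then iterates over incorrectly.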
-
Right now CountVectorizer sometimes sets ``self.vocabulary_`` outside of ``fit``. We usually prohibit this, but the common tests haven't reached the vectorizers yet.
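
A minimal sketch of how this can be observed, assuming a scikit-learn version in which the behavior described above still applies: with a user-supplied `vocabulary`, calling `transform` without `fit` may populate `vocabulary_` as a side effect. The variable names are chosen for this example only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# A fixed vocabulary lets transform() run without a prior fit().
cv = CountVectorizer(vocabulary=["apple", "banana"])

# Right after construction no fitted attribute exists yet.
had_vocab_before = hasattr(cv, "vocabulary_")

X = cv.transform(["apple banana apple"])

# On affected versions this reports True even though fit() was never called.
has_vocab_after = hasattr(cv, "vocabulary_")

print(had_vocab_before, has_vocab_after)
print(X.toarray())
```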