-
**Describe the bug**
NLP clustering does not work properly. The code available in [example](https://docs.rapids.ai/api/cuml/stable/api/#cuml.dask.naive_bayes.MultinomialNB) works fine for classificat…
-
For text mining it's also important to fit a CountVectorizer (or a TFIDFTransformer), so it should be possible to export it to the target language
-
The existing CountVectorizer code uses TorchScript (`torch.jit`) constructs, such as this one in the forward function:
```python
doc_ids = torch.jit.annotate(List[Tensor], []) # noqa: F821
```
which we need to do a bit of a work around s…
ksaur updated
3 years ago
-
raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
Why am I getting this error? min_df = 2 and everything e…
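
One possible cause, sketched here without scikit-learn itself: the documented default `token_pattern` of `CountVectorizer` is `(?u)\b\w\w+\b`, which only keeps tokens of two or more word characters. If every token in the corpus is shorter than that (or is removed as a stop word), no terms are left and the vocabulary ends up empty, regardless of `min_df`. The helper `build_vocabulary` and the sample documents below are hypothetical and only imitate the tokenization step.

```python
import re

# scikit-learn's documented default token_pattern: tokens of 2+ word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def build_vocabulary(docs, token_pattern=DEFAULT_TOKEN_PATTERN):
    """Hypothetical helper: collect the distinct lowercase tokens in docs."""
    tokenizer = re.compile(token_pattern)
    vocab = set()
    for doc in docs:
        vocab.update(tokenizer.findall(doc.lower()))
    return sorted(vocab)


# Made-up documents that contain only single-character tokens.
short_docs = ["a b c", "x y z"]

# With the default pattern nothing survives tokenization.
empty_vocab = build_vocabulary(short_docs)

# A pattern that also accepts single characters keeps every token.
full_vocab = build_vocabulary(short_docs, r"(?u)\b\w+\b")

print(empty_vocab)
print(full_vocab)
```

If this is the situation, printing a few raw documents and the tokens they produce is usually a quicker check than adjusting `min_df`.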
-
Hi. I am new to using GPUs. I work on adversarial machine learning, and I previously used the TextAttack library in one of my projects with sklearn and Keras models. For that I created the cus…
-
**Describe the bug**
Fitting a `CountVectorizer` on a given text series with the provided code raises an error where the lengths of the calculated vocabulary and document frequenci…
-
**Is your feature request related to a problem? Please describe.**
Now that we have a distributed hashing vectorizer we should also have a distributed count vectorizer. This is especially useful fo…
-
# Word2vector
## 0. Preparation
1. raw text
```python
text = """
稀疏矩阵是由大部分为零的矩阵组成的矩阵,
这是和稠密矩阵有所区别的主要特点。
"""
# each row must end with a newline (\n), or else…
-
I received the error message "'CountVectorizer' object has no attribute 'stop_words_'" when using CountVectorizer with a vocabulary. According to the sklearn tutorial, the stop_words attribute will not be available w…
-
I was mainly reading your Medium post. I tried to code accordingly but hit an error when using the saved model: "CountVectorizer: Vocabulary wasn't fitted".
You skipped the part that we also h…
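
The general point behind this error can be sketched as follows: the vocabulary is state learned during fitting, so it has to be saved and restored alongside the model. With scikit-learn one would typically persist the fitted vectorizer object itself; the stand-in below avoids that dependency and stores only the vocabulary as JSON. `TinyCountVectorizer` is a hypothetical class written for this illustration, not scikit-learn's API, and the documents are made up.

```python
import json
from collections import Counter


class TinyCountVectorizer:
    """Hypothetical stand-in for a count vectorizer (not scikit-learn's class)."""

    def __init__(self, vocabulary=None):
        # A vocabulary passed in here counts as already fitted.
        if vocabulary is not None:
            self.vocabulary_ = dict(vocabulary)

    def fit(self, docs):
        terms = sorted({tok for doc in docs for tok in doc.lower().split()})
        self.vocabulary_ = {term: idx for idx, term in enumerate(terms)}
        return self

    def transform(self, docs):
        if not hasattr(self, "vocabulary_"):
            raise RuntimeError("Vocabulary wasn't fitted")
        rows = []
        for doc in docs:
            counts = Counter(doc.lower().split())
            rows.append([counts.get(term, 0) for term in self.vocabulary_])
        return rows


# Training time: fit the vectorizer and save its vocabulary next to the model.
fitted = TinyCountVectorizer().fit(["good movie", "bad movie"])
saved_vocabulary = json.dumps(fitted.vocabulary_)

# Inference time: rebuild the vectorizer from the saved vocabulary.
restored = TinyCountVectorizer(vocabulary=json.loads(saved_vocabulary))
restored_rows = restored.transform(["good good movie"])

# A fresh instance that was never fitted or restored fails instead.
try:
    TinyCountVectorizer().transform(["good movie"])
    fresh_error = None
except RuntimeError as exc:
    fresh_error = str(exc)

print(restored_rows)
print(fresh_error)
```

Saving only the classifier and constructing a new vectorizer at prediction time corresponds to the failing branch at the end.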