-
**Describe the bug**
I train the model on a huge number of intents (11,000+) in Arabic. Everything works great, except that the model doesn't capture false positives in high rate with also high confid…
-
We have many keywords that we would like to merge. For example:
1. Singular vs. plural: microplastic, microplastics
2. One word vs. two words: micro plastics, microplastics
3. Abbreviations: nucl…
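
As a rough illustration of items 1 and 2, variants could be collapsed to a shared key before merging. This is only a minimal sketch under naive assumptions (lowercasing, removing spaces and hyphens, dropping a single trailing "s"); the names `merge_key` and `merge_keywords` are hypothetical, and abbreviations would likely need an explicit lookup table that is not shown here.

```python
from collections import defaultdict


def merge_key(keyword: str) -> str:
    """Return a crude normalization key for a keyword (illustrative only)."""
    # Lowercase and drop spaces/hyphens so "micro plastics" matches "microplastics".
    key = keyword.lower().replace(" ", "").replace("-", "")
    # Naive plural handling: drop one trailing "s", but leave "ss" endings alone.
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def merge_keywords(keywords):
    """Group keywords that share the same normalization key."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[merge_key(kw)].append(kw)
    return dict(groups)


groups = merge_keywords(["microplastic", "microplastics", "micro plastics"])
```

A real pipeline would probably swap the trailing-"s" rule for a proper lemmatizer, since this heuristic mishandles irregular plurals and words that merely end in "s".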
-
Hi, many thanks to the IndoBenchmark Team for the deployment of the IndoBERT model.
I'm currently working on my thesis project, which is about sentence similarity detection, where the dataset consists of pairs of …
-
As a user, I would like a Porter stemmer in `nvstrings`, as it is an important NLP pre-processing step.
Based on an initial reading of the algorithm,
I feel implementing the `measure` function at a C++ l…
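
For reference, the Porter algorithm's measure counts the vowel-consonant sequences in a stem written as [C](VC)^m[V]. The following is a minimal Python sketch of that definition, not the `nvstrings` implementation; it assumes lowercase ASCII input, and the helper name `is_consonant` is chosen here for illustration.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's rule: a, e, i, o, u are vowels; y is a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # "y" counts as a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m, the number of VC sequences in a stem of the form [C](VC)^m[V]."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        # Each vowel-to-consonant transition closes one VC sequence.
        if cons and prev_is_vowel:
            m += 1
        prev_is_vowel = not cons
    return m
```

With this sketch, `measure("tree")` gives 0, `measure("trouble")` gives 1, and `measure("private")` gives 2, matching the examples in Porter's description of the algorithm.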
-
Hi,
Let's start a discussion here about the roadmap towards 0.10 and 1.0. We are looking for:
- New features that are useful to your research
- Improvements and patches to existing features
If…
-
-5/25/2020
Created the repo for the Covid19Twitter project
-
Hi,
I'm looking for a library that can do something like stemming or lemmatization for me.
Doesn't really have to be proper lemmatization. Ideally, I'm looking for some base reference form, like t…
-
I feel for some new features, we might have to perform lemmatization and POS-tagging.
In that case, it would be better to include those operations too as a part of our readability score calculatio…
-
**Describe the bug**
Is it possible to mimic reprocess=True outside of Simple Transformers itself?
Any code sample is appreciated!
I want to apply reprocess=True before feeding into model train metho…
-
Hi again,
When I create a dfm from my emails corpus, I get the error message:
> Creating a dfm from a corpus ...
> ... lowercasing
> ... tokenizing
> ... indexing documents: 1,882 documents
> …