-
Hi, is there something like a lemmatizer? I have a couple of Tagalog sentences with translations and I am trying to lemmatize them (then do some sorting by frequency and then use it myself for languag…
-
Hi, is there a way to incorporate stemming or lemmatization? The problem is that, for example, while the word 'help' in a text gets counted towards the category 'help', the words 'helping' and 'helped' do not.
An…
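One way to make inflected forms count toward the same category is to normalise each word before the lookup. Here is a minimal sketch, assuming a hypothetical `CATEGORIES` mapping and a deliberately naive suffix stripper (`naive_stem`) standing in for a real stemmer or lemmatizer; neither name comes from the project in question.

```python
import re
from collections import Counter

# Hypothetical category lexicon: stem -> category name.
CATEGORIES = {"help": "help"}

# Naive suffix list, for illustration only; a real stemmer covers far more cases.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_categories(text):
    """Count category hits after lowercasing and stemming each word."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        category = CATEGORIES.get(naive_stem(word))
        if category is not None:
            counts[category] += 1
    return counts
```

With this sketch, `count_categories("Help me, she helped, he is helping")` counts all three forms under `help`. Swapping `naive_stem` for a proper stemmer or lemmatizer would leave the rest of the logic unchanged.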
-
Hi,
First, thank you for all the work done. I know that FeaturizeText applies NLP preprocessing, such as stop-word removal, for a specific language:
![image](https://user-images.githubusercontent.com/16559628/86…
-
Find or create a function for separating a text string into tokens (words).
The function must take a text string and return a list of string tokens. The function also **should not** return tokens
containing digits and pu…
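A minimal sketch of such a function, using only the standard library. The description above is cut off, so the exact exclusion rule is an assumption here: tokens are split on whitespace, and any token containing a digit or an ASCII punctuation character is dropped. The name `tokenize` is hypothetical.

```python
import string

# Characters that disqualify a token (assumed rule: digits and ASCII punctuation).
_EXCLUDED = set(string.digits) | set(string.punctuation)


def tokenize(text):
    """Split on whitespace and drop tokens containing an excluded character."""
    return [tok for tok in text.split() if not any(ch in _EXCLUDED for ch in tok)]
```

Note that under this rule a word with attached punctuation, such as `now,`, is dropped entirely rather than trimmed. If trimming is preferred, the punctuation could be stripped before the check.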
-
Versions:
1. Clean datasets (without lemmatization and without stop-word removal);
2. Datasets with stop-word removal;
3. Datasets with stop-word removal and lemmatization (both using NLTK, as is currently done).
[Trea…
-
In `LatinBackOffLemmatizer()` and the lemmatizers in its chain I can't seem to find an option to return an empty value (such as in `OldEnglishDictionaryLemmatizer()`'s `best_guess=False` option), inst…
-
Just like with spaCy (see #374), we could add an analyzer that uses the [Stanza](https://stanfordnlp.github.io/stanza/) (formerly StanfordNLP) NLP toolkit for tokenization and especially lemmatization…
-
For somebody not familiar with Solr, it is very hard to start using this. Would it be possible to
add an example configuration for processing a corpus where each document is just a text file for the …
-
- [ ] Add POS Tagging to exclude nouns from lemmatization and for better sanitization.
- [ ] Replace regular Levenshtein distance with a Levenshtein Automaton + Jaro-Winkler Distance based approa…