Hi, is there something like a lemmatizer? I have a couple of Tagalog sentences with translations and I am trying to lemmatize them (then do some sorting by frequency and then use it myself for languag…
wadid updated 11 months ago
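A dictionary-based lemmatizer followed by frequency counting is one simple way to approach this. The sketch below assumes a hand-built lookup table (`LEMMAS`, hypothetical, covering only three forms of the root *kain*) and plain whitespace tokenization. A real Tagalog lemmatizer would have to handle affixes, infixes and reduplication systematically rather than by listing forms.

```python
from collections import Counter

# Hypothetical, hand-made lookup table: inflected Tagalog form -> root.
# Real coverage would need rules for affixes, infixes and reduplication.
LEMMAS = {
    "kumain": "kain",  # -um- infix
    "kakain": "kain",  # reduplication of the first syllable
    "kinain": "kain",  # -in- infix
}


def lemmatize(tokens):
    """Map each lowercased token to its root, falling back to the token itself."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def lemma_frequencies(sentences):
    """Count lemmas over all sentences; return (lemma, count) pairs, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        counts.update(lemmatize(sentence.split()))
    return counts.most_common()


print(lemma_frequencies(["Kumain ako", "Kakain ka ba", "Kinain ko ito"]))
```

Unknown words simply count as themselves, so the table can be grown gradually as new forms show up in the sentences.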
Hi, is there a way to incorporate stemming or lemmatization? The problem is that, for example, while the word 'help' in a text gets counted towards the category help, the words 'helping' and 'helped' do not.
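One quick workaround is to reduce each word to a stem before matching it against the categories. The sketch below uses a deliberately naive suffix stripper; the category dictionary `CATEGORIES`, the suffix list and the function names are all hypothetical. A proper stemmer (for example NLTK's `PorterStemmer`) or a dictionary-based lemmatizer would handle far more words correctly.

```python
from collections import Counter

# Hypothetical category dictionary: category name -> base words counted toward it.
CATEGORIES = {"help": {"help"}, "work": {"work"}}

# Deliberately tiny suffix list; real stemmers apply many ordered rules.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_categories(text):
    """Count how many (stemmed) tokens in text fall into each category."""
    counts = Counter()
    for token in text.split():
        stem = naive_stem(token.strip(".,!?"))
        for category, words in CATEGORIES.items():
            if stem in words:
                counts[category] += 1
    return counts
```

With this, "help", "helped", "helping" and "helps" all count toward the category help.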
First, thank you for all the work done. I know that FeaturizeText applies NLP preprocessing such as stop-word removal for a specific language:
Find or create a function for separating a text string into tokens (words).
The function must take a text string and return a list of string tokens. It also **should not** return tokens containing digits and pu…
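A minimal sketch of such a function in Python, assuming that a token is a run of word characters (regex `\w+`). That assumption also keeps punctuation characters out of tokens; the rest of the requirement is cut off above, so only the digit rule is handled explicitly. The name `tokenize` is hypothetical.

```python
import re


def tokenize(text):
    """Split text into word tokens, dropping any token that contains a digit."""
    # Assumption: tokens are runs of word characters (regex \w+), so
    # punctuation and whitespace act as separators and never appear in tokens.
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if not any(ch.isdigit() for ch in t)]
```

Note that `\w` also matches the underscore, so `foo_bar` stays a single token; adjust the pattern if that is not wanted.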
1. Clean datasets (no lemmatization and no stop-word (SW) removal);
2. Datasets with SW removal;
3. Datasets with SW removal and lemmatization (both using NLTK, as is currently done).
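The three variants can be produced from one tokenized corpus by applying the steps cumulatively. In this sketch the stop-word set and the `lemmatize` callable are parameters; in the setup described they would come from NLTK (e.g. `nltk.corpus.stopwords.words(...)` for the stop words). The function name `build_variants` and the dictionary keys are hypothetical.

```python
from typing import Callable, Iterable


def build_variants(
    docs: Iterable[list[str]],
    stop_words: set[str],
    lemmatize: Callable[[str], str],
) -> dict[str, list[list[str]]]:
    """Build the three dataset variants from tokenized documents."""
    # 1. Clean: tokens untouched, no lemmatization and no stop-word removal.
    clean = [list(doc) for doc in docs]
    # 2. Stop-word removal only (case-insensitive comparison).
    no_sw = [[t for t in doc if t.lower() not in stop_words] for doc in clean]
    # 3. Stop-word removal followed by lemmatization of the remaining tokens.
    no_sw_lemmatized = [[lemmatize(t) for t in doc] for doc in no_sw]
    return {
        "clean": clean,
        "no_sw": no_sw,
        "no_sw_lemmatized": no_sw_lemmatized,
    }
```

Building all three from the same tokenization keeps the comparison fair: the only difference between the variants is the preprocessing step under test.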
In `LatinBackOffLemmatizer()` and the lemmatizers in its chain, I can't seem to find an option to return an empty value (such as `OldEnglishDictionaryLemmatizer()`'s `best_guess=False` option), inst…
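For illustration, here is how a backoff chain could make the final guess optional. This is not CLTK's API: `backoff_lemmatize`, its `best_guess` flag and the two toy Latin sub-lemmatizers are hypothetical and only mimic the idea of asking each lemmatizer in turn.

```python
from typing import Callable, Optional, Sequence

# A sub-lemmatizer returns a lemma, or None when it cannot decide.
SubLemmatizer = Callable[[str], Optional[str]]


def backoff_lemmatize(
    token: str,
    chain: Sequence[SubLemmatizer],
    best_guess: bool = True,
) -> Optional[str]:
    """Ask each sub-lemmatizer in order and return the first answer.

    If none answers, return the token unchanged when best_guess is True,
    otherwise return None (the "empty value").
    """
    for lemmatizer in chain:
        lemma = lemmatizer(token)
        if lemma is not None:
            return lemma
    return token if best_guess else None


# Hypothetical toy sub-lemmatizers.
def dictionary_lookup(token: str) -> Optional[str]:
    return {"amavit": "amo"}.get(token)


def as_to_o_rule(token: str) -> Optional[str]:
    # Toy rule: present 2nd person singular "-as" -> 1st person "-o".
    return token[:-2] + "o" if token.endswith("as") else None


chain = [dictionary_lookup, as_to_o_rule]
```

With `best_guess=False`, an unknown form such as `rosam` yields `None` instead of being echoed back, which makes unresolved tokens easy to filter or review.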
Just like with spaCy (see #374), we could add an analyzer that uses the [Stanza](https://stanfordnlp.github.io/stanza/) (formerly StanfordNLP) NLP toolkit for tokenization and especially lemmatization…
For someone not familiar with Solr, it is very hard to get started with this. Would it be possible to add an example configuration for processing a corpus where each document is just a text file for the …
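Until an official example exists, one way to load such a corpus is to turn each text file into one document and send the batch to Solr's JSON update handler. The sketch below only builds the documents; the field names `id` and `text` are assumptions that must exist in your schema (with `id` as the unique key), and `<core>` in the comment is a placeholder.

```python
from pathlib import Path


def text_files_to_solr_docs(corpus_dir: str) -> list[dict]:
    """Turn every *.txt file in corpus_dir into one Solr document."""
    docs = []
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # Assumed schema: "id" (unique key, here the file name without
        # extension) and "text" (the full file contents).
        docs.append({"id": path.stem, "text": path.read_text(encoding="utf-8")})
    return docs


# The list, serialized with json.dumps, can be POSTed as a JSON array to
# http://localhost:8983/solr/<core>/update?commit=true
# with the header Content-Type: application/json.
```

Using the file name as `id` means re-indexing the same corpus overwrites documents instead of duplicating them.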
- [ ] Add POS tagging to exclude nouns from lemmatization and to improve sanitization.
- [ ] Replace the regular Levenshtein distance with a Levenshtein automaton + Jaro-Winkler distance based approa…
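For the Jaro-Winkler part of the second item, a plain-Python sketch of the standard definition: the Jaro similarity counts characters that match within a sliding window and penalizes transpositions, and Winkler's adjustment boosts pairs sharing a prefix (scaling factor `p = 0.1`, prefix capped at 4 characters, the usual defaults). It does not cover the Levenshtein automaton, and the function names are illustrative.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters match only if they are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity: raise the Jaro score for a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

These functions return similarities; if a distance is needed, `1 - jaro_winkler(a, b)` is the usual choice. For example, `jaro_winkler("MARTHA", "MARHTA")` is about 0.961.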