-
## why?
Proximity and order matter when searching for idioms.
The one which is ranked higher should be ranked lower than the one below.
## how?
Are there any functions which ince…
-
This project is very interesting. I am wondering what I need to do to add an additional language to it; in my case, I want to use it for Finnish.
Maybe, if we have a list of tasks needed for a new l…
-
I've got `fzf` set up and it seems to work. The selection based on filenames without a preview is not very useful, though, when using non-title filenames. It would also be great if the preview window to th…
-
Looking for ambiguous lemmas, I came across `enkelt`, but I am wondering if the ambiguity is just a typo. There are 14 examples of `enkelt` in the training data, 8 of which have `enkelt` as the lemma…
-
Performing data preprocessing
- [x] Tokenization
- [x] Stemming
- [x] Lemmatisation
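
As a rough illustration of the first step in the checklist above, here is a minimal tokenization sketch. It assumes plain regular-expression word splitting; the function name `tokenize` is hypothetical and not taken from any project listed here.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumption: a token is any run of Unicode word characters,
    so punctuation is dropped and letters like "ö" are kept.
    """
    return re.findall(r"\w+", text.lower())

tokens = tokenize("Mötet börjar kl. 10!")
print(tokens)
```

Stemming and lemmatisation would then operate on each token in the resulting list.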
-
_Version: 1.10.0_
It is not possible to extend match unit with altered forms
https://github.com/translate/translate/blob/master/translate/search/match.py#L276#L279
and using binary search in unit …
-
I think I have found a bug in the Swedish stemmer. When searching for "mötet" (the meeting), I should get results for "möte" and "möten". I think the problem is when stemming words ending with "et". (wo…
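
To make the expected behaviour concrete, here is a toy suffix stripper under which all three forms share one stem. This is only a sketch: the suffix list and the `toy_stem` function are hypothetical and are not the rules of the stemmer the report refers to.

```python
# Hypothetical suffix list for illustration only; a real Swedish
# stemmer uses a much larger, carefully ordered rule set.
SUFFIXES = ("et", "en", "e")

def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for form in ("mötet", "möte", "möten"):
    print(form, "->", toy_stem(form))
```

If a search index stores stems produced this way, a query for any of the three forms would match the other two, which is the behaviour described above.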
-
When lemmatization is explained, we say:
"Unlike stemming, lemmatization can consider the context and part of speech of the word, which can make it more accurate and reliable."
But, in the code,…
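
The quoted claim can be illustrated with a small sketch in which the lemma depends on the part of speech supplied with the token. The lookup table, the tag names, and `toy_lemmatize` are assumptions made for this example, not the code under discussion.

```python
# Hypothetical lookup table: (surface form, part of speech) -> lemma.
LEMMA_TABLE = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}

def toy_lemmatize(token: str, pos: str) -> str:
    """Return the lemma for a token given its part of speech.

    Falls back to the lowercased token when the pair is unknown.
    """
    key = (token.lower(), pos)
    return LEMMA_TABLE.get(key, token.lower())

print(toy_lemmatize("saw", "VERB"))
print(toy_lemmatize("saw", "NOUN"))
```

A stemmer sees only the string, so it has to return the same result for both uses of "saw"; a lemmatizer that is never given the part of speech has the same limitation.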
-
It is not obvious how pronouns should be lemmatized (cf. #276 for Slavic). The [UD_English](/UniversalDependencies/UD_English) corpus does the following:
Nominative (`PRP`):
I -> I
you …
-