FreeLanguageTools / vocabsieve

Simple sentence mining tool for language learning
GNU General Public License v3.0
344 stars, 25 forks

Add simple lemmatization tweaks #121

Closed Mycheze closed 4 months ago

Mycheze commented 6 months ago

Is your feature request related to a problem? Please describe.

There are many "simple" lemmatizations that VocabSieve doesn't find. The most common example for me (in Czech) is the colloquial -ej ending. It's very common in certain dialects, but it's non-standard and doesn't appear in dictionaries. It's very annoying to have to change the -ej ending to -ý, look up the word with a double click, then go back and reverse the change for the actual Anki card.

Describe the solution you'd like

Some way to add additional "lemmatization" steps in the language config. I would love to say (somehow) "for words that end in -ej, if there's no definition found, try looking them up with -ý instead."

Additional context

[Two screenshots: 2023-12-30_12-28_1, 2023-12-30_12-28]

This is a common thing in Czech: link

1over137 commented 6 months ago

What if I implement some way to use custom regular expressions as a fallback lemmatizer? This will probably involve a series of textboxes for regular expression replacements. It's not the most user-friendly way to do things, but a better solution would take much longer.

For your example, it would be: (Replace) "ej$", "ý"
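To make the idea concrete, here is a minimal Python sketch of a regex-based fallback lookup. It is not VocabSieve's actual code: the names `FALLBACK_RULES` and `lookup_with_fallback`, and the `lookup` callable that returns `None` when no definition is found, are all assumptions made for illustration. The rules are only tried when the word as typed has no definition, which matches the "if there's no definition found" behaviour requested above.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical rule list: (pattern, replacement) pairs, tried in order.
FALLBACK_RULES: Sequence[Tuple[str, str]] = [
    (r"ej$", "ý"),  # colloquial Czech -ej ending -> standard -ý
]


def lookup_with_fallback(
    word: str,
    lookup: Callable[[str], Optional[str]],
    rules: Sequence[Tuple[str, str]] = FALLBACK_RULES,
) -> Optional[Tuple[str, str]]:
    """Return (form, definition) for the first form that has a definition.

    `lookup` is an assumed dictionary function returning None on a miss.
    Returns None if neither the word nor any rewritten form is found.
    """
    # Try the word exactly as given first.
    definition = lookup(word)
    if definition is not None:
        return word, definition

    # Only on a miss, try each regex replacement in turn.
    for pattern, replacement in rules:
        candidate, count = re.subn(pattern, replacement, word)
        if count == 0:
            continue  # rule did not apply to this word
        definition = lookup(candidate)
        if definition is not None:
            return candidate, definition
    return None


# Toy dictionary standing in for a real dictionary backend.
toy_dictionary = {"dobrý": "good"}
print(lookup_with_fallback("dobrej", toy_dictionary.get))
```

Returning the form that matched alongside the definition means the original word (here `dobrej`) can stay untouched on the Anki card while the lookup uses the rewritten form.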

1over137 commented 4 months ago

[image]

This is implemented as a simple regex replacement for now.

Mycheze commented 4 months ago

This is the greatest day of my life 🙏🙇