HugoFara / lwt

Learn languages by reading! A language learning app stemmed from Learning with Texts (LWT).
https://hugofara.github.io/lwt/
The Unlicense
168 stars 19 forks

Improving suggested words with alternate characters #136

Open HenryWales opened 1 year ago

HenryWales commented 1 year ago

This is an advanced feature request.

In many languages, related words (different tense, number, grammatical category, etc., but the same root) are not a perfect string match. Instead, they differ by a common character substitution, where a short string of characters is replaced by another short string.

For example, in English: drink and drunk (i replaced by u).

The idea would need a language-specific list of common substitutions and a matching algorithm that can check them. A match that relies on a substitution would still rank lower than an exact, contiguous match, of course.
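To make the idea concrete, here is a minimal sketch of what such a matcher could look like. It is not LWT code: the `SUBSTITUTIONS` table, the `variants` and `suggest` functions, and the `penalty` value are all hypothetical names and numbers chosen for illustration, and it only handles a single substitution per word.

```python
# Hypothetical substitution table, purely illustrative (not taken from LWT).
SUBSTITUTIONS = {
    "en": [("i", "u"), ("i", "a"), ("oo", "ee")],
}


def variants(word, pairs):
    """Return every string obtained by applying ONE substitution at ONE
    position, trying each pair in both directions."""
    out = set()
    for a, b in pairs:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                out.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    out.discard(word)
    return out


def suggest(term, known_words, lang, penalty=0.5):
    """Rank known words against `term`.

    An exact match scores 1.0; a match reached through one substitution
    scores 1.0 - penalty, so it always ranks below the exact match.
    The penalty value is an arbitrary assumption that would need tuning.
    """
    term = term.lower()
    alts = variants(term, SUBSTITUTIONS.get(lang, []))
    scored = []
    for word in known_words:
        lowered = word.lower()
        if lowered == term:
            scored.append((1.0, word))
        elif lowered in alts:
            scored.append((1.0 - penalty, word))
    # Highest score first, then alphabetical for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored
```

With this table, `suggest("drink", ["drunk", "drink", "drank", "dream"], "en")` puts "drink" first and then "drank" and "drunk" with the penalized score, while "dream" is not suggested at all.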

One difficulty is the lack of an objective measure for the relevance of suggestions. Some substitutions are highly relevant in some languages, but others are less obvious, so creating a substitution list for a language, tuning how much a substitution penalizes the ranking, and measuring the end effect all involve some degree of subjectivity.

Such a solution would also probably require some caching, otherwise the loading time for suggested words could be problematic.
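As a rough illustration of the caching point, the expensive part (generating the substituted forms of a word) could be memoized so that it is computed only once per word. This is only a sketch under assumptions: `EN_PAIRS` and `cached_variants` are hypothetical names, the cache size is arbitrary, and a real implementation in LWT might cache in the database instead of in memory.

```python
from functools import lru_cache

# Hypothetical substitution pairs, stored as a tuple of tuples so that they
# are hashable and can be part of the cache key.
EN_PAIRS = (("i", "u"), ("i", "a"))


@lru_cache(maxsize=10_000)  # arbitrary size, would need tuning
def cached_variants(word, pairs):
    """Return the frozenset of strings reachable from `word` by one
    substitution (either direction). Results are memoized per (word, pairs)."""
    out = set()
    for a, b in pairs:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                out.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    out.discard(word)
    return frozenset(out)
```

Calling `cached_variants("drink", EN_PAIRS)` a second time returns the stored result without recomputing it, and `cached_variants.cache_info()` reports the hits and misses, which could help measure whether the cache is actually useful.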

I would be ready to help, but I can understand if this is not a priority.