Dushistov / sdcv

https://dushistov.github.io/sdcv/
GNU General Public License v2.0

Option to ignore accents for dictionary lookup #94

Open 1over137 opened 1 year ago

1over137 commented 1 year ago

Original issue on KOReader repository: https://github.com/koreader/koreader/issues/10202. KOReader developers asked that I propose the feature to this project instead.

In Russian and many other languages, word stress is unpredictable but is not normally marked, and dictionary headwords are typically written without accents. However, some instructional material has the accents marked, and there are tools that mark them in ebooks for language learners who may not be familiar with the correct pronunciation. This poses a problem, as the fuzzy matching does not appear to be able to match an accented word against its unmarked headword. It would be great either to make the current fuzzy matching handle this, or to have another explicit option to ignore accent marks.

For this, please ensure that both NF(K)C and NF(K)D normalizations are considered. In NF(K)D form the accent is a separate combining character, while in NF(K)C form the accent is merged with the letter into a single precomposed character where one exists.
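
To illustrate the behaviour I have in mind, here is a minimal Python sketch. It is not sdcv code, and the names `STRESS_MARKS` and `strip_stress_marks` are hypothetical. It assumes that only the combining acute (U+0301) and grave (U+0300) accents should be dropped. Other combining marks, such as the breve in й or the diaeresis in ё, are kept, because they distinguish real letters. The word is decomposed first so that precomposed and decomposed input are treated the same way.

```python
import unicodedata

# Assumption: only these combining marks count as "stress marks".
# U+0301 COMBINING ACUTE ACCENT, U+0300 COMBINING GRAVE ACCENT.
STRESS_MARKS = {"\u0301", "\u0300"}


def strip_stress_marks(word: str) -> str:
    """Return `word` without stress marks, in NFC form.

    Hypothetical helper for illustration only.
    """
    # Decompose so that a precomposed letter such as U+00E9 (e with acute)
    # becomes a base letter followed by a separate combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop only the marks listed in STRESS_MARKS; keep every other character.
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    # Recompose so that letters such as й and ё return to their usual
    # precomposed form before the dictionary lookup.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    # Decomposed input: "замок" with a combining acute after the second "о".
    print(strip_stress_marks("\u0437\u0430\u043c\u043e\u0301\u043a"))
    # Precomposed input: "café" written with U+00E9.
    print(strip_stress_marks("caf\u00e9"))
```

A lookup could then try the word as typed first and fall back to the stripped form only if nothing matches, so that dictionaries whose headwords do carry accents keep working.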