Composed characters and autocomplete

trondtynnol commented 4 months ago

Composed Unicode characters that consist of several code points are treated as separate characters by the autocomplete. As a result, typing мо into the search bar in sanj first shows hits starting with мо̄, because the combining macron sorts before any other character. It also throws off the alignment of the macron, since the о is shown in bold type but the macron is not.

This may be especially confusing to users because two of the long vowels, ӣ and ӯ, are each represented as a single precomposed character, and thus behave differently from the rest of the long vowels.
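To illustrate the difference, here is a small sketch using only Python's standard `unicodedata` module. The variable names are my own and are not taken from the project's code. It shows that ӣ has a precomposed code point while о with macron does not, so normalizing to NFC alone would not make the two cases behave the same.

```python
import unicodedata

# ӣ exists as one precomposed code point (U+04E3).
precomposed_i = "\u04e3"
# о with macron has no precomposed form: it is always
# base letter U+043E followed by combining macron U+0304.
o_with_macron = "\u043e\u0304"

for label, text in (("i with macron", precomposed_i),
                    ("o with macron", o_with_macron)):
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    # Print how many code points each normalization form needs.
    print(label, "NFC:", len(nfc), "NFD:", len(nfd))
```

In NFD both vowels become a base letter plus a combining macron, which is why decomposing first gives a uniform starting point.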

The desired behavior would be to treat these composed characters as single characters, so that typing мо only shows matches with short о.
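A rough sketch of what that could look like, assuming prefix matching is done on whole characters (base letter plus any combining marks) rather than on code points. The helper names `split_clusters` and `prefix_matches` are hypothetical, the sample strings are arbitrary, and this is not necessarily how the autocomplete is (or should be) implemented. It is also a simplification of full grapheme segmentation as defined in Unicode UAX #29.

```python
import unicodedata


def split_clusters(text):
    """Split text into base characters with their combining marks attached."""
    clusters = []
    # Decompose first so precomposed letters (e.g. U+04E3) and
    # letter + combining mark sequences are handled the same way.
    for ch in unicodedata.normalize("NFD", text):
        if clusters and unicodedata.combining(ch):
            # Non-zero combining class: attach to the previous base letter.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def prefix_matches(query, candidate):
    """True if candidate starts with query, compared cluster by cluster."""
    q = split_clusters(query)
    c = split_clusters(candidate)
    return c[:len(q)] == q


# Arbitrary sample strings: short о vs. о with macron.
print(prefix_matches("мо", "мох"))              # short о
print(prefix_matches("мо", "мо\u0304х"))        # long о
print(prefix_matches("мо\u0304", "мо\u0304х"))  # query includes the macron
```

With this comparison, a query ending in short о no longer matches entries where that о carries a macron, and the precomposed ӣ and ӯ are treated the same way as the other long vowels.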

Phaqui commented 4 months ago

Fixed as of 2591ace910060bf0b10044db8b05d360e5fe967e

Phaqui commented 4 months ago

When looking at the code at https://github.com/giellatekno/neahttadigisanit/blob/main/neahtta/neahtta/nds_lexicon/lexicon.py#L472, be aware that the comment strings may look strange on GitHub, and possibly in your editor too. Take this line:

# makes the "о" an "о̄", which are two different characters)

For me, it is rendered correctly (i.e. a "short o" and a "long o", or o without macron and o with macron) in the editor on GitHub, but not in the regular file view.
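If the rendering leaves you unsure which characters a string really contains, one option is to list the code points explicitly. This is just a debugging sketch using the standard `unicodedata` module. `describe` is a name I made up for illustration and is not part of lexicon.py.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Written with escapes so the source is unambiguous regardless of rendering.
short_o = "\u043e"
long_o = "\u043e\u0304"

print(describe(short_o))
print(describe(long_o))
```

The second list has two entries, the Cyrillic small letter o followed by the combining macron, no matter how the pair is displayed.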

giellatekno / neahttadigisanit

Composed characters and autocomplete #26