adem4ik closed this issue 1 year ago
Alright, so, we have mappings from characters to their case-insensitive counterparts... however, I'm lame and only speak English and am only able to recognize Latin-esque characters. If you can provide a similar mapping for Cyrillic characters I can add it to the next release.
See here: https://github.com/clangen/musikcube/blob/master/src/musikcore/db/SqliteExtensions.cpp#L70
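To illustrate why a Latin-only mapping misses Cyrillic input, here is a minimal sketch. It is not the code from SqliteExtensions.cpp; `asciiLower` is a hypothetical helper that folds only the ASCII letters A-Z, byte by byte, and the strings are assumed to be UTF-8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not musikcube's implementation): lower-cases
// ASCII letters only and leaves every other byte untouched.
std::string asciiLower(const std::string& s) {
    std::string out = s;
    for (char& ch : out) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<char>(ch - 'A' + 'a');
        }
    }
    return out;
}

// Returns true when two strings are equal after ASCII-only folding.
bool equalAsciiFolded(const std::string& a, const std::string& b) {
    return asciiLower(a) == asciiLower(b);
}
```

With this kind of folding, "Motogonki" and "motogonki" compare equal, but the UTF-8 bytes for М (`"\xD0\x9C"`) and м (`"\xD0\xBC"`) do not, because neither byte falls in the A-Z range.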
Well, here is the mapping. It covers Russian letters only. It should also work for Ukrainian/Belarusian in most cases, and it may be useful for other Cyrillic languages.
But I guess it is not possible to make a similar mapping for the whole set of Cyrillic letters the way it was done for Latin. That would require ~200 lines of pairs.
а - А б - Б в - В г - Г д - Д е - Е ё - Ё ж - Ж з - З и - И й - Й к - К л - Л м - М н - Н о - О п - П р - Р с - С т - Т у - У ф - Ф х - Х ц - Ц ч - Ч ш - Ш щ - Щ ъ - Ъ ы - Ы ь - Ь э - Э ю - Ю я - Я
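The pairs above could be sketched in code roughly as follows. This is an assumption-laden illustration, not the code from the actual commit: the names `foldRussianChar` and `foldRussianText` are hypothetical, and the input is assumed to be already decoded into Unicode code points. It relies on the fact that А..Я (U+0410..U+042F) and а..я (U+0430..U+044F) are contiguous ranges exactly 0x20 apart, with Ё/ё (U+0401/U+0451) as the one pair outside them.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps an upper-case Russian letter to its
// lower-case counterpart; any other code point is returned unchanged.
char32_t foldRussianChar(char32_t c) {
    if (c >= 0x0410 && c <= 0x042F) {
        return c + 0x20;  // А..Я -> а..я
    }
    if (c == 0x0401) {
        return 0x0451;    // Ё -> ё (outside the contiguous range)
    }
    return c;
}

// Hypothetical helper: folds every code point in a decoded string.
std::u32string foldRussianText(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        out.push_back(foldRussianChar(c));
    }
    return out;
}
```

Folding both the search term and the stored value this way before comparing would make Мотогонки and мотогонки match. Letters specific to other Cyrillic alphabets (for example і, ї, є, ґ, ў) sit outside these ranges and would need their own pairs.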
Apologies for the delay -- I've added a new mapping in this commit: 066b73ff32933a62ab1864813232fda257dee680
And yeah, a better way to do this would be to use ICU for all these types of mappings (https://icu.unicode.org/), but that library is larger than musikcube itself. Maybe in the future I can look into trying to load and use ICU at runtime if it already exists on the user's system. 🤔
OS: Manjaro Linux
I've compiled from source using the latest git version (d723f963607788e2ea495f878c2cac2b171ae0a2) using this instruction:
Renamed ~/.config/musikcube/. The fix doesn't seem to work:
P.S. Also, sudo make uninstall doesn't work after sudo make install.
You're right, the patch won't be quite as straightforward as I thought. Hmm... I'll try to take another swing at it.
I'm curious, does the clangen/utf8-case-insensitive-queries branch fix your issue?
@clangen Yes, it does! Well, in most cases, like:
But it may give strange results in some cases. For example, only the 1st case is right here, and the later searches should give the same results. The 3rd and 4th ones are interesting, because they matched on any occurrence of ы only, i.e. МЫ & ВышелPlay.
Same problem:
I guess it can't handle upper case symbols properly.
So while researching this problem I happened upon the following sqlite3 extension: https://github.com/nalgeon/sqlean/tree/main/src/unicode, which is a more comprehensive version of what I'm trying to maintain.
I went ahead and integrated it into a side branch and it appears to work well for accented Latin characters, and the few non-Latin characters I understand.
@adem4ik if you get a chance, I'm curious if you could try the clangen/sqlean-unicode branch and let me know if it addresses your issues. :)
@clangen It seems like clangen/sqlean-unicode fixes my issue completely. Thank you!
Here's the proof:
Awesome, merged back to master. Thanks for testing!
Finally got around to releasing this. https://github.com/clangen/musikcube/releases/tag/3.0.2
Version: 3.0.1
OS: Win 10 / Manjaro Linux
мотогонки)
Result: Search gives zero results
Expected result: Search should show Мотогонки
It works as expected with Latin input: