clangen / musikcube

a cross-platform, terminal-based music player, audio engine, metadata indexer, and server in c++
https://musikcube.com
BSD 3-Clause "New" or "Revised" License

Search is case sensitive with Cyrillic input #613

Closed: adem4ik closed this issue 1 year ago

adem4ik commented 1 year ago

Version: 3.0.1
OS: Win 10 / Manjaro Linux

  1. Open Library > Filter
  2. Type any artist name in Cyrillic, starting with a lowercase letter (e.g. мотогонки)

Result: Search gives zero results.
Expected result: Search should show Мотогонки.

[image] [image]

It works as expected with Latin input: [image]

clangen commented 1 year ago

Alright, so, we have mappings from characters to their case-insensitive counterparts... however, I'm lame and only speak English and am only able to recognize Latin-esque characters. If you can provide a similar mapping for Cyrillic characters, I can add it to the next release.

See here: https://github.com/clangen/musikcube/blob/master/src/musikcore/db/SqliteExtensions.cpp#L70

adem4ik commented 1 year ago

Well, here is the mapping. It covers Russian letters only. It should also work for Ukrainian/Belarusian in most cases, and it may be useful for other Cyrillic languages.

But I guess it isn't practical to make a similar mapping for the full set of Cyrillic letters, as was done for Latin. That would require ~200 lines of pairs.

а - А, б - Б, в - В, г - Г, д - Д, е - Е, ё - Ё, ж - Ж, з - З, и - И, й - Й,
к - К, л - Л, м - М, н - Н, о - О, п - П, р - Р, с - С, т - Т, у - У, ф - Ф,
х - Х, ц - Ц, ч - Ч, ш - Ш, щ - Щ, ъ - Ъ, ы - Ы, ь - Ь, э - Э, ю - Ю, я - Я
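
For illustration only, here is a minimal C++ sketch of how the pairs above could be folded to lower case without listing each one. It relies on the Unicode layout of the basic Cyrillic block: А..Я (U+0410..U+042F) sit 0x20 below а..я, and Ѐ..Џ (U+0400..U+040F, which includes Ё) sit 0x50 below ѐ..џ. This is not the code in SqliteExtensions.cpp. The names toLowerCyrillic and toLowerUtf8 are hypothetical, and letters outside those two ranges (e.g. Ґ/ґ, U+0490/U+0491) are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: lower-case one code point from the basic Cyrillic ranges.
char32_t toLowerCyrillic(char32_t cp) {
    if (cp >= 0x0410 && cp <= 0x042F) return cp + 0x20;  // А..Я -> а..я
    if (cp >= 0x0400 && cp <= 0x040F) return cp + 0x50;  // Ѐ..Џ -> ѐ..џ (covers Ё -> ё)
    return cp;
}

// Hypothetical helper: lower-case a UTF-8 string. Only ASCII and well-formed
// 2-byte sequences are folded; every other byte is copied through unchanged.
std::string toLowerUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if ((b0 & 0xE0) == 0xC0 && i + 1 < in.size()) {
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                // Decode the 2-byte sequence, fold it, and re-encode it.
                char32_t cp = (static_cast<char32_t>(b0 & 0x1F) << 6) | (b1 & 0x3F);
                cp = toLowerCyrillic(cp);
                out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
                out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
                i += 2;
                continue;
            }
        }
        if (b0 >= 'A' && b0 <= 'Z') {
            out.push_back(static_cast<char>(b0 + 32));  // ASCII A..Z -> a..z
        } else {
            out.push_back(in[i]);
        }
        ++i;
    }
    return out;
}
```

A filter could then compare toLowerUtf8(query) against toLowerUtf8(artist), so that мотогонки and Мотогонки compare equal. A full solution would need proper Unicode case folding, which is what a library such as ICU provides.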

clangen commented 1 year ago

Apologies for the delay -- I've added a new mapping in this commit: 066b73ff32933a62ab1864813232fda257dee680

And yeah, a better way to do this would be to use ICU for all these types of mappings (https://icu.unicode.org/), but that library is larger than musikcube itself. Maybe in the future I can look into trying to load and use ICU at runtime if it already exists on the user's system. 🤔

adem4ik commented 1 year ago

OS: Manjaro Linux

I've compiled from source using the latest git version (d723f963607788e2ea495f878c2cac2b171ae0a2), following these instructions: [image]

Renamed ~/.config/musikcube/. The fix doesn't seem to work: [image]

P.S. Also, sudo make uninstall doesn't work after sudo make install.

clangen commented 1 year ago

You're right, the patch won't be quite as straightforward as I thought. Hmm... I'll try to take another swing at it.

clangen commented 1 year ago

I'm curious, does the clangen/utf8-case-insensitive-queries branch fix your issue?

adem4ik commented 1 year ago

@clangen Yes, it does! Well, in most cases, like: [image]

But it may give strange results in some cases. For example, only the 1st search is right here, and the later searches should give the same results. The 3rd and 4th ones are interesting, because they matched on any occurrence of ы only, i.e. МЫ & ВышелPlay. [image]

Same problem: [image]

I guess it can't handle uppercase characters properly.

clangen commented 1 year ago

So while researching this problem I happened upon the following sqlite3 extension: https://github.com/nalgeon/sqlean/tree/main/src/unicode, which is a more comprehensive version of what I'm trying to maintain.

I went ahead and integrated it into a side branch and it appears to work well for accented Latin characters, and the few non-Latin characters I understand.

@adem4ik if you get a chance, I'm curious if you could try the clangen/sqlean-unicode branch and let me know if it addresses your issues. :)

adem4ik commented 1 year ago

@clangen It seems like clangen/sqlean-unicode fixes my issue completely. Thank you!

Here is the proof: [image] [image] [image]

clangen commented 1 year ago

Awesome, merged back to master. Thanks for testing!

clangen commented 1 year ago

Finally got around to releasing this. https://github.com/clangen/musikcube/releases/tag/3.0.2