TWiStErRob / net.twisterrob.inventory

Magic Home Inventory https://play.google.com/store/apps/details?id=net.twisterrob.inventory
https://www.twisterrob.net/project/inventory/

International search #170

Open TWiStErRob opened 4 years ago

TWiStErRob commented 4 years ago

Cyrillic characters are not searchable, probably because FTS tokenizes them incorrectly.

The "unicode61" tokenizer is available beginning with SQLite version 3.7.13 (2012-06-11). -- https://www.sqlite.org/fts3.html#tokenizer

And since SQLite 3.8.6, which ships with Android 5.0 (API 21), it is compiled into FTS4 by default:

The unicode61 tokenizer is now included in FTS4 by default. -- https://www.sqlite.org/releaselog/3_8_6.html
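A quick way to see the difference between the two tokenizers is Python's built-in `sqlite3` module. It is used here only as a stand-in for Android's SQLite and assumes a SQLite build with FTS4 enabled, which is typical. The table name `items` and the helper `search` are made up for this sketch. The same FTS4 table is created once with the default `simple` tokenizer and once with `unicode61`, then queried with a lower-case Cyrillic word:

```python
import sqlite3
from typing import List


def search(tokenizer: str, names: List[str], query: str) -> List[str]:
    """Index `names` in a throwaway FTS4 table using `tokenizer`; return rows matching `query`."""
    db = sqlite3.connect(":memory:")
    try:
        # The tokenizer is fixed per FTS table at creation time.
        db.execute(f"CREATE VIRTUAL TABLE items USING fts4(name, tokenize={tokenizer})")
        db.executemany("INSERT INTO items(name) VALUES (?)", [(n,) for n in names])
        # The MATCH query is tokenized (and case-folded) by the same tokenizer.
        rows = db.execute(
            "SELECT name FROM items WHERE items MATCH ? ORDER BY rowid", (query,)
        )
        return [row[0] for row in rows]
    finally:
        db.close()


if __name__ == "__main__":
    names = ["Тест", "тест", "Test"]
    print(search("simple", names, "тест"))     # ['тест']: simple only folds ASCII
    print(search("unicode61", names, "тест"))  # ['Тест', 'тест']
    print(search("simple", names, "TEST"))     # ['Test']: ASCII is folded
```

The `simple` tokenizer keeps `Тест` as-is in the index, so only the already lower-case row matches; `unicode61` folds both the stored text and the query.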

According to https://stackoverflow.com/a/4377116/253468:

- SQLite 3.8.6: API 21, Android 5.0 Lollipop
- SQLite 3.7.11: API 19, Android 4.4 KitKat
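Since unicode61 appeared in 3.7.13 but is only compiled into FTS4 by default from 3.8.6, an app that still supports older Android versions might want to decide at runtime which tokenizer to use. A hedged sketch with hypothetical helper names: one check compares a version tuple against the versions above, the other simply tries to create an FTS4 table with `unicode61`, which also covers older builds where it was enabled at compile time:

```python
import sqlite3
from typing import Tuple

# First SQLite release whose FTS4 includes unicode61 by default (Android 5.0 / API 21).
UNICODE61_BY_DEFAULT = (3, 8, 6)


def version_has_unicode61(version: Tuple[int, int, int]) -> bool:
    """Version-based guess only; older builds may still have it via a compile-time option."""
    return tuple(version) >= UNICODE61_BY_DEFAULT


def pick_tokenizer() -> str:
    """Probe the SQLite library actually in use instead of trusting the version number."""
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute("CREATE VIRTUAL TABLE probe USING fts4(x, tokenize=unicode61)")
        return "unicode61"
    except sqlite3.OperationalError:
        # e.g. "unknown tokenizer: unicode61": fall back to the ASCII-only default.
        return "simple"
    finally:
        probe.close()


if __name__ == "__main__":
    print(version_has_unicode61((3, 7, 11)))  # False (Android 4.4)
    print(version_has_unicode61((3, 8, 6)))   # True (Android 5.0)
    print(sqlite3.sqlite_version, pick_tokenizer())
```

The probe is the more reliable of the two, because tokenizer availability depends on how SQLite was compiled, not only on its version.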

Not working:

TWiStErRob commented 4 years ago

Search can't seem to find words with Lithuanian letters, like š, č and others. I'm guessing it's the same with all non-Latin letters. -- https://mail.google.com/mail/u/0/#inbox/FMfcgxvzLDrxJNnThJwBqKTBjnmPVVKk

TWiStErRob commented 4 years ago

No Russian search, but not a problem for me. -- ~https://play.google.com/apps/publish/?account=7995455198986011414#ReviewDetailsPlace:p=net.twisterrob.inventory&reviewid=gp:AOqpTOH7m8Z6aykf4DWNSBXq3UMxTrUFFnjnlTAK5PpHuSkjZZXtdSxGcYCidZ8LwKajmUHp1uWTzD1qTI7Z~ -- https://play.google.com/console/u/0/developers/7995455198986011414/app/4974852622245161228/user-feedback/review-details?reviewId=faf7bd15-8f8a-4411-b3de-07d75337177b&corpus=PUBLIC_REVIEWS

TWiStErRob commented 4 years ago

Very poor search. -- ~https://play.google.com/apps/publish?account=7995455198986011414#ReviewDetailsPlace:p=net.twisterrob.inventory&reviewid=gp:AOqpTOF3WefYqI2eT9CLmGG6nIb3HK27wUuHC2aO-ZUWYdLrRGZzutY5hPvpergQPzwt0Jy0an4JuMX-dw2V~ -- https://play.google.com/console/u/0/developers/7995455198986011414/app/4974852622245161228/user-feedback/review-details?reviewId=49bfe869-37e5-4aa3-aa55-d2253a57a4e1&corpus=PUBLIC_REVIEWS

Probably Cyrillic, based on the device language and the user's name.

razumeiko commented 1 year ago

Hi @TWiStErRob. Thanks for the app, it's really great! Just curious, are you planning to fix Cyrillic search soon, or is it not in the near-future plans? Not being able to search in Cyrillic is really critical. Thanks!

TWiStErRob commented 1 year ago

@razumeiko I have been preparing for this change for many months now; it's getting closer.

I'm sorry to say this, but I'm disabling distribution in countries that use Cyrillic script until this is fixed, because I'm getting too many bad reviews for a feature that's explicitly listed as not available in the Play Store description. Existing installs will stay. Sideloading, e.g. from apkpure, is still possible.

Most recent 1-star review:

Add app translation as well as search in other languages. -- https://play.google.com/console/u/0/developers/7995455198986011414/app/4974852622245161228/user-feedback/review-details?reviewId=08a37b19-e1e8-4c4e-b175-56ff0788ac58&corpus=PUBLIC_REVIEWS

TWiStErRob commented 1 year ago

Russian search doesn't work. https://mail.google.com/mail/u/0/#inbox/159b6b29576f11a1 / https://mail.google.com/mail/u/0/#inbox/FMfcgxmSdZQLzFHjVvnlrhthtKmTRDNw https://play.google.com/console/u/0/developers/7995455198986011414/app/4974852622245161228/user-feedback/review-details?reviewId=cc24f051-6a85-42b6-a575-c1a2479d199d&corpus=PUBLIC_REVIEWS

TWiStErRob commented 1 year ago

@razumeiko, can you please help me a bit? I just double-checked this, and Cyrillic search has worked just fine since the first version (this screenshot is from my original, first published version): image

However, it doesn't treat upper-case (П) and lower-case (п) characters as equal: ![image](https://github.com/TWiStErRob/net.twisterrob.inventory/assets/2906988/df34cee0-19b1-4a37-9978-c4cb712f1d16)

Can you please send me some item names, and search queries that you would expect to work differently?

razumeiko commented 1 year ago

Hey @TWiStErRob, here is what I found. This is the room I created with 5 items. I tried different names with spaces and the same text, but some of them start with upper-case letters and some don't, in both Cyrillic (Russian/Ukrainian) and Latin (English): image

Interestingly, the search works, but only if the word is lower-case. As you can see, I have items with three words containing "тест", one upper-case and the others lower-case; the search finds the lower-case ones but not the same word in upper-case: image

I ran the same experiment with English words, and they work as expected: there the search is case-insensitive. image

Also, it doesn't help to type the word exactly as written, with the upper-case letter: if the item name contains an upper-case letter, it won't show up in the results no matter how you type it in the search box.

TWiStErRob commented 1 year ago

Thanks @razumeiko! I see why people say it's "bad". Search only works for non-Latin text if the item names are entirely lower-case, but the "Item Name" field automatically starts with an upper-case letter, so people won't do this unless they intentionally want to.

The search engine I used (on old Android) only supported case-insensitive search for ASCII (Latin) characters. The new one is Unicode-aware, so it knows how to map upper- and lower-case letters in all scripts.

Oddly, the search query itself was lower-cased correctly; I probably do that manually somewhere in the code, and I'll have to remove that to make it consistent.
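Together, these two behaviours explain the "no matter how you type it" observation. A small pure-Python model (hypothetical helpers, not the app's code): the index side mimics the `simple` tokenizer, which per the SQLite docs only lower-cases ASCII `A`-`Z`, while the query side is assumed to be fully lower-cased before it goes through the same tokenizer:

```python
def simple_fold(token: str) -> str:
    """Mimic FTS's `simple` tokenizer: only ASCII A-Z are lower-cased."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in token)


def app_query(user_input: str) -> str:
    """Assumed app behaviour: the query is lower-cased with full Unicode rules first."""
    return simple_fold(user_input.lower())


def matches(item_name: str, user_input: str) -> bool:
    """Single-word item and query: the indexed term must equal the query term."""
    return simple_fold(item_name) == app_query(user_input)


if __name__ == "__main__":
    print(matches("тест", "Тест"))  # True: lower-case item, any query
    print(matches("Тест", "тест"))  # False
    print(matches("Тест", "Тест"))  # False: query was lower-cased, index wasn't
    print(matches("Test", "TEST"))  # True: ASCII is folded on both sides
```

An item name starting with a Cyrillic capital can never match, because the index keeps `Т` while every query arrives as `т`.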

I have a fix for this (just using a better search engine). Left: bad, right: fixed.
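If the search index is a separate FTS4 table (an assumption; the tables `item` and `search` below are hypothetical, and this is not necessarily how the app does it), its tokenizer can't be changed in place, so the table has to be dropped, recreated with `unicode61` and refilled. A sketch with Python's `sqlite3`, which also passes the query through unchanged so case folding happens in one place only:

```python
import sqlite3
from typing import List


def rebuild_search_index(db: sqlite3.Connection) -> None:
    """Recreate the FTS4 index with the Unicode-aware tokenizer and repopulate it."""
    with db:  # one transaction: commit on success, roll back on error
        db.execute("DROP TABLE IF EXISTS search")
        db.execute("CREATE VIRTUAL TABLE search USING fts4(name, tokenize=unicode61)")
        # Keep docids aligned with item ids so matches map back to items.
        db.execute("INSERT INTO search(docid, name) SELECT id, name FROM item")


def find(db: sqlite3.Connection, user_input: str) -> List[str]:
    """Pass the query through as typed; unicode61 folds case on both sides."""
    cur = db.execute(
        "SELECT item.name FROM search JOIN item ON item.id = search.docid "
        "WHERE search MATCH ? ORDER BY item.id",
        (user_input,),
    )
    try:
        return [row[0] for row in cur]
    finally:
        cur.close()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.executemany("INSERT INTO item(name) VALUES (?)", [("Тест",), ("Šaukštas",), ("Test",)])
    db.execute("CREATE VIRTUAL TABLE search USING fts4(name)")  # old index: simple tokenizer
    db.execute("INSERT INTO search(docid, name) SELECT id, name FROM item")
    print(find(db, "тест"))      # []
    rebuild_search_index(db)
    print(find(db, "тест"))      # ['Тест']
    print(find(db, "ŠAUKŠTAS"))  # ['Šaukštas']
```

The Lithuanian row is included because the earlier report mentioned š and č; with `unicode61` both the stored name and the query are folded the same way, so they match.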

image

razumeiko commented 1 year ago

Nice! Waiting for this update. Thanks for your hard work!