FreshRSS / FreshRSS

A free, self-hostable news aggregator…
https://freshrss.org
GNU Affero General Public License v3.0
9.36k stars 806 forks source link

SQLite case insensitive search for non-ASCII Unicode characters #4915

Open squromiv opened 1 year ago

squromiv commented 1 year ago

Describe the bug Feed entry title or text contains a word with capital letter starting.

1

Try to search with lowercase. Nothing found. 2

And exactly spelling 3

There is no such a problem with latin letters 4

Environment information (please complete the following information):

Alkarex commented 1 year ago

What database are you using?

squromiv commented 1 year ago

@Alkarex Sorry not to mention. It`s sqlite.

math-GH commented 1 year ago

@squromiv Could you please support us with a link to this RSS feed?

Alkarex commented 1 year ago

Documentation : https://www.sqlite.org/lang_expr.html

SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE. The ICU extension to SQLite includes an enhanced version of the LIKE operator that does case folding across all unicode characters.

squromiv commented 1 year ago

@squromiv Could you please support us with a link to this RSS feed?

https://www.standartnews.com/rss

squromiv commented 1 year ago

ICU extension to SQLite includes an enhanced version of the LIKE operator that does case folding across all unicode characters.

It looks like I`ve activated this ext. And search is still not working properly, screen_2022 12 06 001

Alkarex commented 1 year ago

https://www.sqlite.org/datatype3.html#collation

NOCASE - Similar to binary, except that it uses sqlite3_strnicmp() for the comparison. Hence the 26 upper case characters of ASCII are folded to their lower case equivalents before the comparison is performed. Note that only ASCII characters are case folded. SQLite does not attempt to do full UTF case folding due to the size of the tables required. Also note that any U+0000 characters in the string are considered string terminators for comparison purposes.

https://www.sqlite.org/faq.html#q18

The default configuration of SQLite only supports case-insensitive comparisons of ASCII characters. The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability.

Instead of providing full Unicode case support by default, SQLite provides the ability to link against external Unicode comparison and conversion routines. The application can overload the built-in NOCASE collating sequence (using sqlite3_create_collation()) and the built-in like(), upper(), and lower() functions (using sqlite3_create_function()). The SQLite source code includes an "ICU" extension that does these overloads. Or, developers can write their own overloads based on their own Unicode-aware comparison routines already contained within their project.

Inspiration https://dev.to/shallowdepth/5-ways-to-implement-case-insensitive-search-in-sqlite-with-full-unicode-support-53p2

I do not think we will address that immediately with SQLite. So to support this usecase, I suggest you @squromiv try to use PostgreSQL, MariaDB, or MySQL as a database, which we all support in FreshRSS.

squromiv commented 1 year ago

I suggest you @squromiv try to use PostgreSQL, MariaDB, or MySQL as a database

No chance. SQLite is the best for me in purpose to archive, move and backup. So will be waiting for other solution, if it is possible.

Alkarex commented 1 year ago

We support export/import from any database to SQLite, which is also our recommended approach for archiving, moving, and backup.

See ./cli/export-sqlite-for-user.php and ./cli/import-sqlite-for-user.php https://github.com/FreshRSS/FreshRSS/blob/edge/cli/README.md

Example (works also with e.g. Docker):

cd ./FreshRSS/
./cli/export-sqlite-for-user.php --user alice --filename /tmp/alice.sqlite