sbulen opened 1 month ago
I can no longer reproduce this with 3.0. I believe @Sesquipedalian fixed the issue in 3.0 with #8298.
In fact, I think #8298 also fixed my broader concern (see "Bigger issue?" below) that we weren't properly breaking on words. For example, 3.0 now correctly recognizes that 委員會 = "committee" and places a single entry into log_search_words for that portion of the original test string.
Very cool.
Issue still exists with 2.1.
Basic Information
The problem here is hard to see: long words containing multibyte characters never make it into log_search_words. They are silently dropped.
There are many subtleties here, but the core issue is that a substring is taken in a way that is not multibyte-safe.
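The following is a minimal Python sketch of what "not multibyte-safe" means here. It is not SMF's PHP code. The 4-byte limit and the function names truncate_bytes_naive, truncate_bytes_safe and is_valid_utf8 are hypothetical. Cutting UTF-8 at a fixed byte offset can split a character in half, while a character-aware cut keeps only whole characters.

```python
def truncate_bytes_naive(text: str, max_bytes: int) -> bytes:
    """Cut at a fixed byte offset, ignoring character boundaries (not mb-safe)."""
    return text.encode("utf-8")[:max_bytes]


def truncate_bytes_safe(text: str, max_bytes: int) -> str:
    """Keep as many whole characters as fit within max_bytes of UTF-8."""
    out = []
    used = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            break  # the next character would not fit whole, so stop here
        out.append(ch)
        used += size
    return "".join(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "委員會"  # each of these CJK characters is 3 bytes in UTF-8 (9 bytes total)
naive = truncate_bytes_naive(word, 4)  # hypothetical 4-byte limit
print(len(naive), is_valid_utf8(naive))  # 4 False: the cut splits 員
safe = truncate_bytes_safe(word, 4)
print(safe, len(safe.encode("utf-8")))  # 委 3
```

In PHP terms, the character-aware alternatives are mb_substr() (offsets in characters) and mb_strcut() (offsets in bytes, but never splitting a multibyte character).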
The sequence of events:
Note: if text2words() is called during a background task, the following error is logged: Cron error: 8192: strlen(): Passing null to parameter #1 ($string) of type string is deprecated (load.php, line 182)
This error is not visible in normal forum requests, because deprecation errors are still suppressed in index.php. They are not suppressed in cron.php.
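The following Python model shows one plausible way a bad cut could produce both a dropped word and a null reaching strlen(). It rests on a hypothetical assumption: a cleaning step returns None (PHP's null) for malformed UTF-8, and a later filter discards empty results. The names clean_word and index_words and the 20-byte limit are illustrative, not SMF's actual code. The real sequence of events in SMF may differ.

```python
from typing import List, Optional


def clean_word(raw: bytes) -> Optional[str]:
    """Hypothetical cleaning step: returns None for malformed UTF-8,
    loosely mirroring PHP functions that return null on invalid input."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def index_words(words: List[str], max_bytes: int) -> List[str]:
    """Truncate each word at a byte limit (not mb-safe) and keep the survivors."""
    kept = []
    for w in words:
        truncated = w.encode("utf-8")[:max_bytes]  # non-mb-safe cut
        cleaned = clean_word(truncated)
        # A None here is the analogue of passing null to strlen():
        # the word is silently lost instead of being indexed.
        if cleaned:
            kept.append(cleaned)
    return kept


# 9 ASCII bytes fit whole; 27 bytes of CJK get cut mid-character at byte 20.
print(index_words(["committee", "委員會" * 3], 20))  # ['committee']
```

Short words and pure-ASCII words survive. Only words that are both long and multibyte end up split mid-character, which matches the symptom described above.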
Similar (but different) report: https://github.com/SimpleMachines/SMF/issues/6405
Bigger issue? The term above isn't actually a word; it's a sentence...
This issue exists in both 2.1 and 3.0. Even after the cutover to utf8mb4 in 3.0, it may persist, depending on whether and how SMF's truncate function is rewritten.
Steps to reproduce
Expected result
A word in log_search_words
Actual result
No words in log_search_words
Version/Git revision
3.0 alpha 2 & 2.1.4
Database Engine
All
Database Version
8.4
PHP Version
8.3.8
Logs
No response
Additional Information
No response