Incorrectly parsing UTF-8 words

Hello,

We've had an issue with the parser. It worked correctly for most languages, but we noticed that it parses words in Czech incorrectly. For example, we had a term CI, and the parser matched the "ci" inside the word zákazníci as if it were that term.

For anyone who runs into the same issue: our solution was to add the u (UTF-8) modifier to the regex pattern in ParserService. Here is a patch for this:

```diff
diff --git a/Classes/Service/ParserService.php b/Classes/Service/ParserService.php
--- a/Classes/Service/ParserService.php (revision 29da54f7496840a303af01d0f4b9cb2c84fa6e75)
+++ b/Classes/Service/ParserService.php (date 1729607337701)
@@ -580,7 +580,7 @@
             '($|[\s<[:punct:]]|<br>' . self::$additionalRegexWrapCharacters . ')' .
             '(?![^<]>|[^<>]*</)' .
             '#' .
-            ($term->isCaseSensitive() ? '' : 'i');
+            ($term->isCaseSensitive() ? '' : 'i') . 'u';
 
         // replace callback
         $callback = function (array $match) use (
```
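As a rough illustration of why the modifier matters, here is a minimal sketch. It is not the ParserService code: the variable names are made up for the demo, and it only shows the general difference between byte-wise and UTF-8 matching in PHP's preg functions. The exact way the boundary check in the extension misfires may additionally depend on the server locale.

```php
<?php
// Hypothetical demo, assuming this file is saved as UTF-8.
$word = "zákazníci";

// Without the u modifier the subject is scanned byte by byte,
// so "á" and "í" (two bytes each in UTF-8) count as two units.
$unitsWithoutU = preg_match_all('/./', $word);   // 11

// With the u modifier pattern and subject are treated as UTF-8,
// so every letter is exactly one unit.
$unitsWithU = preg_match_all('/./u', $word);     // 9

// A single "." cannot cover a two-byte letter in byte mode.
$matchWithoutU = preg_match('/^z.kazn.ci$/', $word);  // 0
$matchWithU    = preg_match('/^z.kazn.ci$/u', $word); // 1

echo $unitsWithoutU, ' ', $unitsWithU, ' ', $matchWithoutU, ' ', $matchWithU, "\n";
```

In byte mode the second byte of "í" sits directly in front of "ci", so a boundary character class can end up being tested against half a letter. With the u modifier that cannot happen.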

Thanks for this extension :)

featdd / dpn_glossary

Incorrectly parsing UTF-8 words #223