Hello,
We ran into an issue with the parser: it works correctly for most languages, but it parses some Czech words incorrectly. For example, we had the term CI, and the parser was matching it inside the word zákazníci, apparently treating the trailing "ci" as the term.
For anyone who runs into the same issue, our solution was to add the u flag (/u, UTF-8 mode) to the regex pattern in ParserService. Here is a patch for this:
```
diff --git a/Classes/Service/ParserService.php b/Classes/Service/ParserService.php
--- a/Classes/Service/ParserService.php (revision 29da54f7496840a303af01d0f4b9cb2c84fa6e75)
+++ b/Classes/Service/ParserService.php (date 1729607337701)
@@ -580,7 +580,7 @@
 '($|[\s<[:punct:]]|<br>' . self::$additionalRegexWrapCharacters . ')' .
 '(?![^<]>|[^<>]*</)' .
 '#' .
+($term->isCaseSensitive() ? '' : 'i') . 'u';
```
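In case it helps anyone, here is a small standalone sketch of what the u flag changes. It is not code from the extension and does not use the ParserService pattern. It only shows that without u, PCRE walks the subject byte by byte, so a character like í (two bytes in UTF-8) is seen as two separate bytes. That is presumably how the second byte of í ended up being treated as a word boundary in our case. The variable names are made up for the demo, and the `\u{...}` escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical demo, not taken from ParserService.
// "\u{E1}" is "á" and "\u{ED}" is "í"; each is stored as two bytes in UTF-8.
$word = "z\u{E1}kazn\u{ED}ci"; // zákazníci: 9 characters, 11 bytes

// Byte mode (no u): "." consumes a single byte.
$bytes = preg_match_all('/./', $word);

// UTF-8 mode (u): "." consumes a whole character.
$chars = preg_match_all('/./u', $word);

// A lone "í" is one character but two bytes, so ^.$ only matches with u.
$byteMode = preg_match('/^.$/', "\u{ED}");
$utf8Mode = preg_match('/^.$/u', "\u{ED}");

printf("%d %d %d %d\n", $bytes, $chars, $byteMode, $utf8Mode); // prints "11 9 0 1"
```

Note that with u the pattern and the subject both have to be valid UTF-8, otherwise the preg_* functions fail instead of matching.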
Thanks for this extension :)