featdd / dpn_glossary

Glossary extension for TYPO3
http://typo3.org/extensions/repository/view/dpn_glossary
GNU General Public License v2.0

Incorrectly parsing UTF-8 words #223

Open michalpodrouzek opened 3 weeks ago

michalpodrouzek commented 3 weeks ago

Hello,

We ran into an issue with the parser: it works correctly for most languages, but we noticed that it parses Czech words incorrectly. For example, we had the term CI, and the parser matched it inside the word zákazníci.

For anyone who runs into the same issue: our solution was to add the /u flag to the regex pattern in ParserService. Here is a patch for this:

    diff --git a/Classes/Service/ParserService.php b/Classes/Service/ParserService.php
    --- a/Classes/Service/ParserService.php (revision 29da54f7496840a303af01d0f4b9cb2c84fa6e75)
    +++ b/Classes/Service/ParserService.php (date 1729607337701)
    @@ -580,7 +580,7 @@
     '($|[\s<[:punct:]]|<br>' . self::$additionalRegexWrapCharacters . ')' .
     '(?![^<]>|[^<>]*</)' .
     '#' .
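To illustrate what the /u flag changes, here is a minimal sketch. It uses a simplified stand-in pattern (`\b` word boundaries around a hypothetical term), not the extension's actual regex, and assumes the script is saved as UTF-8 and runs with PHP's default locale. Without /u, PCRE works on bytes, so the two bytes of "í" are not treated as a letter; with /u, the subject is read as UTF-8 and "í" counts as a word character.

```php
<?php
// Simplified stand-in pattern (hypothetical, NOT the extension's real regex):
// match the term only when it is not surrounded by word characters.
$term = 'ci';
$text = 'zákazníci';

$byteMode = '#\b' . preg_quote($term, '#') . '\b#i';
$utf8Mode = $byteMode . 'u';

// Byte mode: "í" is two bytes that are not word characters in the default
// locale, so a word boundary is seen right before the trailing "ci".
$matchesInByteMode = preg_match($byteMode, $text);

// UTF-8 mode: "í" is a single letter, so "ci" sits inside the word.
$matchesInUtf8Mode = preg_match($utf8Mode, $text);

var_dump($matchesInByteMode, $matchesInUtf8Mode);
```

With /u the standalone term still matches in a text such as "nový CI proces", so only the false hit inside the longer word goes away.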

Thanks for this extension :)

featdd commented 8 hours ago

Hi @michalpodrouzek,

I have to check whether adding this causes issues in other places; there were issues with the case of umlauts as well.

Greetings, Daniel
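As a rough illustration of why the flag can change behavior elsewhere, the sketch below shows how /u affects case-insensitive matching of umlauts. The term and text are hypothetical examples, not taken from the extension, and the script is assumed to be saved as UTF-8. Without /u, the `i` modifier only folds case for single bytes, so "ä" and "Ä" are treated as different; with /u they are treated as the same letter.

```php
<?php
// Hypothetical term and text, chosen only to show the effect of the flag.
$term = 'ähnlich';
$text = 'Ähnlich';

$pattern = '#' . preg_quote($term, '#') . '#i';

// Byte mode: the second byte of "ä" and "Ä" differs and is not case-folded.
$withoutU = preg_match($pattern, $text);

// UTF-8 mode: case folding is applied to the whole character.
$withU = preg_match($pattern . 'u', $text);

var_dump($withoutU, $withU);
```

One thing to keep in mind when testing the change: with /u, `preg_match` returns `false` (and `preg_last_error()` reports `PREG_BAD_UTF8_ERROR`) if the pattern or subject is not valid UTF-8, so content in other encodings would no longer be matched at all.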