doctrine / inflector

Doctrine Inflector is a small library that can perform string manipulations with regard to uppercase/lowercase and singular/plural forms of words.
https://www.doctrine-project.org/projects/inflector.html
MIT License
11.26k stars 137 forks

Accents in non-latin symbols. #217

Open k0ka opened 1 year ago

k0ka commented 1 year ago

Hello.

I'm trying to use Inflector::unaccent to compare strings the same way MySQL does with the utf8_unicode_ci collation (accent-insensitive; the _ai suffix can be omitted).

I found out that some (or all) Cyrillic letters are not unaccented. For example, Russian ё (https://www.compart.com/en/unicode/U+0451) and Ukrainian ї (https://www.compart.com/en/unicode/U+0457), both of which are widely used. MySQL compares them properly and removes the accents.

How did you compose Inflector::ACCENTED_CHARACTERS? Can we add non-Latin symbols there?

I guess this can be done automatically using the official Unicode data: https://unicode.org/Public/UNIDATA/UnicodeData.txt The 6th column shows what the character is composed of (or is empty if the character is not accented).
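
To illustrate the idea, here is a minimal sketch in Python rather than PHP, since Python's standard unicodedata module exposes the same decomposition field as UnicodeData.txt. The function names strip_accents and build_unaccent_map are hypothetical and are not part of Doctrine Inflector. The sketch assumes that dropping every combining mark after canonical decomposition is the desired behavior.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    # NFD splits a precomposed character into its base plus combining marks,
    # e.g. U+0451 (ё) becomes U+0435 (е) followed by U+0308 (diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_unaccent_map(chars: str) -> dict:
    """Build an accented -> unaccented table (hypothetical helper)."""
    mapping = {}
    for ch in chars:
        # decomposition() returns the decomposition field from the Unicode
        # data, or an empty string when the character has none.
        decomposition = unicodedata.decomposition(ch)
        # Skip compatibility decompositions, which start with a tag
        # such as "<compat>" and are not accent compositions.
        if decomposition and not decomposition.startswith("<"):
            base = strip_accents(ch)
            if base != ch:
                mapping[ch] = base
    return mapping


if __name__ == "__main__":
    # Print code points so the Cyrillic and Latin look-alikes stay distinct.
    for src, dst in build_unaccent_map("\u0451\u0457\u00e9a").items():
        print(f"U+{ord(src):04X} -> U+{ord(dst):04X}")
```

A table generated this way treats every canonical decomposition as an accent. For instance, й (U+0439) decomposes to и (U+0438) plus a breve, so it would be folded to и. Whether that matches what a given MySQL collation does should be checked before relying on the result.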