ausi / slug-generator

Slug Generator Library for PHP, based on Unicode’s CLDR data
MIT License
800 stars 54 forks

Correction in Hindi Phrases #16

Open bazingarj opened 5 years ago

bazingarj commented 5 years ago

मिठाई - mithai (coming up as mitha-i)
खुशबू - khushbu (coming up as khasaba)
लेना - lena (coming up as lana)
पैसे - paise (coming up as pasa)
अब - ab (coming up as aba)

ausi commented 5 years ago

Transliteration between scripts, such as Devanagari to Latin in this case, is performed by the ICU library, which uses data from the Unicode CLDR.

Internally, the Devanagari-Latin transform first converts to InterIndic and then from InterIndic to Latin.

Taking “अब” for example, you can see that “अ” gets transformed to \uE005 in Devanagari-InterIndic.xml:20 and “ब” to \uE02C in Devanagari-InterIndic.xml:59.
The code points \uE005 and \uE02C are assigned to $wa in InterIndic-Latin.xml:21 and $ba in InterIndic-Latin.xml:60.
Finally, $wa is mapped to “a” in InterIndic-Latin.xml:446 and $ba to “ba” in InterIndic-Latin.xml:298.

In short:

अ -> \uE005 -> $wa -> a
ब -> \uE02C -> $ba -> ba
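
To see exactly which Devanagari code points enter the chain above, the input can be split into characters and printed as code points. This is only a minimal sketch: `codePoints()` is a hypothetical helper name, not part of the slug generator or ICU, and it assumes the mbstring extension and PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not part of slug-generator or ICU):
// list the Unicode code points of a UTF-8 string.
// Assumes the mbstring extension and PHP >= 7.4 (mb_str_split).
function codePoints(string $text): array
{
    $points = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // Format each character as U+XXXX.
        $points[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

// "अब" consists of two letters: अ (U+0905) and ब (U+092C).
echo implode(' ', codePoints('अब')), "\n";
```

The input contains only the two letters and no virama (U+094D) after ब, which may be worth mentioning when reporting the case to the CLDR.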

As I have no knowledge of Devanagari, I can’t spot at which point the transformations go wrong.
It would be great if you could file a ticket directly at the CLDR: http://cldr.unicode.org/index/bug-reports

You can reproduce the issue with a single line of PHP code:

echo \Transliterator::create('Deva-Latn')->transliterate('अब');
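
To check all the phrases from the report in one go, the same call can be wrapped in a loop. This is a rough sketch, not the library's own code: `transliterateAll()` is a hypothetical helper name, it assumes the intl extension is installed, and the "expected" values are simply the romanizations given in this issue. The raw ICU output may still contain diacritics, so it will not necessarily match the final slugs character for character.

```php
<?php
// Hypothetical helper (not part of slug-generator):
// transliterate each word with the given ICU transform ID.
// Assumes the intl extension (ext-intl) is available.
function transliterateAll(array $words, string $id = 'Deva-Latn'): array
{
    $transliterator = \Transliterator::create($id);
    if ($transliterator === null) {
        throw new \RuntimeException("Could not create transliterator: $id");
    }

    $result = [];
    foreach ($words as $word) {
        $result[$word] = $transliterator->transliterate($word);
    }
    return $result;
}

// Expected romanizations as reported in this issue.
$expected = [
    'मिठाई' => 'mithai',
    'खुशबू' => 'khushbu',
    'लेना' => 'lena',
    'पैसे' => 'paise',
    'अब' => 'ab',
];

// Print the expected value next to what ICU actually returns.
foreach (transliterateAll(array_keys($expected)) as $word => $actual) {
    printf("%s: expected %s, ICU returns %s\n", $word, $expected[$word], $actual);
}
```

A list like this can be pasted into a CLDR ticket as is, since it depends only on ICU and not on the slug generator.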