backdrop / backdrop-issues

Issue tracker for Backdrop core.

Transliteration for Bulgarian language #1548

Open amilenkov opened 8 years ago

amilenkov commented 8 years ago

There is an error in transliteration: one letter of the Bulgarian alphabet is transliterated incorrectly. The error is present both in the old separate "transliteration" module and in the new Backdrop version 1.3.0, where transliteration is included in core.

When the system encounters the Bulgarian letter "Ъ" (Unicode U+042A) or "ъ" (U+044A), it does not transliterate it but drops the letter from the resulting URL.

"ъ" is a specifically Bulgarian letter - it is used only in Bulgaria. It is phonetically similar to "a" and is usually transliterated as "a". But in this case it is deleted, not transliterated.

I (and many Bulgarians) prefer to transliterate it as "y".
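
For reference, the file name and array keys used below follow from the code points. Assuming the data files are organised as in the Drupal 7 transliteration module, the high byte of a code point selects the file (x04.php for U+04xx) and the low byte is the key inside that file. A minimal sketch, where `example_transliteration_key()` is a hypothetical helper and not part of Backdrop:

```php
<?php
// Hypothetical helper (not a Backdrop function): split a character's code
// point into the data file name and the array key used inside that file.
// Requires the mbstring extension (mb_ord() is available since PHP 7.2).
function example_transliteration_key($char) {
  $codepoint = mb_ord($char, 'UTF-8');
  $bank = $codepoint >> 8;   // High byte: selects the file, e.g. x04.php.
  $key = $codepoint & 0xFF;  // Low byte: the key inside that file.
  return array(sprintf('x%02x.php', $bank), sprintf('0x%02X', $key));
}

// "Ъ" is U+042A and "ъ" is U+044A.
echo implode(' ', example_transliteration_key('Ъ')), "\n"; // x04.php 0x2A
echo implode(' ', example_transliteration_key('ъ')), "\n"; // x04.php 0x4A
```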

I added the following code to the file x04.php in the folder \includes\transliteration, using my preferred transliteration:

// Overrides for Bulgarian input.
$variant['bg'] = array(
  0x10 => 'A', 0x11 => 'B', 0x12 => 'V', 0x13 => 'G', 0x14 => 'D', 0x15 => 'E', 0x16 => 'Zh', 0x17 => 'Z',
  0x18 => 'I', 0x19 => 'J', 0x1A => 'K', 0x1B => 'L', 0x1C => 'M', 0x1D => 'N', 0x1E => 'O', 0x1F => 'P',
  0x20 => 'R', 0x21 => 'S', 0x22 => 'T', 0x23 => 'U', 0x24 => 'F', 0x25 => 'X', 0x26 => 'C', 0x27 => 'Ch',
  0x28 => 'Sh', 0x29 => 'Sht', 0x2A => 'Y', 0x2B => 'J', 0x2C => 'J', 0x2E => 'Ju', 0x2F => 'Ja',
  0x30 => 'a', 0x31 => 'b', 0x32 => 'v', 0x33 => 'g', 0x34 => 'd', 0x35 => 'e', 0x36 => 'zh', 0x37 => 'z',
  0x38 => 'i', 0x39 => 'j', 0x3A => 'k', 0x3B => 'l', 0x3C => 'm', 0x3D => 'n', 0x3E => 'o', 0x3F => 'p',
  0x40 => 'r', 0x41 => 's', 0x42 => 't', 0x43 => 'u', 0x44 => 'f', 0x45 => 'x', 0x46 => 'c', 0x47 => 'ch',
  0x48 => 'sh', 0x49 => 'sht', 0x4A => 'y', 0x4B => 'j', 0x4C => 'j', 0x4D => 'e', 0x4E => 'ju', 0x4F => 'ja',
);

But Backdrop does not see this code and transliterates in the old way. It not only ignores the rule for "ъ / Ъ" (0x4A => 'y' / 0x2A => 'Y') and deletes those letters, it also transliterates the other Bulgarian letters in the default way, not in the way I have defined.

The above code works with no problems in Drupal 7 and follows the «Language specific replacements» recommendations at

https://backdropcms.org/project/transliteration

But why doesn't Backdrop see the code I've added?
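
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in the Drupal 7 module, the language overrides are merged over the defaults only when the language code passed to the transliteration call matches the `$variant` key. If Backdrop works the same way and the language code is not 'bg' at that point, the override is never consulted. A minimal sketch of that merge, with hypothetical names and only two keys; the empty default strings are an assumption chosen to match the dropped letters reported above:

```php
<?php
// Hypothetical stand-ins for the contents of x04.php (two keys only).
// Assumption: the defaults map both letters to '', so they get dropped.
$base = array(0x2A => '', 0x4A => '');
$variant = array('bg' => array(0x2A => 'Y', 0x4A => 'y'));

// Hypothetical helper (not a Backdrop function): choose the map for a
// language code, falling back to the defaults when no variant exists.
function example_bank_map(array $base, array $variant, $langcode) {
  if (isset($variant[$langcode])) {
    // Array union: keys present in the variant win, the rest come from $base.
    return $variant[$langcode] + $base;
  }
  return $base;
}

$bg = example_bank_map($base, $variant, 'bg');
$en = example_bank_map($base, $variant, 'en');

echo $bg[0x4A], "\n";      // y
var_export($en[0x4A]);     // ''
echo "\n";
```

If this is what happens, it would be worth checking which language code the site passes when it builds the URL, for example whether the content language is actually set to Bulgarian.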

amilenkov commented 8 years ago

In my previous post I offered a Bulgarian transliteration based on common practice in Bulgaria, and it was integrated into Backdrop. Thanks!

But then I learned that the transliteration Google applies when it processes Bulgarian-language pages works differently.

I am attaching a new transliteration table, which allows Google to properly understand the meaning of Bulgarian words transliterated into Latin letters.

bg.php.zip