Closed iipavlov closed 7 years ago
Thanks for the PR, the second commit looks good! As for the first, it looks like you're right - I can't find any romanization/transliteration guidelines that would convert them to the Latin b. :)
However, should ъ/ь actually default to "/' (quotes and apostrophes being the transliteration of prime ′ & double prime ″) or empty strings? https://en.wikipedia.org/wiki/Romanization_of_Russian
@danielstjules, ъ and ь are probably the most confusing letters in Cyrillic: they don't exist in all alphabets, and where they do exist they have different sounds (or rather, a different influence on the surrounding sounds). https://en.wikipedia.org/wiki/Cyrillic_script Anyway, my reason for not representing them as quotes and apostrophes (as proposed in the Russian transliteration) was that I wanted to keep an alphanumeric representation, so the romanization could be used in URLs, a common scenario for SEO.
Makes sense! But toAscii really only performs transliteration/romanization to the ASCII range, which would include apostrophes. For use with URLs, I'd recommend slugify, which would strip those special chars.
Actually, ъ is rather a big deal in Bulgarian: it appears in the names of the country and the language themselves (България, български), and there are words like ъгъл (corner) which would look very strange if ъ were replaced with an empty string or an apostrophe. As far as I know this letter is very rare (or nonexistent) in the other languages using Cyrillic, so the risk of misrepresentation is lower there.
But the transliteration for Bulgarian would be correct when supplying ->toAscii('bg')? I'm only suggesting we change the default, if that makes sense.
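To make the distinction between the default map and a language-specific map concrete, here is a minimal sketch in Python (not the library's PHP, and not its actual tables). The names `DEFAULT_MAP`, `LANGUAGE_MAPS`, and `to_ascii` are hypothetical, the character tables are deliberately tiny, and the default uses the quote/apostrophe option raised above purely for illustration. The `bg` overrides follow the Bulgarian romanization of ъ as a and ь as y.

```python
# Hypothetical, deliberately incomplete tables for illustration only.
# The default maps the hard/soft signs to a quote and an apostrophe,
# which is one of the options discussed in this thread, not a decision.
DEFAULT_MAP = {
    "ъ": '"', "ь": "'",
    "а": "a", "б": "b", "г": "g", "и": "i", "к": "k",
    "л": "l", "о": "o", "р": "r", "с": "s",
}

# Language-specific overrides are applied on top of the default map.
LANGUAGE_MAPS = {
    "bg": {"ъ": "a", "ь": "y"},
}


def to_ascii(text, language=None):
    """Transliterate text, letting language overrides win over defaults."""
    mapping = dict(DEFAULT_MAP)
    mapping.update(LANGUAGE_MAPS.get(language, {}))
    # Characters missing from the map are passed through unchanged.
    return "".join(mapping.get(ch, ch) for ch in text)


print(to_ascii("ъгъл"))        # uses the default map
print(to_ascii("ъгъл", "bg"))  # uses the Bulgarian overrides
```

With this layout, changing the default affects only callers that pass no language, while a caller that supplies `bg` keeps the Bulgarian result either way.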
However, that does bring up the point that slugify is missing a $language param!
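A rough sketch of what a language-aware slugify could look like, again in Python rather than the library's PHP. Everything here is an assumption made for illustration: the `slugify` signature, the `language` argument, and the small character tables are hypothetical and do not describe the library's real behaviour.

```python
import re

# Hypothetical, deliberately incomplete tables for illustration only.
DEFAULT_MAP = {
    "ъ": '"', "ь": "'",
    "а": "a", "б": "b", "г": "g", "и": "i", "к": "k",
    "л": "l", "р": "r", "с": "s",
}
LANGUAGE_MAPS = {"bg": {"ъ": "a", "ь": "y"}}


def to_ascii(text, language=None):
    """Transliterate text, letting language overrides win over defaults."""
    mapping = dict(DEFAULT_MAP)
    mapping.update(LANGUAGE_MAPS.get(language, {}))
    return "".join(mapping.get(ch, ch) for ch in text)


def slugify(text, language=None):
    """Build a URL slug, forwarding the language to the transliteration."""
    ascii_text = to_ascii(text, language).lower()
    # Drop everything that is not a letter, digit, whitespace, or hyphen.
    # This is where quotes and apostrophes from the default map disappear.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", ascii_text)
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


print(slugify("български ъгъл"))        # signs stripped from the slug
print(slugify("български ъгъл", "bg"))  # signs kept as Bulgarian vowels
```

Without the language argument the slug silently loses the ъ, which is the Bulgarian concern raised earlier; forwarding the language to the transliteration step avoids that.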
As far as I know this letter is very rare (or nonexistent) in the other languages using Cyrillic, so the risk of misrepresentation is lower there.
This is why I proposed it in the default map. It would fix it backward. Languages like Serbian and Macedonian would never notice it, as it is not used in them; in Russian the words with it are very rare, and y as a replacement would also make sense there, as it is a no-sound letter there too. Or they could use the language param later.
Seems reasonable to me, thanks again! :)
In Bulgarian the Cyrillic ъ is a vowel pronounced as in cut. ь appears only in the combination ьо, pronounced as in yo-yo. Both are available only in some alphabet subsets, but I don't think they are pronounced as the Latin b anywhere. The bg-specific transliteration follows https://en.wikipedia.org/wiki/Romanization_of_Bulgarian