barseghyanartur / transliterate

Bi-directional transliterator for Python. Transliterates (unicode) strings according to the rules specified in the language packs.
https://pypi.python.org/pypi/transliterate

Missing characters in Latin -> Cyrillic transliteration #14

Open skoval00 opened 8 years ago

skoval00 commented 8 years ago

The characters C, Q, W, and X are missing from the transliteration tables for Latin -> Cyrillic transliteration, so they come back unchanged:

    $ python -c 'import transliterate; print(transliterate.translit("CQWX", "ru"))'
    CQWX
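To illustrate the symptom, here is a minimal, stdlib-only sketch built on `str.maketrans`. It is not the library's own code. In this sketch, any letter missing from the table passes through `str.translate` unchanged, which mirrors the `CQWX` output above. The mappings for c, q, w, and x (ц, к, в, кс) are assumptions chosen only for illustration, not an agreed convention. The other letters are a small hypothetical subset.

```python
# Hypothetical Latin -> Cyrillic table; NOT the transliterate library's mapping.
# Only a handful of letters are included to keep the example short.
BASE = {
    "a": "а", "b": "б", "d": "д", "e": "е", "g": "г", "i": "и",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р",
    "t": "т", "u": "у",
}

# Assumed mappings for the letters reported missing in this issue.
EXTRA = {"c": "ц", "q": "к", "w": "в", "x": "кс"}


def build_table(mapping):
    """Build a str.translate table covering lower- and upper-case letters."""
    full = {}
    for latin, cyrillic in mapping.items():
        full[latin] = cyrillic
        # Capitalise only the first Cyrillic letter (X -> Кс, not КС).
        full[latin.upper()] = cyrillic[0].upper() + cyrillic[1:]
    return str.maketrans(full)


OLD_TABLE = build_table(BASE)              # without c, q, w, x
NEW_TABLE = build_table({**BASE, **EXTRA})  # with the assumed extra letters

print("CQWX".translate(OLD_TABLE))       # CQWX (unmapped letters pass through)
print("CQWX".translate(NEW_TABLE))       # ЦКВКс
print("Equipment".translate(OLD_TABLE))  # Еqуипмент (q is left as-is)
print("Equipment".translate(NEW_TABLE))  # Екуипмент
```

A multi-letter target such as `кс` works because `str.maketrans` accepts strings of any length as values. Choosing between candidates (c -> ц or к, for example) is the open question raised below.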

barseghyanartur commented 6 years ago

@skoval00, @SomeUser55:

The main question is: what should they be mapped to?

https://en.wikipedia.org/wiki/Romanization_of_Russian#Transliteration_table

Any other/better ideas?

skoval00 commented 6 years ago

At my previous company, some colleagues built a complex tool for multi-language transliteration of toponyms. As far as I understood, they had sets of rules with different probabilities that depended on where a character sequence appears in a word (e.g. ck -> к, ough -> о). The transliteration process had several stages implemented in different programming languages, but the results were quite good. Unfortunately, the code was rather ugly and hard to separate from the main codebase.
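The core of the multi-character rule idea can be sketched with a greedy longest-match pass. At each position, longer rules are tried first, so `ck` wins over a single `k` and `ough` wins over `o`. Only the `ck -> к` and `ough -> о` rules come from the comment above. The single-letter rules are hypothetical. The probability weighting and position-in-word conditions that the comment describes are deliberately omitted, so this is a simplification, not a reconstruction of that tool.

```python
# Hypothetical rule set: only "ck" -> "к" and "ough" -> "о" come from the
# comment; the single-letter rules are illustrative assumptions.
RULES = {
    "ough": "о",
    "ck": "к",
    "b": "б", "d": "д", "e": "е", "g": "г", "h": "х",
    "k": "к", "l": "л", "o": "о", "r": "р", "t": "т", "u": "у",
}
MAX_LEN = max(len(key) for key in RULES)


def transliterate_word(word):
    """Greedy longest-match transliteration of a lower-case word."""
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so "ough" beats "o" and "ck" beats "k".
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: keep the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(transliterate_word("rock"))     # рок
print(transliterate_word("borough"))  # боро
```

Adding the position-dependent behaviour described above would mean making each rule conditional on its location in the word (for example, word-final only) and choosing among competing matches by weight, rather than always taking the longest one.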

hadaev8 commented 4 years ago

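Same here. This input:

    text = 'Когда Digital Equipment Corporation сократила количество рабочих мест на три тысячи, в ее официальном объявлении говорилось не об увольнениях, а о «вынужденных мерах».'

(in English: "When Digital Equipment Corporation cut three thousand jobs, its official announcement spoke not of layoffs but of 'forced measures'.") is transliterated to:

    Когда Дигитал Еqуипмент Цорпоратион сократила количество рабочих мест на три тысячи, в ее официальном объявлении говорилось не об увольнениях, а о «вынужденных мерах».

Note that the Latin q in "Equipment" is left as-is in the output (Еqуипмент).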
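A quick way to spot characters that a Latin -> Cyrillic pass left untouched, as with the q above, is to scan the output for remaining basic Latin letters. This is a minimal sketch using the standard `re` module. The helper name `leftover_latin` is hypothetical, and it only checks A-Z/a-z, not accented Latin letters.

```python
import re

# Matches basic Latin letters A-Z / a-z only (accented letters are ignored).
LATIN_LETTER = re.compile(r"[A-Za-z]")


def leftover_latin(text):
    """Return the sorted list of distinct basic Latin letters still in text."""
    return sorted(set(LATIN_LETTER.findall(text)))


output = "Когда Дигитал Еqуипмент Цорпоратион сократила количество рабочих мест"
print(leftover_latin(output))  # ['q']
print(leftover_latin("CQWX"))  # ['C', 'Q', 'W', 'X']
```

Running a check like this over sample texts is one way to find every Latin character that a language pack does not cover yet.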