Open skoval00 opened 8 years ago
@skoval00, @SomeUser55:
The main question is, what to do with them?
https://en.wikipedia.org/wiki/Romanization_of_Russian#Transliteration_table
`strict` argument strip out all characters that are not listed in the language pack.
`translit` function (which would then take precedence over chosen language pack).
Any other/better ideas?
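A rough sketch of how these two ideas could behave, assuming the second one means a user-supplied mapping passed alongside the language pack. This is plain Python and does not touch the library: `BASE_PACK`, `translit_with_overrides` and the `overrides` parameter are hypothetical names, and the table is a tiny stand-in, not the real `ru` pack.

```python
# Hypothetical stand-in for a language pack; NOT the real "ru" table.
BASE_PACK = {"a": "а", "d": "д", "e": "е", "o": "о", "r": "р", "t": "т"}


def translit_with_overrides(text, pack, overrides=None, strict=False):
    """Map text character by character.

    Entries in `overrides` take precedence over `pack`.
    With strict=True, characters found in neither table are dropped;
    otherwise they are passed through unchanged.
    """
    table = dict(pack)
    table.update(overrides or {})  # user mapping wins over the pack

    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        elif not strict:
            out.append(ch)  # keep unknown characters as they are
    return "".join(out)


# The user fills the gaps (here: q and u) themselves.
custom = {"q": "к", "u": "у"}
print(translit_with_overrides("quad", BASE_PACK, custom))
# Without a custom mapping, strict mode simply drops the unmapped "q".
print(translit_with_overrides("qat", BASE_PACK, strict=True))
```

With something like this the packs stay minimal, and what happens to C, Q, W, X becomes the caller's decision.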
At my previous company, some colleagues built a complex tool for multi-language transliteration of toponyms. As far as I understood, they had sets of rules with different probabilities that depended on where a character sequence occurred in a word (like ck -> к, ough -> о). The transliteration process had several stages implemented in different programming languages, but the result was quite good. Unfortunately, it was rather ugly and hard to separate from the main codebase.
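To illustrate the sequence-rule idea, here is a much simplified sketch. It is not that tool: there are no probabilities and no position dependence, only longest-match-first replacement of multi-character sequences. The two rules come from the examples above; the single-letter table and all names are made up for the demo.

```python
# Multi-character rules taken from the examples above.
RULES = {"ough": "о", "ck": "к"}

# Hypothetical single-letter fallback table, only large enough for the demo.
SINGLE = {"b": "б", "d": "д", "l": "л", "o": "о", "t": "т"}


def translit_rules(word, rules=RULES, single=SINGLE):
    """Transliterate `word`, trying the longest sequence rule first."""
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No sequence rule matched here: fall back to single letters,
            # leaving unknown characters unchanged.
            out.append(single.get(word[i], word[i]))
            i += 1
    return "".join(out)


print(translit_rules("block"))  # блок
print(translit_rules("dough"))  # до
```

A real implementation would need position-aware rules and some way to rank competing matches, which is where the probabilities came in.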
Same here: text = 'Когда Digital Equipment Corporation сократила количество рабочих мест на три тысячи, в ее официальном объявлении говорилось не об увольнениях, а о «вынужденных мерах».'
->
Когда Дигитал Еqуипмент Цорпоратион сократила количество рабочих мест на три тысячи, в ее официальном объявлении говорилось не об увольнениях, а о «вынужденных мерах».
The characters C, Q, W and X are missing from the transliteration tables for Latin -> Cyrillic transliteration.
$ python -c 'import transliterate; print(transliterate.translit("CQWX", "ru"))'
CQWX