alexeyev opened this issue 5 months ago
Hi Anton!
There are a couple of ways to do this. Both involve compiling apertium-kir and then running make in dev/ortho. Then:
$ echo "кыргыз" | hfst-lookup cyr-ara.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups
> кыргыз قىرعىز 6.000000
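If you want to use the hfst-lookup output from a script, here is a minimal Python sketch (not something that ships with apertium-kir) for parsing result lines like the one above. It assumes each result line holds the input, the output and the weight separated by tabs (falling back to any whitespace, as in the copy above), optionally preceded by the "> " prompt. LookupResult, parse_lookup_line and best_output are hypothetical names.

```python
from typing import List, NamedTuple


class LookupResult(NamedTuple):
    """One result line from hfst-lookup."""
    surface: str   # input word (Cyrillic in the example above)
    output: str    # transducer output (Perso-Arabic in the example above)
    weight: float  # path weight; lower is better in the tropical semiring


def parse_lookup_line(line: str) -> LookupResult:
    """Parse e.g. '> кыргыз<TAB>قىرعىز<TAB>6.000000' into its three fields.

    Assumption: fields are tab-separated; if no tab is present we fall back
    to splitting on whitespace. A leading '>' prompt is dropped if present.
    """
    line = line.strip()
    if line.startswith(">"):
        line = line[1:].strip()
    fields = line.split("\t") if "\t" in line else line.split()
    if len(fields) != 3:
        raise ValueError(f"unexpected hfst-lookup line: {line!r}")
    surface, output, weight = fields
    # float() also accepts strings such as 'inf'
    return LookupResult(surface, output, float(weight))


def best_output(lines: List[str]) -> str:
    """Return the lowest-weight output among the non-empty result lines."""
    results = [parse_lookup_line(ln) for ln in lines if ln.strip()]
    if not results:
        raise ValueError("no result lines")
    return min(results, key=lambda r: r.weight).output
```

Pass it only the result lines, not the warning about OpenFST lookups.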
Or
$ hfst-fst2fst -Oo kir@Cyrl-kir@Arab.hfst cyr-ara.hfst
$ echo "кыргыз тили" | hfst-proc kir@Cyrl-kir@Arab.hfst
^кыргыз/قىرعىز$ ^тили/تئلئ/تئلى/تىلئ/تىلى$
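The hfst-proc output is in the Apertium stream format, where each word appears as ^surface/candidate1/candidate2/...$. A minimal Python sketch for splitting it into candidates could look like this. For example, it can flag words such as тили that get several Arabic spellings. parse_stream and ambiguous_units are hypothetical names. The regular expression also ignores backslash-escaped reserved characters, which a complete stream parser would have to handle.

```python
import re
from typing import List, Tuple

# One lexical unit: ^surface/candidate1/candidate2/...$
# Simplification: backslash-escaped '^', '/' and '$' are not handled.
_LU = re.compile(r"\^([^/^$]*)((?:/[^/^$]*)*)\$")


def parse_stream(text: str) -> List[Tuple[str, List[str]]]:
    """Split hfst-proc output into (surface form, candidate list) pairs."""
    units = []
    for m in _LU.finditer(text):
        surface = m.group(1)
        # group(2) looks like '/cand1/cand2'; drop the empty leading piece
        candidates = [c for c in m.group(2).split("/") if c]
        units.append((surface, candidates))
    return units


def ambiguous_units(units: List[Tuple[str, List[str]]]) -> List[str]:
    """Surface forms that received more than one candidate."""
    return [surface for surface, candidates in units if len(candidates) > 1]
```

Applied to the output above, ambiguous_units reports тили, which has four candidate spellings.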
I can work on a slightly more user-friendly approach using an Apertium mode (as I believe apertium-kaz has). Also note that it's currently set up for accepting Perso-Arabic script, not for generating it accurately, so some additional fine-tuning of the mapping rules to Cyrillic would be needed if this is the use you plan for it. Let me know if that would be helpful (and feel free to contribute yourself).
Dear @jonorthwash, thank you for such a swift response!
I've read a paper on AgglutiFit and realized that an open-source tool for converting Perso-Arabic script into Cyrillic would be useful both for people interested in the Kyrgyz language in general and for NLP research. I had found only some online tools (web services) and tried implementing something myself before realizing that the approach from your paper would be a perfect fit.
However, my curiosity is not yet tied to any particular research project, so there is no rush (at least on my side) to make the transliterator more user-friendly.
I'll try out the scripts and instructions you kindly provided as soon as possible and, if that's OK, get back to you.
Dear colleagues,
Thank you for your work! Judging by the paper "Multi-script morphological transducers and transcribers for seven Turkic languages", this transducer can be used for transliteration between the Cyrillic and Arabic scripts.
If that is the case, may I ask you to share some instructions or at least some entry points?
Thank you in advance!
Best regards, Anton.