OSMLatvija / Osmalyzer

Parsing OSM data in Latvia against various data sources
https://osmlatvija.github.io/Osmalyzer/
GNU General Public License v3.0

Translit ru exceptions #47

Closed: markalex2209 closed this 3 weeks ago

markalex2209 commented 3 weeks ago

A bunch of minor changes to the ru transliteration checker:

Additionally, I lowered the "good enough" threshold to 0.5: I'm working my way through the transliterations, and I hope to address most of those that actually differ by at least one letter.

markalex2209 commented 3 weeks ago

Let's wait a bit. I'm also checking combinations of consonant + j + vowel, such as mju. I think they all need to be transliterated with a soft sign between the consonant and the vowel, but I need to check.

markalex2209 commented 3 weeks ago

Checked and confirmed. Happy with the results. Ready for review.
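
For readers following along, the consonant + j + vowel rule discussed above could be sketched roughly as follows. This is only an illustration in Python, not the checker's actual code: the function name `transliterate` and the tiny letter tables are hypothetical, cover just a handful of letters, and deliberately leave out jo.

```python
# Hypothetical sketch of the soft-sign rule; NOT the project's real tables or code.
# Only a small, illustrative subset of Latvian letters is mapped here.
CONSONANTS = {"b": "б", "d": "д", "m": "м", "p": "п", "v": "в", "š": "ш", "ž": "ж"}
VOWELS = {"a": "а", "i": "и", "u": "у"}
IOTATED = {"a": "я", "u": "ю"}  # j + vowel; "jo" is intentionally not handled


def transliterate(word: str) -> str:
    """Transliterate a lowercase-able Latvian word using the tiny tables above."""
    text = word.lower()
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "j" and nxt in IOTATED:
            # consonant + j + vowel: put a soft sign between consonant and vowel
            if i > 0 and text[i - 1] in CONSONANTS:
                out.append("ь")
            out.append(IOTATED[nxt])
            i += 2
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
        elif ch in VOWELS:
            out.append(VOWELS[ch])
            i += 1
        else:
            out.append(ch)  # letters outside the sketch pass through unchanged
            i += 1
    return "".join(out)


print(transliterate("mju"))  # the example combination from the comment above
```

Letters that the sketch does not know are passed through unchanged, so gaps in the tables stay visible instead of being silently dropped.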

HellMapGoesCoding commented 3 weeks ago

Just as a reflection: I originally wanted to add an existing package for transliterations, and I tried a couple. But I immediately realized that none of them handle the Latvian language, and letters with diacritics such as š and ž don't get transliterated at all. So I started making my own. But when I got to the point of having to think about consonants/vowels and their combinations, I just stopped caring... So that's why I also added the approximate comparison.

markalex2209 commented 3 weeks ago

I get you. And I think not using a library here is wise, since without one we are able to tweak the process as we like (for example, the combination jo might require attention in the future). And I understand that, since you are not a native speaker, this might be a bit too much. I'm glad to help where I can.

Regarding word distance: this is the right call, because manually comparing strings would be a lot of useless headache. And I personally don't care whether the most appropriate transliteration would be Саиета when it is currently Сайета (or vice versa), etc.

So, I'd say your choices played out very well here.
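
To make the "word distance" and "good enough" threshold points above concrete, here is a minimal Python sketch. It assumes a Levenshtein edit distance normalized by the longer string's length; the thread does not say which metric the checker actually uses or how the 0.5 threshold is applied, so the metric, the names `levenshtein`, `similarity` and `is_good_enough`, and the direction of the comparison are all assumptions.

```python
GOOD_ENOUGH = 0.5  # threshold value from the thread; how it is applied is assumed


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_good_enough(expected: str, actual: str, threshold: float = GOOD_ENOUGH) -> bool:
    """Assumed rule: accept when similarity meets or exceeds the threshold."""
    return similarity(expected, actual) >= threshold


# The two spellings mentioned above differ by a single edit.
print(levenshtein("Саиета", "Сайета"), is_good_enough("Саиета", "Сайета"))
```

With a rule like this, near-identical variants such as the pair above pass without anyone having to compare them by hand, which is the trade-off the comments describe.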