markalex2209 closed 3 weeks ago
I think it might be useful to separate them, but I can't fully work out how to do it properly.
I think that's indeed best. I split them.
- The check for English uses the same word-distance approach. For now, all letters are treated as different. I added weights to allow switching between letters with and without a garumzime, but I'm not sure how valid that is, so it's commented out for now.
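
To make the weights idea concrete, here is a minimal sketch of a word distance in which swapping a letter for its garumzime (macron) variant costs less than an ordinary substitution. It is not the code from this PR: the names `strip_macron`, `substitution_cost`, `weighted_distance` and the `macron_cost` value of 0.5 are assumptions chosen for illustration.

```python
import unicodedata


def strip_macron(ch):
    # Decompose the character and drop only the combining macron (U+0304),
    # so "ī" becomes "i" while letters like "š" or "ķ" are left untouched.
    decomposed = unicodedata.normalize("NFD", ch)
    return unicodedata.normalize("NFC", decomposed.replace("\u0304", ""))


def substitution_cost(a, b, macron_cost=0.5):
    # Hypothetical weighting: identical letters are free, a garumzime
    # switch is cheap, anything else is a full substitution.
    if a == b:
        return 0.0
    if strip_macron(a) == strip_macron(b):
        return macron_cost
    return 1.0


def weighted_distance(s, t, macron_cost=0.5):
    # Levenshtein distance with a weighted substitution step.
    # Inputs are normalized to NFC so each accented letter is one code point.
    s = unicodedata.normalize("NFC", s)
    t = unicodedata.normalize("NFC", t)
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        cur = [float(i)]
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1.0,       # deletion
                cur[j - 1] + 1.0,    # insertion
                prev[j - 1] + substitution_cost(a, b, macron_cost),
            ))
        prev = cur
    return prev[-1]
```

Setting `macron_cost=1.0` reproduces the current behaviour where all letters are treated as different, so the weighting can be switched on or off without touching the rest of the check.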
Hmm, yeah, I don't really know. The whole word distance was needed because transliteration is just too approximate. But it isn't needed for simpler cases like English, because the name stays in Latin script and technically should be exactly the same (at least going by VVC advice). At the very least, the number of false positives is much lower.
- Extracted names of nomenclature items into a separate TSV data file.
- Added multiple translations (like iela -> улица or ул.).
- Added check for English transliteration.
- Added list of encountered locales that were not checked.

Might need addressing: