OSMLatvija / Osmalyzer

Parsing OSM data in Latvia against various data sources
https://osmlatvija.github.io/Osmalyzer/
GNU General Public License v3.0

Check transliteration for English #46

markalex2209 closed 3 weeks ago

markalex2209 commented 3 weeks ago

Extracted the names of nomenclature items into a separate TSV data file. Added multiple translations (like iela -> улица or ул.). Added a check for English transliteration. Added a list of encountered locales that were not checked.

Might need addressing:

HellMapGoesCoding commented 3 weeks ago

I think it might be useful to separate them, but I can't fully work out how to do it properly.

I think that's indeed best. I split them.

HellMapGoesCoding commented 3 weeks ago
  • The check for English uses the same word-distance approach. For now, all letters are treated as different. I added weights to allow switches between a letter with a garumzīme and the same letter without one, but I'm not sure how valid that is, so I commented that out for now.
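For illustration, the weighted word distance described in this point could look roughly like the sketch below. It is not the code from this PR: the function names, the 0.5 weight, and the choice of Python are all assumptions made for the example. It is a plain Levenshtein distance in which swapping a letter for its garumzīme (macron) counterpart costs less than any other substitution.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def strip_macron(ch: str) -> str:
    """Return ch without a garumzīme (macron); other diacritics are kept."""
    decomposed = unicodedata.normalize("NFD", ch)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_MACRON, ""))


def substitution_cost(a: str, b: str, macron_cost: float) -> float:
    """Cost of replacing a with b (hypothetical weighting scheme)."""
    if a == b:
        return 0.0
    if strip_macron(a) == strip_macron(b):
        # Letters differ only by the macron, e.g. "ī" vs "i".
        return macron_cost
    return 1.0


def weighted_distance(s: str, t: str, macron_cost: float = 0.5) -> float:
    """Levenshtein distance with a cheaper macron/no-macron substitution."""
    # Normalize so that each accented letter is a single code point.
    s = unicodedata.normalize("NFC", s)
    t = unicodedata.normalize("NFC", t)
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                                    # deletion
                curr[j - 1] + 1.0,                                # insertion
                prev[j - 1] + substitution_cost(a, b, macron_cost),  # substitution
            ))
        prev = curr
    return prev[-1]
```

Setting `macron_cost=1.0` gives the behaviour described as current ("all letters treated as different"); a lower value corresponds to the commented-out weights.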

Hmm, yeah, I don't really know. The whole word distance was needed because transliteration is just too approximate. But it isn't needed for simpler cases like English, because the name stays in Latin script and technically should be exactly the same (at least going by VVC advice). At least, the number of false positives is much lower.
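As a rough sketch of the simpler alternative for English, an exact comparison might be enough. The function name and the normalization steps below are assumptions for illustration, not code from this repository; the only tolerance applied is for Unicode composition and surrounding whitespace.

```python
import unicodedata


def matches_exactly(expected: str, actual: str) -> bool:
    """Compare two Latin-script names with no fuzzy matching (hypothetical helper)."""
    def norm(s: str) -> str:
        # Treat precomposed and combining forms of the same letter as equal.
        return unicodedata.normalize("NFC", s).strip()

    return norm(expected) == norm(actual)
```

With this approach a dropped garumzīme counts as a mismatch, so `matches_exactly("Brīvības iela", "Brivibas iela")` is false.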