fneum / core-tso-data

MIT License

Diacritics in locator results #14

Open fleimgruber opened 1 year ago

fleimgruber commented 1 year ago

In https://github.com/fneum/core-tso-data/blob/main/outputs/locator-results.csv it seems that the letter sequence "ue" was wrongly replaced by the umlaut "ü", e.g. "Taürn AT" instead of "Tauern AT" and "Neünhagen DE" instead of "Neuenhagen DE". I understand that the automatic matching is fuzzy and all, but maybe this specific conversion could be fixed? I think this is also the reason why the GIS coordinates are missing for these nodes.
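To make the symptom concrete, here is a minimal sketch, assuming the conversion is a blanket string replacement of "ae"/"oe"/"ue" by "ä"/"ö"/"ü". The mapping `UMLAUTS` and the function `naive_restore_umlauts` are hypothetical names, not the actual code in `process_data.py`.

```python
# Hypothetical sketch of a blanket digraph-to-umlaut replacement.
# This is an assumption about the kind of conversion involved,
# not the code from scripts/process_data.py.
UMLAUTS = {"ae": "ä", "oe": "ö", "ue": "ü"}


def naive_restore_umlauts(name: str) -> str:
    # Replace every occurrence of each digraph, with no exceptions.
    for digraph, umlaut in UMLAUTS.items():
        name = name.replace(digraph, umlaut)
    return name


print(naive_restore_umlauts("Koeln DE"))       # Köln DE (intended)
print(naive_restore_umlauts("Tauern AT"))      # Taürn AT (wrong)
print(naive_restore_umlauts("Neuenhagen DE"))  # Neünhagen DE (wrong)
print(naive_restore_umlauts("Itzehoe DE"))     # Itzehö DE (wrong)
```

Such a replacement cannot tell a transliterated umlaut ("Koeln") from a genuine vowel sequence ("Tauern", "Neuenhagen", "Itzehoe"), which would produce exactly the errors reported above.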

fneum commented 1 year ago

Yes, that bothered me as well, but I haven't fixed it yet.

fleimgruber commented 1 year ago

What is the reason for the transformation at https://github.com/fneum/core-tso-data/blob/main/scripts/process_data.py#L175? Is it because earlier transformations transliterated umlauts (e.g. "ü" to "ue") and they are restored here? If so, we could easily extend the list of false positives (as you did with "Itzehoe" etc.), and I could contribute that.
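A minimal sketch of the false-positive approach, assuming the list is checked word by word before the digraph-to-umlaut replacement is applied. `UMLAUTS`, `FALSE_POSITIVES` and `restore_umlauts` are hypothetical names; only "Itzehoe" is known to be on the existing list, and "Tauern" and "Neuenhagen" are the cases from this issue.

```python
# Hypothetical sketch: skip the umlaut conversion for known false positives.
# Names and structure are assumptions, not the code in process_data.py.
UMLAUTS = {"ae": "ä", "oe": "ö", "ue": "ü"}

# Words whose "ae"/"oe"/"ue" is a genuine letter sequence, not a
# transliterated umlaut. "Itzehoe" is mentioned in the thread; the
# other two are the cases reported in this issue.
FALSE_POSITIVES = {"Itzehoe", "Tauern", "Neuenhagen"}


def restore_umlauts(name: str) -> str:
    out = []
    for word in name.split(" "):
        if word not in FALSE_POSITIVES:
            # Only lowercase digraphs are converted in this sketch.
            for digraph, umlaut in UMLAUTS.items():
                word = word.replace(digraph, umlaut)
        out.append(word)
    return " ".join(out)


print(restore_umlauts("Koeln DE"))       # Köln DE
print(restore_umlauts("Tauern AT"))      # Tauern AT
print(restore_umlauts("Neuenhagen DE"))  # Neuenhagen DE
```

Matching whole words keeps the list short and avoids accidental partial matches, but every new false positive still has to be added by hand.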

fneum commented 1 year ago

Yes, that would be a good solution. The original reason for the transformation was that the geolocator had difficulties resolving names such as "Koeln" (Köln).
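If the transformation exists only to help the geolocator, one alternative is to query the name as written first and fall back to the umlaut spelling only when that lookup fails. This is a swapped-in idea, not what the repository does: `geocode` stands for whatever lookup the script uses and is assumed to return `(lat, lon)` or `None`, and `umlaut_variant` and `locate` are hypothetical helpers.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]

UMLAUTS = {"ae": "ä", "oe": "ö", "ue": "ü"}


def umlaut_variant(name: str) -> str:
    # Blanket digraph-to-umlaut replacement, used only as a fallback query.
    for digraph, umlaut in UMLAUTS.items():
        name = name.replace(digraph, umlaut)
    return name


def locate(name: str, geocode: Callable[[str], Optional[Coords]]) -> Optional[Coords]:
    # `geocode` is a hypothetical callable returning (lat, lon) or None.
    candidates = [name]
    variant = umlaut_variant(name)
    if variant != name:
        candidates.append(variant)
    # Try the original spelling first so genuine "ue" names are kept.
    for candidate in candidates:
        result = geocode(candidate)
        if result is not None:
            return result
    return None


# Usage with a dummy lookup table standing in for a real geocoder.
dummy = {"Köln DE": (1.0, 2.0), "Tauern AT": (3.0, 4.0)}
print(locate("Koeln DE", dummy.get))   # found via the umlaut variant
print(locate("Tauern AT", dummy.get))  # found as written
```

This avoids maintaining an exception list, at the cost of up to two geocoder queries for names containing "ae", "oe" or "ue".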