OSMNames / OSMNames

Data for place names from OpenStreetMap prepared for fulltext search. Downloadable. Ranked. With bbox and hierarchy. Ready for geocoding.
http://osmnames.org/
GNU General Public License v2.0

Incorrect nesting #183

Open misiman opened 5 years ago

misiman commented 5 years ago

Hi,

The new release looks really good.

Checking the data on OSMNames.org, there appear to be some inaccuracies in the nesting of elements.

For example, Ashtead is categorised as being in Dorking, Surrey: https://osmnames.org/#q=ashtead

Whereas actually OSM has it correctly situated in Mole Valley, Surrey: https://nominatim.openstreetmap.org/details.php?place_id=1212322

The local administrative structure shows Dorking as the "Administrative Headquarters" of Mole Valley, which in turn contains Ashtead: https://en.wikipedia.org/wiki/Mole_Valley

Might this explain the problem?

Thanks

misiman commented 5 years ago

Hi again,

I'm sure the issue is related to my suggestion above. It appears that Mole Valley (administrative) has been imported as Dorking (city):

https://osmnames.org/#q=dorking

This means that Dorking (town) is displayed as: "Dorking, Dorking, Surrey, South East, England, United Kingdom"

Rather than how it should be: "Dorking, Mole Valley, Surrey, South East, England, United Kingdom"

This has the added (confusing) side effect that Dorking railway station is displayed as: "Dorking, Dorking, Dorking, Surrey, South East, England, United Kingdom"

Hope this gives some further insight into the problem.

philippks commented 5 years ago

thanks for your feedback @misiman

the admin_center node is indeed wrongly merged into the relation. I'll fix this soon.

misiman commented 5 years ago

Sure, glad to help with testing.

I've found another example so presumably this is not an isolated case:

https://osmnames.org/#q=esher

This returns Esher (city), which has osm_id 109611, and Esher (town). However, the record for Esher (city) should be Elmbridge (administrative), which then contains Esher (town), as in:

https://www.openstreetmap.org/relation/109611

Is this because the towns are generated from relations and not nodes, as in:

https://www.openstreetmap.org/relation/9274285
https://www.openstreetmap.org/node/2143774725

Additionally, the node record for Esher (town) also provides wikipedia and wikidata entries, which are presumably useful, whereas the relation does not.

Please let me know if you need some more examples.

philippks commented 5 years ago

it's actually a pretty fundamental bug in the logic where the linked nodes are merged into the relations... I fixed it locally, will test it and release a new version as soon as possible.
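To illustrate the kind of guard that would avoid this class of bug, here is a minimal sketch. It is not the actual OSMNames code: the `Element` class, the `merge_linked_node` function and the placeholder ids are hypothetical. It assumes relation members carry an OSM role (`label` or `admin_centre`, the spelling used in OSM data) and that an `admin_centre` node should only be merged when it has the same name as the relation.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, simplified stand-in for an imported OSM element."""
    osm_id: int
    name: str
    place_type: str
    tags: dict = field(default_factory=dict)


def merge_linked_node(relation: Element, node: Element, role: str) -> Element:
    """Enrich the relation from the node only if both describe the same place.

    Assumption: a 'label' node stands for the relation itself, whereas an
    'admin_centre' node is a distinct place (e.g. Dorking inside Mole Valley)
    unless its name matches the relation's name.
    """
    same_place = role == "label" or (
        role == "admin_centre" and node.name == relation.name
    )
    if not same_place:
        # Leave the relation untouched; the node remains a separate record.
        return relation

    # Node tags fill gaps only; relation tags win on conflict.
    merged_tags = {**node.tags, **relation.tags}
    return Element(relation.osm_id, relation.name, relation.place_type, merged_tags)


# Placeholder ids and tag value, for illustration only.
mole_valley = Element(1, "Mole Valley", "administrative")
dorking = Element(2, "Dorking", "town", {"wikidata": "Q-example"})

result = merge_linked_node(mole_valley, dorking, "admin_centre")
print(result.name, result.place_type)  # Mole Valley administrative
```

With this guard the relation keeps its own name and type, so Dorking (town) would nest under "Mole Valley" rather than under a second "Dorking".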

thank you very much for the testing and sorry for the inconvenience :pensive:

misiman commented 5 years ago

No problem @philippks

Looking forward to the new release.

Best

misiman commented 5 years ago

Hi @philippks

Wasn't sure if you wanted me to post follow-up findings, but I've found an anomaly (in v2.1.1) which seems to be (slightly) related to this thread.

In the OSMNames dataset, Untertürkheim has the following properties:

osm_type: relation
osm_id: 1107870
display_name: Untertürkheim, Untertürkheim, Untertürkheim, Stuttgart, Regierungsbezirk Stuttgart, Baden-Württemberg, Germany
wikidata: Q897283
wikipedia: en:Stuttgart

This seems partially correct, although the display_name is possibly an aggregate of the relation and node, and the wikipedia value is incorrect: it should be de:Untertürkheim.

There is a wikipedia redirect from the English language version of Untertürkheim to Stuttgart, so this might go some way towards explaining the erroneous data?

I'm happy to help with testing offline before releases.

Thanks

philippks commented 5 years ago

again, thank you very much for your feedback!

you're right, the wrong wikipedia tag is caused by the redirect of the English Wikipedia article. The logic that handles Wikipedia redirects therefore needs a slight improvement. I'll create an issue for it.

The aggregated names result from these three relations:

I don't know what the reason is for the three relations (all of them with different admin_levels)... What would be the best way to handle such cases? Merge them into one relation? Or simply merge equal names when exporting?
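The second option, merging equal names at export time, could look roughly like the sketch below. The function name `collapse_repeated_names` is hypothetical and this is not taken from the OSMNames exporter; it only assumes the hierarchy is available as an ordered list of names, from the place itself up to the country.

```python
from itertools import groupby


def collapse_repeated_names(names: list[str]) -> str:
    """Join hierarchy names, keeping one entry per run of equal adjacent names."""
    # groupby groups consecutive equal items, so each run yields a single key.
    return ", ".join(name for name, _ in groupby(names))


hierarchy = [
    "Untertürkheim", "Untertürkheim", "Untertürkheim", "Stuttgart",
    "Regierungsbezirk Stuttgart", "Baden-Württemberg", "Germany",
]
print(collapse_repeated_names(hierarchy))
# Untertürkheim, Stuttgart, Regierungsbezirk Stuttgart, Baden-Württemberg, Germany
```

One caveat with this approach: it also collapses legitimate repetitions, such as a city that lies in a district of the same name, so it hides the duplicate relations rather than resolving them.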

However, I think neither issue is related to this one, and we should create new ones...

Btw, were you able to verify the initial Dorking issue on release 2.1.1?

misiman commented 5 years ago

Yes, Dorking appears perfect now.

Looking at Google Maps (GeoBasis-DE / BKG 2009), the lowest admin level geo-boundary appears to be the correct version: https://www.google.com/maps/place/Untertürkheim,+Stuttgart,+Germany/

This correlates with OSM relation 1107836: https://www.openstreetmap.org/relation/1107836 which shows Untertürkheim is part of relation Stuttgart II (2997215) https://www.openstreetmap.org/relation/2997215

So, one could assume that if there are multiple relations and only one OSM entry contains "part of" metadata, then this would be the best candidate for selection, especially if it has the lowest admin_level which also indicates a higher position in the hierarchy.

Leading back to why I updated this particular thread, incorporating this (or a similar) selection method should also resolve the incorrect display_name: Untertürkheim, Untertürkheim, Untertürkheim, Stuttgart, Regierungsbezirk Stuttgart, Baden-Württemberg, Germany

which incorrectly nests Untertürkheim multiple times, evidently because of the multiple geo-boundaries and admin_levels.
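The selection rule suggested above can be sketched as follows. Everything here is an assumption for illustration: the `Boundary` class, the `pick_candidate` function, the ids and the admin_level values are hypothetical and not taken from OSM or from the OSMNames code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Boundary:
    """Hypothetical record for one of several relations sharing a name."""
    osm_id: int
    name: str
    admin_level: int
    parent_id: Optional[int] = None  # relation this one is "part of", if any


def pick_candidate(boundaries: list[Boundary]) -> Boundary:
    """Prefer relations that have a parent; break ties by lowest admin_level."""
    # False sorts before True, so relations with a parent come first.
    return min(boundaries, key=lambda b: (b.parent_id is None, b.admin_level))


# Illustrative values only: three same-named relations, one with a parent.
candidates = [
    Boundary(1, "Untertürkheim", 9),
    Boundary(2, "Untertürkheim", 10, parent_id=99),
    Boundary(3, "Untertürkheim", 11),
]
print(pick_candidate(candidates).osm_id)  # 2
```

Keeping only the selected relation per name would leave a single "Untertürkheim" entry in the hierarchy. Note that `pick_candidate` raises `ValueError` on an empty list, so a caller would need to handle places with no boundary relation separately.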