Closed rinigus closed 7 years ago
Hi. Just a quick question. Is village defined as administrative region in OSM data, or it is just a single point?
Assigning an address and a location to a region only works, if there is an area-like object found with the region name. An area-like object can either be a administrative boundary or a place type (by marking it as ADMIN_REGION):
In case of villages: TYPE place_village = NODE AREA ("place"=="village") {Name, NameAlt} ADMIN_REGION
In this case https://www.openstreetmap.org/relation/352071 should be the relation that defines the administrative boundary for Kaberneeme küla.
The location index generation will build a tree from the administrative boundaries and places and then will look the "deepest region in the tree" for the given geo coord location.
I would have to make a closer look why in this case this did not work. It is possible that the two outer region the boundary at least obviously has, result in a problem. The first task is, to see, if the region was correctly detected and inserted into the tree. The next step would be, to see why the address was not sorted into this region.
[tim:~/projects/OSMScout/Demos] master(+51/-0,247)+* 127 ± src/LocationLookup ../maps/estonia "Kaberneeme küla" = Kaberneeme küla
Looks correct to me (though there might still be the two outer problem)
Its defined as a region. Otherwise it would be unserstandable. Notice that OSM search is able to resolve and find the address correctly
The village seems to be assigned to the area, but the streets of the village are not assigned to the village, for some reason...
I still assume that the multiple outer areas might be the problem. I'll analyse further, as soon as I have some time.
Thank you!
When looking closer, it looks that Jõelahtme vald is missing for the village (compare with OSM link result). So, maybe these borders do make the library confused and the village is pushed into a level above (ssame as vald).
It looks like the region tree is already "broken" in this special case (and most likely also in mor egeneral if a region has multiple areas).
If have not yet done actual debugging but looking at the region tree the hierachy is:
Kaberneeme küla/Harju maakond/Eesti
Reason for this is likely, that for the both areas of Kaberneeme küla one common boundingBox is calculated. And it looks like this boundingBox is more essential for building the tree. And since the common boundingbox of the sum of both regions is big, it might be that is sorted in higher in the hierachy than it should. Everything else is likely just a consequence...
Easy solution would be to insert the region once for each area (instead of one region with a list of areas). But I'm unsure what the consequences would be. Need some (continuous) time to think about it and test it (sound like "weekend"). However t should be easy to try out, if you are interested....
It should be easy to see what actually happens by dumping some debug output (region name and comparison result) into
LocationIndexGenerator::AddRegion()
After thinking about it, IMHO assigning a region with a different area multiple times is wrong. From the mapping view assigning a location/address based on containment is wrong. because of corner cases a location/address should always be assigned to the region explicitly named, even if there is no containment. I assume we will fix this in future. If we now have regions multiple times, we will never be sure which of the region we have to assign a location/address.
I will now refactor the code so that every area has its own bounding box instead of the region itself. I hope, this will work out and will work better.
I must say, you lost me here. Let me try to ask the questions that could make it clearer. I don't see where do we "assign a region with a different area multiple times". Would you mind to bring out an example?
As for explicitly named regions: do you mean use of is_in tag in OSM? As far as I could see, its just a name that does not even have to match anything (not some OSM object/relation ID).
What's also confusing for me is that you relate to AREA and REGION as a different entities. I guess, I always assumed they are the same.
We currently do not. Currently a region can consist of multiple areas. All areas are assigned to one region object and this region object has one parent.
To fix the problem I was thinking about, changing this to having one area per region but instantiate the region multiple times with (possibly) different parents.
Yes, I mean the is_in tag. Yes, I know that in the past it was of dubious quality but IMHO it should always be checked first. There are corner cases, where an address is not in a given area but still belong to that region. This especially true for object along the border of regions. The placement of the entry can result in such things like "belongs to region B but because the entry leaves to street "a" and street "a" belong to region A it is assigned to region A". We currently do not check "in_in" at all.
A administrative boundary is normally a area or relation. A relation can have multiple outer rings. This is the case where the city has not one area but somehow "island". Such relation could also have inner rings, with then are cutouts and mark sub areas that do not belong to the region. We currently do not handle such cutouts either.
OK, I see. Thank you for explanation, it helped together with looking into importer. Maybe there is no need to do massive changes. As it is now - regions as collection of areas - we have correspondence between admin division and its representation (as you brought out the islands...). So, maybe this should stay and we would have to make algorithm of checking a bit more robust, i.e tolerate floating point imperfections as well as the data imperfections.
As a start, using is_in would be great. That you could immediately relate to your Region class and let's see if that would help a lot already. I was going to suggest is_in as well and maybe it should be considered as a first check with the current checks (and their improvement) done only if is_in fails.
I did some changes, that at least change the result. Can you please take a look if things are better now?
Thank you! Sure, I'll start testing right now
Unfortunately, the case is still not resolved: Kaberneeme küla is still not associated with Jõelähtme vald, but is on an administrative level above. Maybe the problem is in a small island visible at http://www.openstreetmap.org/relation/352071 ? It still belongs to Jõelähtme vald http://www.openstreetmap.org/relation/350208#map=13/59.5117/25.3149 , but it is a bit out of the way...
BTW, is_in is correctly pointing to the admin relationships, as expected.
I will take a further look but fear that this can only be solved by evaluating is_in and also addr:city on the address.
I think it makes sense, to split this into two futher issues (is_in, and addr:city). We can keep this issue open nevertheless.
Meanwhile, I'll try to catch where is the split failing. That way maybe we can approach the problem from different angles.
I traced the problem to https://github.com/Framstag/libosmscout/blob/master/libosmscout-import/src/osmscout/import/GenLocationIndex.cpp#L472 (call IsAreaSubOfAreaQuorum), as should have been expected. I'll work on it further and see if a different implementation of area check would work better.
Tim, can you tell whether each area represented as vectors is expected to be a proper closed area? As you mentioned earlier, we don't even cover yet regions with holes in them (like hole corresponding to another region), right? So, we could assume that the region does not have holes in the check.
For region that are represented by one area a child region should be 100% enclosed by its parent region. IT would be strange if a suburb is somehow bigger than the enclosing city. Sadly this precision is not always in the current OSM data. I had examples where there was slight overlapping. Thus the Quorum method was introduced, that allows some variance.
Regarding "being closed". Yes, the list of nodes that represent the area should be closed it that the last node and the first node close the given area. It is an area, not a way. It should also be normally shaped so e.g. the border should not intersect itself. However this is OSM, mapping error are always possible. So I would at lest expect that e.g. hourglass shapes could happen (instead of mapping this as two disjunct areas).
The problem we have likely occurs if a region consists of multiple areas. In this cases situation is not clear anymore. And the right containment may even depend on the order of objects during tree building.
Holes are not that much of an problem if for the hole there (likely) also exists a different boundary that defines this sub region matching the hole. Since we always assign the node to the leaf of the tree we would still assign the address to this sub region.
My pushes changes do mainly the following:
<Cell,list<RegionRef>>
. It allows to directly access region candidates for a certain area instead of always walk the region tree from root to leaf and do containment checks. I changed the code to first use the bounding boxes of the individual areas and not the union bounding box of the region for creating the index. I also sorted the list of regions by area size (it would be even better to sort it by level). This way, I check first for the leaf region in the tree for containment and then walk up the tree. This resulted in assigning the address to the right region. Nevertheless the region tree is still wrong as you detected.IMHO these are all algorithmic fixes that will improve the situation. Major improvements will however likely come only by using containment only as fall back and using addr:city and is_in instead for a first try. This is not difficult in itself. But some amount of code has the get written and tested and integration into the existing code, that is already rather complex. Perhaps doing some refactoring of the existing code (more OO mechnism, better naming, overhault of data structures) should possibly done first.
Hope this helps.
I just submitted PR fixing this case.
One general question: should we replace all coordinate comparisons with some inequality allowing the values to differ within some tolerance? I think that could make a difference in some cases.
I guess we can close this issue after merging https://github.com/Framstag/libosmscout/pull/237
When studying import of Estonia closer while working on geocoder, I observed that several villages are stripped from their streets and streets are assigned to the corresponding county. Let's take as an example
http://www.openstreetmap.org/search?query=haava%20tee%204%2C%20Kaberneeme%20k%C3%BCla#map=19/59.51832/25.27578
In libosmscout import, the street "Haava tee" is under "Jõelähtme vald" (as observed in location_full.txt and in generated SQLite database). As a result, we cannot search for "Haava tee, Kaberneeme küla".
I haven't looked into why the village is stripped of the streets. Note that this is not isolated case and are quite a few other villages like that.
Maybe you could give some tips or know the reason? Map was imported using estonia-latest.osm.pbf from http://download.geofabrik.de/europe/estonia.html