Esri / arcgis-osm-editor

ArcGIS Editor for OpenStreetMap is a toolset for GIS users to access and contribute to OpenStreetMap through their Desktop or Server environment.
Apache License 2.0

Multipolygon processing errors on large extracts #153

Closed mboeringa closed 7 years ago

mboeringa commented 8 years ago

Hi @ThomasEmge ,

This is the issue I promised to post here regarding some relatively rare errors with multipolygon processing that I see happening on large extracts with the latest build and the OSM File Loader (Load only) tool and that I did not see before the major refactoring related to the OSM File Loader addition in the toolbox.

First, have a look at this image. It shows a multipolygon in the Vondelpark in Amsterdam representing a water body with two small islands and one larger island named "Koeienweide" (although the "Koeienweide" meadow feature is actually NOT part of the multipolygon, more on this later on...):

http://www.openstreetmap.org/relation/1141600

koeienweide_plas_relatie_osm

This is a classic "old style" multipolygon with no tags on the relation, and where tag transfer needs to take place to render it properly as a water body.
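For readers unfamiliar with tag transfer, here is a minimal Python sketch of the rule as described in this thread: when the relation carries no tags of its own, the tags of the outer way are copied to the multipolygon and the outer way is dropped as a separate feature, while inner ways are left alone. The function name and data layout are hypothetical and this is not the Editor's actual code. The way IDs in the usage note are inferred from the screenshot names further down.

```python
def resolve_multipolygon(relation_tags, members, way_tags):
    """Return (polygon_tags, ways_to_drop) for one multipolygon relation.

    relation_tags: dict of tags on the relation
    members:       list of (way_id, role) tuples, role "outer" or "inner"
    way_tags:      dict mapping way_id -> dict of tags on that way
    """
    # Ignore the "type" tag: it describes the relation, not the feature.
    own_tags = {k: v for k, v in relation_tags.items() if k != "type"}
    if own_tags:
        # "Modern style": the relation is tagged, nothing is transferred.
        return own_tags, []

    # "Old style": transfer tags from the outer way(s) to the multipolygon.
    polygon_tags = {}
    ways_to_drop = []
    for way_id, role in members:
        if role != "outer":
            continue  # inners never contribute tags and are never dropped
        tags = way_tags.get(way_id, {})
        if tags:
            polygon_tags.update(tags)
            ways_to_drop.append(way_id)  # outer is now represented by the relation
    return polygon_tags, ways_to_drop
```

Called with `{"type": "multipolygon"}`, the members `[(161587161, "outer"), (161587172, "inner")]` and a `natural=water` tag on way 161587161, this returns the water tag for the multipolygon and lists only the outer way for removal.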

Now, when I create a small extract just covering the Vondelpark using the Download OSM Data (XAPI) tool of the Editor, and import it using the OSM File Loader tool and subsequently render it using my rendering tool, I get this image, which looks OK. The lake is nicely rendered as multipolygon, and if I look in the file geodatabase, it references nicely the multipolygon relation as mentioned also in the first image, so this seems fine.

koeienweide_plas

Now, however, look at these images. They show the features at the multipolygon's location (excluding the two small inners) that I found in the file geodatabase created from the entire country extract from Geofabrik. You verified that the relation is present in the OSM XML file, so the multipolygon should have been created, yet it was not.

First, the water body that forms the outer way of the multipolygon. It should have been removed in case of tag transfer, yet it is not, it is still in the file geodatabase. It has just one tag (except source), which is the water tag that needs transferring to the multipolygon:

w161587161_water

Next up the true inner representing the large island. It has no tags (which of course also should not be transferred to the multipolygon if there had been any, since only tags on the outer count for that process). This feature of course belongs in the file geodatabase, as inners should never be removed during tag transfer:

w161587172_notags

Lastly, an image of the Koeienweide "meadow". Actually, this has no direct relation to the multipolygon, as it is NOT part of it. I just show it here, since it shares most (but not all!) of its nodes with the untagged island inner feature mentioned above. Whether that fact has any relevance in the issue posted here, is questionable, but it is good to be aware of its presence:

w73999357_koeienweide

Anyway, as you understood: why is the multipolygon representing the lake missing in the file geodatabase for the large extract, while processing a small extract does not show this issue?

There is no way to create a small test case for this unfortunately, so I'll have to leave it to you to figure out a way to debug this...

mboeringa commented 8 years ago

Hi @ThomasEmge ,

I have now re-downloaded the Geofabrik Netherlands extract and reloaded it. The multipolygon now appeared good in the extract, see the image below the message.

One of two things may have happened:

Anyway, before I give this the full "clear", I am still waiting for my render session to finish. Seeing the symbolized and rendered data always proves very revealing in terms of data errors, and I really want to be sure the outer is also properly dropped and no other issues remain in the imported data.

I would also like you to confirm the presence of the multipolygon relation as well from a full Geofabrik Netherlands download and loading session with the OSM File Loader (Load only) tool. If that comes out OK at your side, I think I can close this issue.

r1141600_water

mboeringa commented 8 years ago

@ThomasEmge ,

This is preliminary, but I do still see issues now that the data for the Netherlands is rendered. There are missing multipolygons, and in the particular case shown here of the Vondelpark lake, the relation is there with the correctly transferred "natural=water" tag, but so is the outer. The outer has not been dropped from the dataset, as it should have been. This causes an unwanted overlap and a false double representation of the feature.

Had my second hard drive crash as well now in the lifetime of this PC... rendering is intensive, and unfortunately, despite a respectable well known drive brand, I only discovered later on these particular drives aren't known for top ranking reliability. Luckily a full backup of this stuff, and this was only the secondary data drive not system, but it is time to get another SSD like the system drive for more rendering options.

ThomasEmge commented 8 years ago

If you can provide me some of the IDs of the relations I can investigate.

Yes, these operations are very input/output intensive.

mboeringa commented 8 years ago

Hi @ThomasEmge ,

You already have these, but just as a reminder:

The above represents an "old style" multipolygon, with only tags on the outer way, not on the relation

The one listed below is a "modern style" multipolygon with tags on the relation and not on the outer. It is part of the "Het IJ" waterbody north of the Amsterdam city centre, essentially the historic harbour region. It is entirely missing in the Polygon feature class. The outer and inner ways are present: https://www.openstreetmap.org/relation/1942596

This waterbody near it, is missing as well. The relation is not in the resulting feature class. Again, outer and inners are present: https://www.openstreetmap.org/relation/1926875

The world famous "Rijksmuseum" is missing as well...: https://www.openstreetmap.org/relation/6412708

By the way, can you double check if these last three relations are present in the Geofabrik extract using OSMOSIS?
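As an alternative to Osmosis for this kind of presence check, a streaming scan with Python's standard library is enough. The sketch below is not Osmosis and not part of the Editor; it assumes the extract is available as plain OSM XML (a `.osm.bz2` download could be passed in as a file object opened with `bz2.open`), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def find_relations(osm_xml, wanted_ids):
    """Return the subset of wanted_ids that occur as <relation> elements.

    osm_xml may be a file path or a binary file-like object.
    """
    wanted = {str(i) for i in wanted_ids}
    found = set()
    context = ET.iterparse(osm_xml, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "relation" and elem.get("id") in wanted:
            found.add(int(elem.get("id")))
        if elem.tag in ("node", "way", "relation"):
            # Drop processed top-level elements so memory use stays flat
            # even on a country-sized extract.
            root.clear()
    return found
```

For the three relations above, `find_relations(path, [1942596, 1926875, 6412708])` returns the IDs that are actually in the file, which separates "missing from the extract" from "lost during loading".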

ThomasEmge commented 8 years ago

Marco,

I think we start to mix different items. In your screen shot above you do show the waterbody from Koeienweide. Even with this correct geometry, as expressed in the relation, do you still get the inner and outer items? I have tested the above listed relations (harbor and het ij) individually and they are loaded as expected. So if you were to test the individual file, as attached, the features would load fine but when you attempt to load Netherlands as a country some relation features do not appear.

harbor.zip

hetij.zip

mboeringa commented 8 years ago

I have tested the above listed relations (harbor and het ij) individually and they are loaded as expected. So if you were to test the individual file, as attached, the features would load fine but when you attempt to load Netherlands as a country some relation features do not appear.

Yes, exactly, that is what I am saying. I do see some multipolygons (or even the majority) being processed properly, just not all. Some features drop out when I process the entire Netherlands.

Here is a visual. Notice that the Koeienweide has gone blue and that, in the upper right, the Rijksmuseum building has gone missing: entire_netherlands_koeienweide

The missing harbour. Do note that other multipolygon features like the Scheepvaartmuseum building in the left, have been properly processed: entire_netherlands_harbour

"Het IJ" missing: entire_netherlands_het_ij

mboeringa commented 8 years ago

@ThomasEmge

I have now loaded the features individually from the files you sent, see the images below. I did make one other observation though: both features seem to suffer from "incorrect-ring-order" problems, which are being fixed in the Repair Geometry step. I wonder if this is related to the issues when loading at country scale?...

From the load session of the "Het IJ": incorrect_ring_order

Correctly loaded harbour as individual feature: individual_harbour

Het IJ also correct: individual_het_ij
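For context on the "incorrect ring order" message: as far as I know, Esri polygon geometries expect outer rings to run clockwise and inner rings counterclockwise. The sketch below shows how such a check can be made with the shoelace formula. It illustrates the concept only, it is not what Repair Geometry does internally, and the function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula on a list of (x, y) tuples.

    Positive for counterclockwise rings, negative for clockwise rings
    (in a coordinate system where y increases upwards).
    """
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0

def ring_order_ok(outer, inners):
    """True if the outer ring is clockwise and all inners counterclockwise."""
    return signed_area(outer) < 0 and all(signed_area(r) > 0 for r in inners)
```

A ring that fails this test can be fixed by reversing its vertex order, which leaves the shape itself unchanged.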

mboeringa commented 7 years ago

Hi @ThomasEmge,

This is preliminary, from a first attempt to load the Netherlands dataset again. That session ended with a COM related error again, see the image below. It failed just after the relations processing.

hresult_error

In the steps directly after the relations processing and just before the above error, the tool reports to have successfully deleted the intermediate osm datasets created in the Scratch workspace, yet, if I look in the Scratch workspace, the datasets are still there:

hresult_error_leftovers

Although this session failed, I decided to have a closer look at the data created. As to some mildly good news, I can confirm that I now finally see the missing relations in the Polygon Feature Class. Both the IJ, harbour and Vondelpark lake relation polygon are there, see Het IJ example below and the identify results showing the relation OSMID.

properly_formed_het_ij

However, for the Vondelpark lake example, I still see the outer way as well in the Polygon Feature Class, it hasn't been removed. I don't know if this is a result of the incomplete processing and failure, or still hints at an issue. The water tag did get properly transferred to the lake multipolygon during relations processing, so that part is OK.

Also, I now see another strange thing happening. When I looked at the File Geodatabase, I noticed the intermediate osmCountingTable that wasn't deleted:

osmcountingtable

That by itself is not a big surprise, because the processing failed. However, since I realized the osmSupportingElement nodes were most likely not deleted as well, I just kind of instinctively decided to open the Point Feature Class's attribute table:

node21430322_osm_file_loader

And here the true problem becomes visible. Scrolling through the table, I of course saw the undeleted "supporting" nodes. Not surprising, since the processing never arrived at the stage of deleting them. However, more surprising, I noticed multiple nodes with apparently duplicated names. The most striking was the "Martinuskerk", the name of a Dutch church in the city of Utrecht. It appeared on a dozen nodes or so. That felt uncomfortable. Look at the image above. The vertical arrow pointing upwards highlights the first "supporting" node with the Martinuskerk name attached to it. The dataframe shows its position near a canal. It turns out to be this node:

https://www.openstreetmap.org/node/21430322#map=19/52.08371/5.12271

part of a road along the canal called "Twijnstraat aan de Werf". As clearly visible in the image below, that node doesn't have any tags associated with it, so why does the resulting supporting node in the file geodatabase have them? Of course, it would have been deleted if the option for deleting supporting nodes had been chosen, but something is wrong here... It reminds me of the issue we had in processing a dataset in Berlin, where tags also appeared on the wrong object. I don't know the issue number right now.

node21430322_osm

ThomasEmge commented 7 years ago

Hmm, the error message is unexpected because after the temp file geodatabases are deleted there is no more new data created. Hence I am surprised that code fails with a CreateFeatureClass call.

The issue you are seeing with the points/nodes is an optimization step. When the user selects the parameter to delete the supporting nodes, the points are loaded with a buffer mechanism. The side effect is that supporting points get a copy of the attributes from the last properly attributed node. This is acceptable, as these nodes are deleted at the end. On the other hand, it also means that if the parameter is not checked, the loading of points will take longer, as these attributes need to be properly initialized in the geodatabase.
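The buffer side effect can be reproduced with a toy model. The Python sketch below is not the tool's code (the tool is not written in Python) and all names in it are hypothetical. It only shows how one reused attribute buffer leaks the last tagged node's values onto the untagged nodes that follow, unless the buffer is reset for every row.

```python
def load_points(nodes, reset_buffer):
    """Simulate loading nodes through one reused attribute buffer.

    nodes:        list of (node_id, tags) tuples in file order
    reset_buffer: if True, clear the buffer before every node
                  (the slower, fully initialized path)
    """
    buffer = {}  # single attribute buffer shared by all rows
    rows = []
    for node_id, tags in nodes:
        if reset_buffer:
            buffer.clear()
        if tags:
            # A tagged node overwrites the buffer with its own attributes.
            buffer.clear()
            buffer.update(tags)
        # An untagged node writes whatever the buffer still holds.
        rows.append((node_id, dict(buffer)))
    return rows
```

With `reset_buffer=False`, an untagged node such as 21430322 that follows a node named "Martinuskerk" comes out carrying that name, which matches what the attribute table showed. With `reset_buffer=True` it stays empty.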

mboeringa commented 7 years ago

Ok, good to hear the last issue is actually a known step in the code. It slightly made me worried as being something new.

That leaves two questions:

Anyway, it seems you are getting closer to finishing this one up, which is good news!

mboeringa commented 7 years ago

Hence I am surprised that code fails with a CreateFeatureClass call.

You actually mean File Geodatabase / File Geodatabase workspacefactory creation, because I don't see a CreateFeatureClass call in the red error message text? Or am I interpreting this wrong (I may well be...)?

ThomasEmge commented 7 years ago

I spoke too soon, it is in the loading step between relations and super-relation where the error occurs. It looks like the delete of the _netherlands-latest_yiy4aqrvvqr0.gdb geodatabase failed, even though the previous tool reports that it succeeded. That is the COM error and it just goes downhill from there. However there is another check failing as well. Any chance you were connected to the temp file geodatabase?

mboeringa commented 7 years ago

Any chance you were connected to the temp file geodatabase?

Not at the time of the exact failure, because I was out of the door. I do inevitably hit some geodatabases that are in some processing stage at times when I am multi-tasking with multiple sessions of ArcGIS, because simply opening the Add Data dialog or so, may take you to a File Geodatabase being used by another ArcMap session.

But I think it unlikely in this case, although admittedly I can not be absolutely sure there was no lock or something.

I do know I have seen this same error multiple times before, at the exact stage you are describing, and I am pretty sure I posted a similar screenshot before.
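One way to check for a lingering connection after a failed run is to look for lock files in the temp file geodatabase folder. The sketch below assumes that locks show up as files ending in `.lock` inside the `.gdb` directory, which is my understanding of how file geodatabases behave. The function name is hypothetical, and a leftover lock file may be stale rather than proof of an active session.

```python
from pathlib import Path

def find_lock_files(gdb_path):
    """Return the sorted names of *.lock files inside a .gdb folder.

    Returns an empty list if the folder does not exist.
    """
    gdb = Path(gdb_path)
    if not gdb.is_dir():
        return []
    return sorted(p.name for p in gdb.glob("*.lock"))
```

Running this against the geodatabase left in the Scratch workspace right after the failure would show whether some process still held it open when the delete was attempted.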

mboeringa commented 7 years ago

Hi @ThomasEmge ,

I have now attempted to load Poland. This time, the processing passed the relations - super-relations boundary. However, it failed with an unexpected invalid SQL statement that I haven't seen before (or at least can't remember seeing before). Note that it did successfully pass the temporary file deletion stages of the process where the Netherlands extract failed, there was no temp file geodatabase or temp *.osm file left in the Scratch workspace, they were properly deleted when I checked it (as also apparent from the geoprocessing logging in the screenshot). CORRECTION: While the temp file geodatabase was deleted, I still see the x_rn.osm temp files related to the relations processing in the Scratch workspace. They are not deleted!:

sql_error

When I subsequently attempted to add the Feature Classes - without closing the project in which I performed the processing - and attempted to open the referenced Polygon attribute table, I got this error:

sql_error_follow_up_disk_space_error

Clearly a bogus error, and probably result of the inconsistent state in which the failed processing left the project, as I have plenty of disk space left on C-drive:

sql_error_follow_up_disk_space_c

Closing and re-opening the project allowed me to add and view the three Feature Classes created. The screenshot below shows the object as referenced in the SQL statements, to be present in the Polygon Feature Class, the SQL thus should have been valid?... No idea what went wrong here. The object seems to be an outer way of a multipolygon, so should indeed be deleted, as the first error referred to in the code error messages (...IFeature.Delete):

poland_objectid_6986118

mboeringa commented 7 years ago

Running the tool against a small section downloaded using the Download OSM Data (XAPI) tool, succeeds without errors...

ThomasEmge commented 7 years ago

My loading of Poland completed fine without any errors. Let's compare ArcGIS versions really quick: are you running 10.4 or 10.4.1? Are you using geoprocessing foreground or background processing?

mboeringa commented 7 years ago

@ThomasEmge:

10.4.1.5686 on Windows 7, using foreground processing.

mboeringa commented 7 years ago

Hi @ThomasEmge,

I just had a look at the latest import results. As to the good news: the first run with the Netherlands was error free! See the first image. This is of course all a bit preliminary as being based on a single run / observation, but this truly seems promising due to a number of other factors:

There just seems one minor issue left, see below this image of the successful run:

fix_geoprocessing

As to the minor issue I am seeing, it seems an explicit area=no tag is being ignored? I noticed it on this feature, which represents a closed way of barrier=hedge immediately to the right of the Rijksmuseum building, tagged with explicit area=no, which should have forced it into the Polyline table, but it ended up in the Polygon table instead.

Feature in iD with visible tags: fix_hedge_problem

Carto rendering: correctly displayed as line: fix_hedge_problem_osm

Incorrectly imported as closed way / polygon in ArcGIS: fix_hedge_problem_arcmap
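The expected behaviour can be written down as a small rule. The sketch below is deliberately simplified: it treats every closed way as a polygon unless it carries `area=no`, whereas real importers also look at the feature tags themselves. The function name is hypothetical and this is not the Editor's logic.

```python
def target_table(node_refs, tags):
    """Decide whether a way belongs in the Polygon or the Polyline table.

    node_refs: ordered list of node IDs making up the way
    tags:      dict of tags on the way
    """
    # A way is closed when it ends on the node it started from.
    closed = len(node_refs) >= 4 and node_refs[0] == node_refs[-1]
    if closed and tags.get("area") != "no":
        return "polygon"
    # Open ways, and closed ways explicitly tagged area=no, stay lines.
    return "polyline"
```

For the hedge above, `target_table([1, 2, 3, 1], {"barrier": "hedge", "area": "no"})` gives `"polyline"`, which is where the feature should have ended up.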

mboeringa commented 7 years ago

Hi @ThomasEmge ,

Over the weekend, I have re-imported the Netherlands a second time, and also re-imported Poland without any issues. So that is good news.

As to the "number crunching", the Netherlands import finished in just over 10 1/2 hours again (using 4 cores), which is very consistent with the previous run. Poland, using 3 cores and the same triple SSD setup, finished in 12 3/4 hours, which is also very consistent with the results for the Netherlands, as Poland and the Netherlands are almost the same size in terms of OSM XML file size and number of nodes, ways and relations. So with 1 core less, the result seems utterly plausible. The resulting file geodatabase is also just under 10GB, which again seems normal.
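A quick back-of-the-envelope check of these timings, using the rounded figures quoted above and assuming, purely for the estimate, that both extracts are the same size and that the work scales perfectly with the number of cores:

```python
# Rounded run times and core counts as reported in this comment.
nl_hours, nl_cores = 10.5, 4    # Netherlands
pl_hours, pl_cores = 12.75, 3   # Poland

nl_core_hours = nl_hours * nl_cores  # total work for the Netherlands run
pl_core_hours = pl_hours * pl_cores  # total work for the Poland run

# Time the Netherlands workload would need on 3 cores under perfect scaling.
ideal_hours_on_3_cores = nl_core_hours / pl_cores

print(nl_core_hours, pl_core_hours, ideal_hours_on_3_cores)
```

That is 42 core-hours for the Netherlands against 38.25 for Poland, and a perfectly scaled 3-core run of the Netherlands workload would take 14 hours, so 12 3/4 hours for Poland is in the expected range.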

As to the particular problem I reported above, regarding the ignoring of an area=no tag, a quick review using ArcMap Identify tool, indeed seems to confirm the issue is now fixed, the barrier=hedge feature is now in the Polyline, not the Polygon table.

There is just one - apparently completely benign - minor anomaly that I noticed during the processing and that I have seen before. Look at the attached image:

fix_hanging_python_process

Notice the highlighted python.exe process. It has 0% activity. This process is left over after the node processing finishes and doesn't get removed after that step. However, it doesn't seem to have any detrimental effect on the processing or stability, and once the OSM File Loader (load only) tool finishes completely, it disappears. So although a minor anomaly, it doesn't appear to have any consequences for the processing or use of the tool. I don't know if it actually happens always, but I do remember seeing it before.

Before I give the official "green light" on this one, I still would like to see the render results of the Netherlands. Both the Netherlands and Poland are currently being rendered, but due to the slow OSM Attribute Selector, it takes a couple of days to process them, mainly spent on extracting the attributes. I guess with the current implementation, rendering all of Europe could take a month... If anything, a faster OSM Attribute Selector is even higher on the priority list than a Pro version of the toolbox IMO.

Anyway, it all seems very hopeful, and I think we can close this issue in the coming days. I think many ArcGIS users will love the tool once the fixed version is released!

ThomasEmge commented 7 years ago

The shown python process might not even be originating from the geoprocessing tools but perhaps belong to the PyScripter process. I haven't noticed any outstanding exes during the node, way, and relation loading process but I'll keep an eye on it.

mboeringa commented 7 years ago

Yes, you may be right it is from another process, although I still have the vague feeling it is related to the tool. Anyway, as I wrote before, if it is related, it seems inconsequential as far as I can tell up to now, so it is likely safe to be ignored.

mboeringa commented 7 years ago

Hi @ThomasEmge ,

As a final confirmation that things seem to be OK now, a few screenshots of the problematic features as now rendered (the Netherlands just finished). All look OK now:

Koeienweide: final_koeienweide

Hedges and Rijksmuseum building: final_hedge

IJ and harbour: final_ij_and_harbour

mboeringa commented 7 years ago

Based on the latest import and rendering results of my ArcGIS Renderer using the Netherlands and Poland extracts of Geofabrik, I am now pretty sure the multipolygon issues are fixed. I will therefore provisionally close this issue. As always, I will keep an eye out for any anomaly, but this now really looks fixed, and I don't really expect any overlooked issues with the latest version of the OSM File Loader (load only) tool.