Esri / arcgis-osm-editor

ArcGIS Editor for OpenStreetMap is a toolset for GIS users to access and contribute to OpenStreetMap through their Desktop or Server environment.
Apache License 2.0

Continuing the dysfunction for 10.2.2... #79

Open ReedBrian opened 9 years ago

ReedBrian commented 9 years ago

Loading an .osm file extracted from a bz2 archive. I downloaded ArcGIS Editor for OpenStreetMap from ESRI, version 10.2.2, and installed only the x64 background-processing tool for ArcGIS Desktop. When running the Load OSM File tool, I received the following error message:

Load OSM File [002706_04172015] ... Counting elements in OSM file... Counted 644172 nodes, 65077 ways, and 1366 relations. Preparing geodatabase... Object reference not set to an instance of an object. at ESRI.ArcGIS.OSM.Geoprocessing.OSMGPFileLoader.Execute(IArray paramvalues, ... Failed to execute (OSMGPFileLoader).

eggwhites commented 9 years ago

Hello - I assume that since you are using the x64 background processing for ArcGIS Desktop (a separate install on top of ArcGIS Desktop), you are also using the version of the ArcGIS Editor for OSM that is meant to work when that x64 background processing is installed (i.e., the "ArcGISEditor10_2_x64" folder when you unzip the 10.2.2 installer). Assuming this is the case: will you tell me a little more about the feature dataset to which you are loading the data? Is it a feature dataset in a file geodatabase, or an SDE database? Can you send me more info on the inputs to the "Load OSM File" tool so I can attempt to replicate? Thanks!

ReedBrian commented 9 years ago

Hey Christine,

We pulled a Delaware (DE) state file (bz2) from Geofabrik and extracted it to .osm using 7-zip, onto the local drive of a Win7 VM. We then ran the ArcGIS Editor for OSM with 64-bit background processing, loading into a feature dataset (DE) in a local-drive file geodatabase (OSM.gdb) using Load OSM File, and tried to pull point, line, and polygon features into the DE dataset. We then repeated this with MD and District of Columbia sample sets; each failed. It is unlikely that all of those downloads were corrupt files.
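For reference, the extraction step can also be scripted; a minimal sketch using Python's standard bz2 module (the file paths are hypothetical):

```python
import bz2
import shutil

src = r"C:\data\delaware-latest.osm.bz2"   # hypothetical Geofabrik download
dst = r"C:\data\delaware-latest.osm"

# Stream-decompress so the whole file never has to fit in memory.
with bz2.open(src, "rb") as f_in, open(dst, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out, length=1024 * 1024)
```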

Thanks, Brian


mboeringa commented 9 years ago

Have you attempted to import this file without 64-bit background processing installed, using the Load OSM File tool with the "Conserve Memory" option checked on?

Unless you have a specific need for 64-bit geoprocessing, or have a huge amount of RAM and really want to process everything in memory, there is really no need for it when importing .osm files. I have successfully processed uncompressed XML .osm files of up to 24 GB with just 32-bit geoprocessing. 64-bit geoprocessing is also not necessary for processing current OSM files with 64-bit OSM identifiers; as far as I can tell, the Load OSM File tool imports these properly with just 32-bit geoprocessing.

However, make sure you have "Conserve Memory" checked on...
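If it helps reproduce this, the tool can also be driven from Python instead of the tool dialog. A hedged sketch follows; the toolbox path, alias ("osmtools"), parameter order, and keyword values are all assumptions based on a typical 10.2 install, so verify them against the tool's syntax on your own machine:

```python
import arcpy

# Assumed install path and toolbox alias -- adjust to your environment.
arcpy.ImportToolbox(
    r"C:\Program Files (x86)\ArcGIS\Desktop10.2\ArcToolbox"
    r"\Toolboxes\OpenStreetMap Toolbox.tbx", "osmtools")

arcpy.OSMGPFileLoader_osmtools(
    r"C:\data\delaware-latest.osm",   # input .osm XML file (hypothetical path)
    "CONSERVE_MEMORY",                # assumed keyword for "Conserve Memory"
    "ALL",                            # assumed: which attributes to extract
    r"C:\data\OSM.gdb\DE",            # target feature dataset
    "DE_osm_pt", "DE_osm_ln", "DE_osm_ply")  # output point/line/polygon names
```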

OneHwang commented 9 years ago

I saw an error message similar to the one reported by @ReedBrian while running ArcGIS 10.3.1 with the 64-bit version of the OSM Editor on my Windows 8.1 virtual machine. After reading the comments here, I uninstalled the OSM Editor and re-installed the 32-bit version. When I then ran the "Load OSM File" tool with the bz2 file for Oregon downloaded from Geofabrik, ArcGIS worked! However, it took 4 hours and 18 minutes to process.

mboeringa commented 9 years ago

However, it took 4 hours and 18 minutes to process.

I am currently importing the whole of Europe, a 331 GB uncompressed *.osm XML file... It has now been running for about 125 hours and has loaded 86% of all nodes, with over 1.3 billion nodes processed so far, so probably close to 1.5 billion in total.

That comes down to just ~3k nodes/second, which indeed seems low (Core i5 desktop, 3 GHz). The process step of building point indexes also takes a lot of time. After that, it still needs to process ways and multipolygons; based on previous experience with smaller data extracts, the whole process could well take close to a month.
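The arithmetic behind those figures:

```python
# Back-of-the-envelope check of the node-loading rate quoted above.
elapsed_s = 125 * 3600            # ~125 hours of node loading so far
nodes = 1.3e9                     # > 1.3 billion nodes at 86% done
print(f"{nodes / elapsed_s:,.0f} nodes/s")        # ~2,900, i.e. roughly 3k/s
print(f"{nodes / 0.86:,.0f} nodes estimated in total")  # ~1.5 billion
```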

Comparing these figures with benchmarks for other tools like osm2pgsql, there seems to be considerable room for improvement. Although my initial thought was otherwise, importing with the Load OSM File tool is actually processor-limited, not IO/hard-drive-limited. In fact, the disk usage of a 4TB hard drive capable of 150-180 MB/s transfers rarely reaches more than 1% active time during the node import I am currently monitoring, an astoundingly low figure... while the ArcGIS-related CPU core is at close to 100%.

It seems a code optimization is needed...
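For anyone wanting to reproduce this disk-versus-CPU observation, a small monitoring sketch (assumes the third-party psutil package is installed):

```python
# Sample CPU load and disk write throughput once per second while the
# import runs (pip install psutil).
import psutil

prev = psutil.disk_io_counters()
for _ in range(60):                          # monitor for one minute
    cpu = psutil.cpu_percent(interval=1.0)   # % over the 1 s sample window
    cur = psutil.disk_io_counters()
    mb_s = (cur.write_bytes - prev.write_bytes) / 1e6
    print(f"CPU {cpu:5.1f}%   disk writes {mb_s:6.1f} MB/s")
    prev = cur
```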

OneHwang commented 9 years ago

Wow, 1 month is a LONG time. If you have the means, the Data Interoperability extension may produce faster results - http://store.esri.com/esri/showdetl.cfm?SID=2&Product_ID=632&Category_ID=28 Unfortunately, the $2,500 price tag is outside my startup budget

mboeringa commented 9 years ago

Wow, 1 month is a LONG time. If you have the means, the Data Interoperability extension may produce faster results - http://store.esri.com/esri/showdetl.cfm?SID=2&Product_ID=632&Category_ID=28 Unfortunately, the $2,500 price tag is outside my startup budget

Luckily, we rarely have power outages here in the Netherlands. The process might actually finish faster than that, maybe within two weeks, but I'll have to see...

Data Interoperability is not an option for me; I need the specific database structure that the Load OSM File tool of the ArcGIS Editor for OpenStreetMap creates.

Besides that, the Editor does some advanced processing of multipolygon features, very specific logic designed to handle the structure of OpenStreetMap data, which would be hard to replicate with the Data Interoperability extension. It would essentially require a complete refactoring, which would probably be costly (note: I am just a user of this tool, I don't work for ESRI).

Handling multipolygons is one of the toughest parts of dealing with OpenStreetMap data... Issues https://github.com/Esri/arcgis-osm-editor/issues/65 and https://github.com/Esri/arcgis-osm-editor/issues/72, which I posted here (and which ESRI largely solved, within the limits of the ambiguities of OSM multipolygons), give some insight into this.

OneHwang commented 9 years ago

Hey @ajturner - Is there anything you can do to help optimize OSM Editor so that it doesn't take so long to load OSM data into ArcGIS?

ajturner commented 9 years ago

Thanks for pinging me @OneHwang - pinging @eggwhites @williamscraigm for thoughts as well on speeding up the OSM import indexing.

eggwhites commented 9 years ago

Thanks for the ping, and apologies for the radio silence on this. Good question on optimization: yes, the code could use it. We are investigating internally which team resources can focus on improvements and optimizations; this conversation is slow but underway, and faster load times are definitely an often-requested improvement. We hear you!

ThomasEmge commented 9 years ago

Here are some of the considerations that went into designing the architecture the way it is right now.

As always, 'performance' and 'speed' are loaded terms, and you have to carefully compare apples to apples. Statements like "this tool is slow" don't help resolve the issue.

Looking at the overall architecture state of the desktop, we sort of have a mixed bag, and for the sake of argument let's stay with ArcMap. The application itself is 32bit and can have a 64bit extension for geoprocessing. 32bit or 64bit doesn't really matter for pure computational speed, so let's set 'speed' aside for right now. The 64bit extension is optional, so I would say the prime target is 32bit, backed by getting the data into a file geodatabase. The file geodatabase, because we would really like to make it seamless for our users, and the fgdb is supported through the whole architecture. Shapefiles would be nice (for speed) but they top out at 2GB and as Marco discovered you can easily go beyond these limits. The file geodatabase is a single-user container, meaning only one process can write at a time; again, this was chosen as the target for convenience.

On the hardware side we are aiming for a machine with 2GB RAM and an Intel Pentium 4, because those are the minimum requirements for Desktop at 10.0 (it doesn't mean it will run 'fast', it means it runs). Early tests indicated that loading IDs and so forth into memory isn't really an option for reasonable data chunks, but you still see the leftovers from those trials as the "Conserve Memory" option. Everyone is welcome to experiment with the option, but honestly, just leave it at the default and life is good.

So then the choice was made to keep the memory footprint low and increase I/O. Here is one of those hardware performance killers. For 'performance' we certainly recommend, just like Geofabrik, getting the fastest and biggest SSDs you can afford. This really, really helps.

Indirectly answering your question about the Data Interoperability extension: Marco already alluded to it. Very few other software options handle closed ways as polygons. I was kind of baffled the first time I looked at OSM that they don't have a dedicated entity to describe "polygons" or "rings". Assembling those polygons can be quite some detective work, and my guess would be that we roughly spend 1/4 of the loading time getting the points in, 1/3 loading the lines, and then another 1/3 correcting lines that are actually polygons and figuring out multi-parts.

Now you might ask: why go through all the trouble when I really want fast? Here we hit another design decision: we wanted to create an editor, something that can read from and write back to OSM. As we discovered, that entails writing an import/export mechanism to put a schema-less dataset into a relational database and get it out again, while tracking what has changed along the way (and being able to fix any inconsistent state between server and client). That requires us to maintain some additional information that is not necessarily OSM-related but editor-related, and these things need to happen in sequence (partially due to the node topology). From what I have seen, editors usually deal with small chunks of data when performing edits, and in a direct comparison I would say we are on par with other editing offerings.
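To make the "closed ways as polygons" point concrete, here is a schematic sketch (plain Python, not the editor's actual .NET code) of the kind of test involved; the AREA_TAGS set is a simplified stand-in for the editor's real rules:

```python
# A polygon in OSM is just a way whose first and last node IDs coincide,
# and even then only if its tags say it represents an area.
AREA_TAGS = {"building", "landuse", "natural", "amenity", "leisure"}

def classify_way(node_ids, tags):
    """Return 'polygon' or 'line' for a parsed OSM way."""
    closed = len(node_ids) >= 4 and node_ids[0] == node_ids[-1]
    if closed and (tags.get("area") == "yes" or AREA_TAGS & tags.keys()):
        return "polygon"
    return "line"

print(classify_way([1, 2, 3, 1], {"building": "yes"}))      # polygon
print(classify_way([1, 2, 3, 1], {"highway": "primary"}))   # line (a roundabout)
```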

Coming back to the 'speed' question: are there solutions out there that are faster? You bet. I am a huge fan of osmosis and would consider it the 'ultimate' OSM tool. I tried QGIS a couple of months ago and was just flabbergasted by the speed with which it brought the data into the system. Then I realized that the polygons were mostly lines and incomplete, so that wasn't an option either (at least for me).

In summary: Is performance and speed a primary issue? I don't think so (see details above). Could the tools be faster? Yes.

Let's start collecting ideas and wishes for functionality you would like to see in version 2 of the editor at #94.

mboeringa commented 9 years ago

@ThomasEmge

Thanks very much for the detailed descriptions of some of the design considerations.

There are possibly a couple of things getting mixed up here, though:

Shapefiles would be nice (for speed) but they top out at 2GB and as Marco discovered you can easily go beyond these limits.

I have never used shapefiles in the development of my Renderer, nor would I want to. ESRI rightly deprecated them long ago due to all their limitations.

I have only been using File Geodatabases, plus recent experiments with SQL Server Express. The 2GB issue I mentioned in my ArcGIS Renderer issue tracker was related to ArcMap's inability to open and display very large attribute tables (the data format most likely doesn't matter at all here; it was a 300GB File Geodatabase in that particular case).

So then the choice was made to keep the memory footprint low and increase I/O.

I fully support this decision, as OSM data spans the globe and is ever growing. Any changes to the tools, especially the Load OSM File tool, must maintain the capability to run with little RAM. OSM, and most geographic datasets for that matter, are way past the point of "in-RAM" processing for the average Joe with a home computer, which probably describes 99% of all OSM users and editors.

Here is one of those hardware performance killers. For 'performance' we certainly recommend, just like Geofabrik, getting the fastest and biggest SSDs you can afford. This really, really helps.

This slightly baffles me. As I mentioned in my post above, I have the distinct feeling IO is not the current bottleneck of the Load OSM File tool of the ArcGIS Editor for OpenStreetMap, at least not in the node-loading stage. I have probably imported some 75-100 .osm XML files by now during testing for my ArcGIS Renderer, of varying sizes up to a 75GB one for France, and the experience is consistently this:

"Although my initial thought was otherwise, importing using the Load OSM File tool is actually processor limited, not IO / harddrive limited. In fact, the disk usage of a 4TB harddrive capable of 150-180MB/s transfers, _rarely reaches more than 1% percent active time during the node import process I am currently monitoring , an astoundingly low figure..._ _while the ArcGIS related CPU core is at close to 100%._"

Note again that this low figure was at the node import stage. I am still watching the Europe import, which hasn't finished yet. It finished loading the nodes but is either still creating indexes or processing the ways and relations. Disk activity is currently much higher, at about 40% active time. I can't tell what stage the process is in now, because ArcMap doesn't refresh at this stage. I am leaving it running though, as I am pretty sure it will finish. Performance aside, the Load OSM File tool with "Conserve Memory" on has proven reliable (and that is a great thing considering the not-always-so-neat source data of OSM).

1/4 of the loading time getting the points in, 1/3 loading the lines, and then another 1/3 correcting lines that are actually polygons and figuring out multi-parts.

From my experience, though it also depends on the nature of the dataset (datasets in developed countries usually have far more OSM (multipolygon) relations to process), I think it may be closer to 1/2 loading points and creating point indexes, 1/4 loading ways/lines, and 1/4 processing relations. But again, this depends highly on the dataset...

From what I have seen, editors usually deal with small chunks of data when performing edits, and in a direct comparison I would say we are on par with other editing offerings.

Yes, probably, but for me personally the editing workflow is of no consequence, as I concentrate on the rendering side with the ArcGIS Renderer. For that, a better-performing Load OSM File tool would be desirable.

mboeringa commented 9 years ago

I am leaving it running though, as I am pretty sure it will finish. Performance aside, the Load OSM File tool with "Conserve Memory" on has proven reliable (and that is a great thing considering the not-always-so-neat source data of OSM).

Well, quoting myself regarding reliability: I do have a failure importing Europe to report. This actually happened quite a while ago, about two weeks back. The import failed after being on the "Building point indexes" step for almost a day, with a warning about the index not being found. See the attached image.

[Screenshot: arcgis_editor - load_osm_file - europe_import_fail]

I did some basic checks: the Point feature class was about 350GB, judging by the files in Windows Explorer, so still well beneath the 1TB limit for feature classes in a File Geodatabase with the default storage keyword. The data was also on a 4TB hard drive with over 2TB of free space left, so no doubts about storage. I therefore have no clue as to why it failed, although admittedly I do not know whether the point-indexing algorithm also creates huge intermediate files somewhere else (the C: drive?) that may have hit some disk limitation.
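Scripting that Explorer check is straightforward; a minimal sketch with a hypothetical geodatabase path (note this totals the whole .gdb folder, not just the Point feature class):

```python
from pathlib import Path

gdb = Path(r"D:\osm\europe.gdb")
size = sum(f.stat().st_size for f in gdb.rglob("*") if f.is_file())
print(f"{size / 1024**3:.1f} GB")
```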

I started a new import yesterday, and am curious if it will fail again, or succeed this time. We'll know in another two weeks...

Any indication from ESRI @eggwhites @ThomasEmge @ajturner whether (Geofabrik) extracts the size of Europe are still importable by the tool into a File Geodatabase? Admittedly, Europe is by far the biggest chunk of OSM data currently. I did successfully import France, the biggest country in Europe in terms of data, without issues.

OneHwang commented 9 years ago

@mboeringa It must have been frustrating to experience this failure after multiple weeks of processing, and I am afraid the same thing will happen to me. Have you tried using osm2pgsql on large files, and then connecting ArcMap to the PostgreSQL database?

mboeringa commented 9 years ago

@OneHwang. Actually, it didn't fail after several weeks. It had loaded all the nodes, but failed on creating the point indexes one day into that step. As I wrote in one of my comments above, loading the nodes for Europe was at roughly 85% after five days, so about six days in total for loading the nodes. So it failed after one week (six days of node loading, one day of building indexes).

Actually, my second attempt is now in its second day of building indexes. I can't see progress at this stage, so I don't know exactly what is happening, but it does seem to be past the point where it failed the first time. I am keeping my fingers crossed and hope to see it start loading ways in a day or two...

mboeringa commented 9 years ago

Have you tried using osm2pgsql on large files, and then connecting ArcMap to the PostgreSQL database?

That is not an option for me. Besides the fact that I don't have a PostgreSQL instance running (only SQL Server Express at the moment), I need the specific functionality of the ArcGIS Editor for OpenStreetMap to support the tools I developed for rendering the data in ArcMap. I have been working on an ArcGIS Renderer for OpenStreetMap data, and the tools are completely dependent on ESRI's Editor.

To see some of the latest rendering examples, you can have a look at these links: https://github.com/Esri/arcgis-osm-editor/issues/72 https://github.com/Esri/arcgis-osm-editor/issues/65

I gave a more general explanation of the project a year ago on the OpenStreetMap forums. The A0(!) vector PDFs downloadable there are outdated compared to the current rendering, and contrary to what I wrote there about only supporting the topographic "large scale" range of about 1:1k to 1:50k, I have since developed the tool and associated layer files into full multi-scale rendering from 1:1k up to 1:50M:

http://forum.openstreetmap.org/viewtopic.php?id=26451

Lastly, connecting to a PostgreSQL instance loaded with OSM data via osm2pgsql would require Query Layers. Query Layers have some restrictions, like not being editable, and are thus of no use to my renderer. I need the full geodatabase functionality provided by the ArcGIS Editor for OpenStreetMap and the file or enterprise geodatabases it creates.

mboeringa commented 9 years ago

@OneHwang. Well, bad news again ;(. I have now discovered it failed a second time, with the same error and a failure to build the point index.

Clearly, Europe as a whole can no longer be reliably imported using the tool. I don't see an issue with disk space, nor with hitting the official limits of File Geodatabases (1TB max per feature class).

I might attempt adjusting the File Geodatabase storage keywords (http://resources.arcgis.com/EN/HELP/MAIN/10.2/index.html#//003n00000021000000), but I am not sure it will make a difference. MAX_FILE_SIZE_256TB seems an obvious candidate, but on the other hand the ArcGIS Help clearly states that "You would normally only specify this keyword to store a large raster dataset", which isn't the case here. Unfortunately, the Help doesn't say whether there are any pitfalls to using this keyword (besides the obvious disk space issue it warns about). And of course, as written just above, I have no real indication that I am actually hitting the 1TB feature class limit with the DEFAULT configuration keyword.

Another candidate is the BLOB_OUTOFLINE keyword, as the tagging data is stored in a blob AFAIK. It might make a difference by distributing storage somewhat.
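For reference, a sketch of how a configuration keyword would be applied with arcpy. One assumption to flag: the keyword has to be supplied when a feature class is created, so this only helps if the load tool can be pointed at pre-created targets (the Load OSM File tool may create its own feature classes and ignore this):

```python
import arcpy

arcpy.management.CreateFileGDB(r"D:\osm", "europe_256tb.gdb")
arcpy.management.CreateFeatureclass(
    r"D:\osm\europe_256tb.gdb", "osm_pt", "POINT",
    spatial_reference=arcpy.SpatialReference(4326),  # WGS84, as OSM uses
    config_keyword="MAX_FILE_SIZE_256TB")
```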

ThomasEmge commented 9 years ago

I think the error message might be misleading. Building indices can generate quite some data in your local temp directory. How much disk space do you have on the disk where the temp folder resides?
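A quick way to check this, using only the Python standard library:

```python
import shutil
import tempfile

tmp = tempfile.gettempdir()
total, used, free = shutil.disk_usage(tmp)
print(f"temp folder: {tmp}")
print(f"free: {free / 1024**3:.1f} GB of {total / 1024**3:.1f} GB")
```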

mboeringa commented 9 years ago

Hi Thomas,

I am currently importing Geofabrik *.osm files for both the whole of Germany and the Netherlands concurrently, to File Geodatabases stored on drives other than the Windows system drive (physically different disks).

My C: drive, the Windows system drive, still has 58GB free of 260GB (it is actually a 500 GB SSD, but dual-boots Win7/8). The Germany import (the biggest chunk of OSM Europe data at the moment) is at the indexing stage; the Netherlands import has gone past it to loading ways.

I have no indication so far that C: drive space was the limit here, but admittedly I am currently not importing the whole of Europe, so I may need to check this again. I certainly did not get any Windows errors or warnings about disk space. The drive I was importing to, which held the database, was an external 4TB drive that I have successfully used for many other imports and that has 2TB free.

I would sincerely appreciate it if ESRI could answer my question: do you still manage to import the whole of Europe into a File Geodatabase at ESRI?

ThomasEmge commented 9 years ago

Marco, I attempted Europe last week and noticed 70GB of files being generated in my temp folder while creating the index for the OSMID field. This is not the folder of your target gdb; it is the ArcMap temp folder. Those files come and go, and unless you watch during the indexing process, you might have missed the OS warning before gp failed with this message. I'll test it once I have a complete set for Europe. Yesterday I started a new loading process for Europe; the nodes are done. I'll let you know when it completes.

mboeringa commented 9 years ago

Thanks Thomas,

Yes, you are right, this makes it very likely that I have been running into a disk space issue, especially since all other imports up to now have been OK.

Admittedly, being an SSD divided between two operating systems, my system drive doesn't have unlimited space. I can probably free up another 70-100GB on C:, though, as I have rendered OSM data in file geodatabases stored there too, which can be moved to a secondary hard drive.

Thomas, another question: you have said that using an SSD greatly speeds up the loading process of the Load OSM File tool.

On the one hand, I have never seen the Windows performance monitor show my hard drive (not an SSD) being taxed to the max, at least not in terms of MB/s. But MB/s may not be a good indication of how the disk is being taxed.

On the other hand, I now notice that loading the OSM ways for the Netherlands, which takes place entirely on an SSD, seems much faster than for my Germany import, which uses an ordinary hard drive.

Does loading the ways require a lot of back-and-forth reading through, e.g., the nodes/points table to gather the coordinates of the nodes/vertices making up the way geometries? I guess this is likely, and it might explain why loading ways is so much slower against a hard drive versus an SSD, the latter being much more suitable for random access.

ThomasEmge commented 9 years ago

Correct: assembling the ways requires a read from the OSM file, a read operation for the participating nodes from the point feature class, and a write for the resulting feature. I am currently working on a way to separate these operations across drives and simplify the loading process. I am gathering numbers right now on how big the difference really is.
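A schematic sketch of that read/lookup/write loop (plain Python, not the tool's actual implementation), which shows why the node lookups are random-access heavy:

```python
def assemble_ways(ways, node_store, write_feature):
    """ways: iterable of (way_id, [node_id, ...]) parsed from the .osm file.
    node_store: mapping node_id -> (x, y); on disk this is the point feature
    class, so each lookup is a potential random read -- hence SSDs help."""
    for way_id, node_ids in ways:
        coords = [node_store[nid] for nid in node_ids]  # scattered reads
        write_feature(way_id, coords)                   # sequential write

# Toy run with in-memory stand-ins for the three stores involved.
nodes = {1: (5.10, 52.00), 2: (5.20, 52.10), 3: (5.30, 52.00)}
assemble_ways([(10, [1, 2, 3])], nodes, lambda wid, c: print(wid, c))
```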

mboeringa commented 9 years ago

Correct: assembling the ways requires a read from the OSM file, a read operation for the participating nodes from the point feature class, and a write for the resulting feature.

OK, thanks, this is close to how I figured the process worked... it definitely screams for an SSD. Considering the slow Germany import of the OSM ways, I think I will refrain from trying to import all of Europe against a hard drive. Looking at the stats so far, I guess even Germany alone will take another 2-4 weeks, let alone all of Europe. And then I still need to render the data!...

I am slightly contemplating getting a 1 or 2TB SSD, but haven't decided yet, as I am primarily testing my renderer and it isn't of the highest priority. It would mainly be a geek/thrill thing to see all of Europe processed by my renderer into stylized vector data.

mboeringa commented 9 years ago

I am currently working on a way to separate these operations across drives and simplify the loading process. I am gathering numbers right now on how big the difference really is.

Well, one observation I have made may be of some interest here:

I had my Germany import running at the loading-ways stage while at the same time doing heavy rendering with my ArcGIS renderer to process the Netherlands. The Germany import was writing to a File Geodatabase on an external 4TB hard drive; the Netherlands rendering session was using another, external SSD for its data.

What did I notice? While the rendering session for the Netherlands extract was still going, the import proceeded at only some 5%-8% (of the total number of ways) per day. That is slow, and would mean some 15 days of processing just to load the ways.

After rendering of the Netherlands finished, the process sped up to some 60% per day. That is a huge difference, indicating the hard drive itself was not the limit here.

The *.osm file was stored on my internal C: drive, which is also my Windows system drive. This is a Samsung EVO SSD that cannot sustain more than maybe 175 MB/s throughput in the long run.

It seems the rendering session, with all its geoprocessing, caused quite some congestion on C: and this SSD. Alternatively, or additionally, there may have been congestion on the USB 3 PCI Express card serving the external drives, as it needed to handle both the hard drive and the external SSD.

Based on this, I will re-try the Germany import in an optimized manner:

This should minimize the chance of IO congestion (unless the concurrent use of the USB 3 PCI Express card turns out to have been the culprit of the slow processing).

mboeringa commented 9 years ago

Based on the configuration described in the post above, I now see it loading 200k nodes/minute for an extract of Norway, so about 3.3k nodes/s. Looking at hard-drive activity, it seems to be writing some 800 KB/s to the File Geodatabase's tables. I am curious how it will do loading the ways. If this finishes, I will attempt to load not Germany, but an even bigger combined extract of Germany, Switzerland, and Austria, as available from Geofabrik.