Esri / arcgis-osm-editor

ArcGIS Editor for OpenStreetMap is a toolset for GIS users to access and contribute to OpenStreetMap through their Desktop or Server environment.
Apache License 2.0

OSM File Loader Performance #181

Closed: lchristianzumstein closed this issue 5 years ago

lchristianzumstein commented 6 years ago

I'm working with the OSM File Loader tool to convert .osm continent data (https://download.geofabrik.de/) to .gdb and it's taking a really long time. I know the files are enormous, but I hope there's a way I can take better advantage of system resources to speed up the process. I seem to hover around 10% CPU utilization during the tool run, even though Arc's affinity settings are set to run on all available processors. I've set the OSM File Loader tool's parallel processing factor to 100%, but I know not all tools honor the PPF regardless of how it's set in the environment settings.
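
For what it's worth, a minimal arcpy sketch of those environment settings (the toolbox path is a typical install location and is an assumption, not a guarantee):

```python
import arcpy

# Environment settings the loader may or may not honor; as noted above,
# individual tools are free to ignore the parallel processing factor.
arcpy.env.parallelProcessingFactor = "100%"

# Typical install location of the OSM toolbox (adjust to your machine).
arcpy.ImportToolbox(r"C:\Program Files (x86)\ArcGIS\Desktop10.5"
                    r"\ArcToolbox\Toolboxes\OpenStreetMap Toolbox.tbx")
```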

I was originally using unzipped .bz2 files from Geofabrik, but found that downloading the .pbfs and converting them to .osm with osmconvert (http://wiki.openstreetmap.org/wiki/Osmconvert) instead produced a .osm file about half the size of the unzipped .bz2. Ex: 75GB versus 140GB for Asia. I figured smaller would be better (I don't need history/update info anyway) and would help load the data faster. The .osm file(s) reside on one drive and the .gdb is created on another drive.
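
For reference, that conversion step can be scripted; a sketch assuming osmconvert is on PATH (the paths are placeholders, and the flags are from the osmconvert documentation linked above):

```python
import subprocess

# Convert a Geofabrik .pbf to .osm XML; --drop-author strips authoring
# metadata that is not needed when history/sync is irrelevant.
subprocess.check_call([
    "osmconvert", r"E:\osm\asia-latest.osm.pbf",
    "--drop-author",
    "-o=" + r"E:\osm\asia-latest.osm",
])
```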

Also I'm specifying: ('name', 'highway', 'building', 'natural', 'waterway', 'amenity', 'landuse', 'place', 'railway', 'boundary', 'power', 'leisure', 'man_made', 'shop', 'tourism', 'route', 'barrier', 'surface', 'type', 'service', 'sport', 'historic', 'aeroway', 'aerialway', 'military', 'geological', 'ref', 'maxspeed', 'oneway', 'bridge', 'tunnel', 'population', 'capital', 'city', 'state', 'country', 'admin_level', 'border_type', 'alt_name', 'alt_name_1', 'loc_name', 'name_58_en', 'osm_is_in_58_country_code', 'osm_gnis_58_county_name') as tag values in the tool parameters.

After I run the OSM File Loader, I run the OSM Attribute Selector on the keys above. Finally, I query and export only the values I need from each continent's pt, ln, and ply features to individual feature classes in a new gdb, which is a small fraction of the size of the .osm file and the converted gdb. I'm ultimately building a WGS1984 cached map service for offline use.
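
A hypothetical sketch of that query-and-export step (dataset paths, field names, and the where clause are placeholders, not the actual parameters used):

```python
import arcpy

# Export only the needed features from the loaded line FC into a much
# smaller gdb; repeat per geometry type and per continent.
arcpy.Select_analysis(
    r"E:\gdb\africa.gdb\africa_osm_ln",
    r"E:\gdb\Africa_Queried.gdb\Highways",
    "highway IN ('motorway', 'trunk', 'primary')")
```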

I've converted Antarctica, Central America, Australia/Oceania, South America, and Africa. Africa, the longest conversion completed so far, took about 95 hours to run the OSM File Loader portion. I've been processing Asia since 12/13; it began loading 'ways' at some point on 12/17, after loading the nodes and building the point indexes. I know that North America and Europe are even larger. I guess what I'm getting at is: should I expect this process to take weeks to run? And is there anything I can do to increase speed?

It should be noted I'm working on a Windows Server 2012 R2 8-core machine with 32GB RAM and ~3.2GHz max clock speed. I have an ArcGIS 10.5 Desktop Basic license with 64-bit background processing installed and enabled, and the 64-bit OpenStreetMap Toolbox.

ThomasEmge commented 6 years ago

For large data loading processes, please take a look at the OSM File Loader (Load Only) tool. It does some of the data loading in parallel and takes advantage of the existing hardware resources. This tool can take certain 'shortcuts' because it does not track the full metadata integrity required to sync back with the OSM server. The Load Only tool uses the PPF environment setting, and it is advised to place the scratch workspace and the final dataset on different drives. Toward the end of the loading process, the translation becomes very I/O intensive while building the appropriate polygon structures. If I remember correctly, with the Load Only tool Africa should load and build in a little less than 12 hrs.

lchristianzumstein commented 6 years ago

Thanks for the tip! I wondered why there were two tools which on the surface appear to accomplish the same task. Will loop back with results.

lchristianzumstein commented 6 years ago

Ok. I was able to process the Asia dataset from OSM to GDB using the OSM File Loader (Load Only) tool. It took about 122 hours to process. The tags I specified for points, lines, and polys were: ('name', 'highway', 'building', 'natural', 'waterway', 'amenity', 'landuse', 'place', 'railway', 'boundary', 'power', 'leisure', 'man_made', 'shop', 'tourism', 'route', 'barrier', 'surface', 'type', 'service', 'sport', 'historic', 'aeroway', 'aerialway', 'military', 'geological', 'ref', 'maxspeed', 'oneway', 'bridge', 'tunnel', 'population', 'capital', 'city', 'state', 'country', 'admin_level', 'border_type', 'alt_name', 'alt_name_1', 'loc_name', 'name_58_en', 'osm_is_in_58_country_code', 'osm_gnis_58_county_name'). The tool counted 722,110,435 nodes, 83,145,913 ways, and 668,069 relations. However, my resulting pt feature class only has 12,735,642 features, while the ln and ply FCs respectively have 26,113,990 and 56,649,415 features. I believe when I ran the OSM File Loader tool on other continent datasets, all of the points contained within the osm file were loaded into the pt FC in the resulting gdb, regardless of the tags/fields I specified to add/drop. I'm just worried that I only have about 1-2% of the point data that I should.

ThomasEmge commented 6 years ago

I have never done any stats on nodes actually being stand-alone point features. My guess would have been somewhere between 5-8%, but for Asia your number could actually be right. For Europe I would expect a higher percentage. My recommendation would be to perform some spot checks against the official OSM tiles.

lchristianzumstein commented 6 years ago

Thanks Thomas. I'll give that a shot! It's peculiar, because previous runs of the regular (not Load Only) OSM File Loader tool, on say Africa, counted the nodes and loaded that same number as points into the resulting point feature class. I have begun loading North America using the (Load Only) tool with the same tags as above, and it counted, loaded, and populated 942,850,791 points in the output point feature class. The tool ultimately failed, after a successful completion of the Append tool run, with an "Error HRESULT E_FAIL has been returned from a call to a COM component", similar to the issue discussed in this link: https://github.com/Esri/arcgis-osm-editor/issues/126

Sounds like I have one potential issue, and another! Thanks for all your help!

ThomasEmge commented 6 years ago

That is where the load-only mode can take 'shortcuts'. The File Loader tool, by contrast, needs to keep everything around in order to determine the impact on all geometries. For example: if you are moving a vertex in a polygon, we need to keep track of the underlying 'node', as this is the information that needs to go back to the server - even if the node is 'only' used as a vertex in the polygon. With the original tool we are carrying a lot of additional data around, and for that we introduced the notion of 'supporting elements'. These are entities that are required to build higher-order geometries. The Load Only tool eliminates these 'supporting elements', as it maintains the same information in a different form. As a result it generates 'fewer' features, which makes the point feature class for larger geographical areas a lot more manageable. If you were to load the same OSM file with the File Loader and the Load Only tool, the number of features loaded by the Load Only tool and the number of features with the attribute "osmSupportingElement = no" from the File Loader tool should match.

lchristianzumstein commented 6 years ago

Thanks for the clarification. That makes sense. Unfortunately, that doesn't look to be the case for my tool runs. I ran the File Loader and the Load Only tools on Antarctica.osm and got differing results when querying the File Loader pt/ln/ply outputs for "osmSupportingElement = no" (1553, 32717, and 60035 respectively) and comparing them to the Load Only tool's pt/ln/ply outputs (6398, 28499, and 66274 respectively).
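
For anyone repeating this comparison, a sketch of counting the non-supporting File Loader features with arcpy (the paths are placeholders; the osmSupportingElement field and its 'no' value are as quoted in this thread):

```python
import arcpy

fc = r"E:\gdb\antarctica_fileloader.gdb\antarctica_osm_pt"
# Filter out the supporting elements, then count what remains.
arcpy.MakeFeatureLayer_management(fc, "pt_lyr", "osmSupportingElement = 'no'")
print(int(arcpy.GetCount_management("pt_lyr").getOutput(0)))
```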

mboeringa commented 6 years ago

@lchristianzumstein, As to the Append error, I have occasionally had similar experiences, but I suspect this is a bug in the Append geoprocessing tool rather than an issue with the loader tool, where it sporadically fails on extremely large datasets, like these couple hundred million nodes to load for continent-size extracts.

mboeringa commented 6 years ago

I also wouldn't worry too much about "losing" data. I have used both of these tools myself dozens upon dozens of times to style and render data using a complex topographic style, and found the results to closely match the default rendering on openstreetmap.org, indicating the tools do a really good job in comparison with e.g. osm2pgsql.

scw commented 6 years ago

@mboeringa If there is an identifiable issue with the Append tool, we can make sure a bug is added for it. I realize that may be easier said than done with the volume of data you're working with, but it's great to test with complex data and break things where we can.

mboeringa commented 6 years ago

> @mboeringa If there is an identifiable issue with the Append tool, we can make sure a bug is added for it. I realize that may be easier said than done with the volume of data you're working with, but it's great to test with complex data and break things where we can.

@scw

Shaun: fully agree with the observation that it is good to push ArcGIS geoprocessing (in both ArcMap and Pro) to its limits. With the increasing size of geographic datasets, like the OpenStreetMap database these tools use, working with datasets of dozens or hundreds of millions, or potentially a few billion, records is no longer a rarity or an exceptionally obscure "corner case" for a GIS, and certainly not for geospatial enterprise databases like PostGIS, with the mere existence of the OSM render database running on PostGIS as a living testimony of that. GISs and their tools need to handle such datasets as well as they can.

As to an "identifiable" issue, I can just give you all the information I posted in this other thread:

https://github.com/Esri/arcgis-osm-editor/issues/173

and maybe advise you to try it out yourself: use the OSM File Loader (Load Only) tool of the ESRI ArcGIS Editor for OpenStreetMap on some of the continent-size Geofabrik OSM extracts (https://www.geofabrik.de/), and see if it breaks.

The information in the above thread is the best I can give you, and it certainly makes it likely the issue is with the Append tool; see especially my comments in these two posts there:

https://github.com/Esri/arcgis-osm-editor/issues/173#issuecomment-304474863 https://github.com/Esri/arcgis-osm-editor/issues/173#issuecomment-321898095

Anyway, it remains a mildly elusive issue, so whether you can reproduce it remains to be seen. I have recently loaded Africa successfully again using the tool.

ThomasEmge commented 6 years ago

I did the point comparison for Austria. The Load Only tool created a point feature class containing 1,528,038 points versus 1,402,397 from the File Loader, out of a possible total of 57,813,697 nodes. The difference comes from a slightly different view of 'supporting elements'. The Load Only tool tests only for the existence of tags in order to call something a supporting element. The File Loader goes one step further and checks for essential tags, and that additional inspection results in a smaller number. But essentially, even for Austria, we end up with about 2% stand-alone nodes out of the overall number of nodes.

mboeringa commented 6 years ago

@scw:

Shaun, as to concrete bugs detected, I don't know if @ThomasEmge ever found the time to report and log this, but he mentioned an ArcGIS Runtime API call not working as expected in 64-bit in this other thread post:

https://github.com/Esri/arcgis-osm-editor/issues/126#issuecomment-221124740

Unrelated to this, but thought I would mention it here as well.

ThomasEmge commented 6 years ago

@mboeringa The API call was related to determining the current installed ArcGIS environment. Unfortunately not related to the Append tool.

lchristianzumstein commented 6 years ago

@mboeringa Thanks for your input. The Append error is killing me right now. North America failed during Load Only both in the foreground and as a background 64-bit process in ArcMap. To confirm what you found in https://github.com/Esri/arcgis-osm-editor/issues/173#issuecomment-304474863, were you able to load successfully by launching the tool in Catalog?

@ThomasEmge @mboeringa Ok awesome I feel better/understand more about the nodes now!

mboeringa commented 6 years ago

@lchristianzumstein

I can't guarantee it will not fail in ArcCatalog, but I would definitely try it. ArcMap just seems to carry a bit more load with it that may cause a failure.

In addition, I would recommend not using the Windows system drive for loading. Although it was with another, custom-built, tool, I have had some irregular failures when using the Windows system drive for geoprocessing, where switching to another (potentially external USB) drive solved the problem and the tool continued without issue.

ThomasEmge commented 6 years ago

Yes, data volume is certainly an issue when loading the larger geographical extents. There are 3 locations to consider:

  1. The original osm file - the file itself can be quite large, but it is only used at the beginning.
  2. The 'intermediate' files - the original file split by nodes, ways, and relations per load process. Their location is determined by setting the scratch workspace to a folder location. The drive should have about 2.5 times the size of the original osm file available. This space is only needed temporarily and the used volume fluctuates quite a bit. If no scratch workspace is set, the intermediate files are generated next to the original osm file.
  3. The final feature classes - the total space for points, lines, and polygons should be about 2.5 times the size of the original file as well. The tables can grow to a substantial size depending on the number of attributes and indices.

I would recommend putting the 3 groups on different drives. As the loading process progresses, the amount of I/O increases, and spreading the data locations will lead to better throughput.
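
A sketch of that three-location layout in arcpy terms (the drive letters are placeholders for three physically separate disks):

```python
import arcpy

osm_file = r"E:\osm\north-america-latest.osm"    # 1. the original osm file
arcpy.env.scratchWorkspace = r"F:\osm_scratch"   # 2. intermediate split files
target_gdb = r"G:\gdb\north_america.gdb"         # 3. final feature classes
```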

mboeringa commented 6 years ago

@ThomasEmge ,

While this is all good advice, definitely worth implementing, the specific potential issue I referred to regarding the Windows system drive had little to do with a lack of free disk space. All the times I ran into this specific issue, there were dozens or hundreds of GB of free space left, even on the Windows system drive, with everything running from SSD as well.

What resource potentially runs out when using the Windows system drive for this kind of heavy geoprocessing, I have no idea...

ThomasEmge commented 6 years ago

@mboeringa The resource constraint might be the computation of indices, which creates temporary files in the OS temp folder. As a guess, I would suggest having at least 2/3 of the original osm file size available on the system drive, or moving your temp folder location at the OS level.
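
One possible way to relocate the temp folder per session rather than system-wide; a workaround sketch only (the ArcMap path is a typical 10.5 install location and is an assumption):

```python
import os
import subprocess

# Redirect TEMP/TMP for the ArcMap process only, so index-building
# temp files land on a roomier drive.
env = dict(os.environ, TEMP=r"D:\temp", TMP=r"D:\temp")
subprocess.Popen(
    [r"C:\Program Files (x86)\ArcGIS\Desktop10.5\bin\ArcMap.exe"], env=env)
```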

lchristianzumstein commented 6 years ago

@mboeringa I've started to run the process in Catalog. We'll see if the nodes process past the Append step. If not, I'll likely redownload North America from Geofabrik and try again, and be in the market for some USB externals as well. I assume the Load Only tool succeeded in running when you wrote the gdb to the external HD? Or did you need to store all 3 portions (osm, gdb, scratch) on externals? Just so odd that other File Loader and Load Only processes have completed without incident. Intermittent issues are so hard to diagnose/rectify.

@ThomasEmge Yep. I have the .osm files on one drive, writing the gdbs to another secondary drive, and my scratch workspace is my C:\ drive. Found out the hard way that sometimes if a process fails, the temp files aren't deleted/removed, and then re-running/re-starting the process fails due to lack of scratch space in that directory.

mboeringa commented 6 years ago

> Or did you need to store all 3 portions (osm, gdb, scratch) on externals?

@lchristianzumstein

To be honest, I have done loading in different configurations. I have even had multiple load processes running at the same time, in which case I put all data for each session on its own drive, so no spreading of a single load session across drives as @ThomasEmge recommended. But those multi-session loads did not involve extremely large extracts, or at least not more than one at a time. I have also followed Thomas's recommendation of spreading the load across drives at times, which indeed seemed to result in mildly better loading performance, although I can't recall the exact timings from the few comparisons I did.

I also have not yet attempted to load an extract the size of North America or Europe. My biggest extract was France, which is actually bigger than the whole of Africa at a current 5.4 GB of bzip-compressed XML, and just over a third of North America going by Geofabrik.

> Intermittent issues are so hard to diagnose/rectify.

Yes, frustrating, especially since, because of that and the size of the datasets involved, I haven't been able to provide a good test case to @ThomasEmge so it could be fixed.

ThomasEmge commented 6 years ago

Quick update... I am doing loading exercises with the North American dataset. The COM component error seems to result from running out of disk space. My first try had 80GB available on my C: drive, where my temp folder resides, and it failed with "Error HRESULT E_FAIL has been returned from a call to a COM component". The second try had 90GB of disk space available, and that loading process is still running. So my current assumption is that the geoprocessing tool generating the attribute indices is running out of disk space.

lchristianzumstein commented 6 years ago

Thanks for the update Thomas. I did just get North America to complete the conversion from osm to gdb using the Load Only tool. It took about 5 days. My swapspace was on my C:\ drive and is 1.2TB in size. I have a feeling yours will error out again if storage is indeed what raises the COM issue.

mboeringa commented 6 years ago

@ThomasEmge and @lchristianzumstein

To get further confirmation about the possible cause of the load errors with Windows Temp folder space running out, I have now set both the Windows Temp and local user Temp folders to my SSD D:\ drive with over 1TB free space.

I will also use this drive to attempt to load North America.

Since I just started this, based on @lchristianzumstein's timings, I will likely report back in about a week on whether it finished successfully in this configuration.

mboeringa commented 6 years ago

@ThomasEmge and @lchristianzumstein

My attempt to load North America failed again when loading was started from ArcMap, even with the Windows Temp and local user Temp folders pointing to my 2TB SSD D:\ drive with plenty of free space; see the second screenshot showing hundreds of GB of free space left at the point where it failed.

Note that the issue is actually not related to the indexing step. It again fails halfway through the call to the Append geoprocessing tool, after the nodes have been successfully loaded into the intermediate temporary File Geodatabases. I really think this may in fact be a core ArcGIS issue, either with the Append tool itself or with the way ArcGIS handles tool cancellation in ArcMap. As you can see, the error message mentions ITrackCancel and the CancelTracker. Maybe these have scalability issues in ArcMap with datasets of hundreds of millions of records, like these.

I will attempt this a second time with the tool started from ArcCatalog, as I did last year with a failed Geofabrik Africa extract, and see what it does there, and whether it succeeds again as it did with the Africa extract when loaded in ArcCatalog instead of ArcMap: https://github.com/Esri/arcgis-osm-editor/issues/173#issuecomment-321898095

[two screenshots: the Append error message and the remaining free disk space at the point of failure]

mboeringa commented 6 years ago

@ThomasEmge and @lchristianzumstein

Unfortunately, my load session of North America failed again with the same error at the Append step, even though it was loaded from ArcCatalog instead of ArcMap... Do note that I can successfully load smaller Geofabrik extracts with the OSM File Loader (Load Only) tool.

There is one more check I will probably do, and that is to run a memory test to make sure there are no lingering hardware issues involved, but I doubt it.

[screenshot: the Append error message]

mboeringa commented 6 years ago

@ThomasEmge and @lchristianzumstein , the RAM memory test (Windows 10) came out fine, as expected.

ThomasEmge commented 6 years ago

@mboeringa After this type of error message, the intermediate GDBs should still be around. Can you try to manually execute the append tool? So, after receiving the error message, close down the application (either ArcCatalog or ArcMap). Restart the application and try to just run the append tool with the points from the intermediate GDBs to the final feature class location.
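
A sketch of that manual re-run (the intermediate GDB and feature class names are placeholders for whatever the failed run left behind):

```python
import arcpy

intermediate_pts = [
    r"F:\osm_scratch\part1.gdb\osm_pt",
    r"F:\osm_scratch\part2.gdb\osm_pt",
]
# Push the intermediate points into the final point feature class.
arcpy.Append_management(
    intermediate_pts,
    r"G:\gdb\north_america.gdb\na_osm_pt",
    "NO_TEST")
```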

mboeringa commented 6 years ago

> @mboeringa After this type of error message, the intermediate GDBs should still be around. Can you try to manually execute the append tool? So, after receiving the error message, close down the application (either ArcCatalog or ArcMap). Restart the application and try to just run the append tool with the points from the intermediate GDBs to the final feature class location.

@ThomasEmge

I already did exactly what you described once before and still ran into the Append issue, with an "Unexpected error"; see this post in another thread here on the issue tracker:

https://github.com/Esri/arcgis-osm-editor/issues/173#issuecomment-304474863

This is why I have repeatedly written that I suspect the particular issue I am witnessing may actually be an issue with the Append tool itself, not an issue triggered by or in the OSM File Loader (Load Only) code.

However, why you and @lchristianzumstein are able to load the North America extract while I can't, I am not sure. It is definitely not a disk space issue; the SSD D:\ drive is a 1.8 TB Samsung EVO. The only difference left with @lchristianzumstein's setup is that I did not have enough free disk space on the other two SSD-based disks I have available to distribute the temporary working spaces and source files as you described. I may attempt this with one or two additional conventional hard drives that do have enough space, but that will be slower.

lchristianzumstein commented 6 years ago

I just ran into the Append issue again as well while loading Europe in Catalog. Same configuration as with North America: reading the raw OSM file from one drive, ~1.2TB swapspace on C:\, and writing the output gdb to a 1TB+ secondary drive. I just installed a 4TB tertiary drive and plan to rerun the tool pointing my swapspace to that. It still looks like there should be room (600+GB) on my C:\ for more temp files... so, as @mboeringa said, it seems this is an ArcGIS issue with the Append tool trying to handle this much data.

[screenshot: the Append error message]

lchristianzumstein commented 6 years ago

As expected, and as @mboeringa found out, trying to execute the Append on the temp Europe_pts in swapspace failed as well. Any reason to use the Append tool versus Merge? (Though I imagine the results would be similar.)

mboeringa commented 6 years ago

@lchristianzumstein , I really think ESRI should start trying to debug and fix the Append tool for these ultra-large datasets, or find out what the limitation is.

Admittedly, 10-15 years or so ago, working with datasets this big, and with this many records per Feature Class, in a general GIS was rather unthinkable, but nowadays, and with the availability of OpenStreetMap, it isn't.

ESRI should really prepare for this, and not just via Big Data tools. PostGIS and PostgreSQL seem quite happy to deal with datasets of this size, the renderings of the data on the OpenStreetMap site being a living testimony.

Interesting question regarding Merge vs Append. I think at least one valid reason to choose Append over Merge exists: with Append, you add the datasets to an existing feature class and don't create a new Feature Class. This saves space, as the records of the base dataset don't have to be duplicated into a new Feature Class. Considering the size of these datasets and possible disk constraints, this may just be the difference between a failed and a successful import; see the sketch below.

Nonetheless, have you actually attempted to Merge these left-over Europe point datasets as well, to see if that succeeds?
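
To make the Append-versus-Merge difference concrete, a sketch with placeholder paths:

```python
import arcpy

# Append pushes rows into an existing feature class, so the rows already
# in the target are not copied a second time:
arcpy.Append_management(
    [r"F:\scratch\p2.gdb\pt", r"F:\scratch\p3.gdb\pt"],
    r"G:\europe.gdb\europe_osm_pt", "NO_TEST")

# Merge always writes a brand-new feature class, duplicating every input
# row on disk, including the base dataset:
arcpy.Merge_management(
    [r"F:\scratch\p1.gdb\pt", r"F:\scratch\p2.gdb\pt", r"F:\scratch\p3.gdb\pt"],
    r"G:\europe.gdb\europe_osm_pt_merged")
```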

lchristianzumstein commented 6 years ago

@mboeringa, agreed about ESRI updating the Append tool to account for the scale at which GIS does and will continue to operate going forward.

Good info about Merge vs Append. I ran a Merge on the data last night and got a similar result (Error 999998); the tool failed in about 3 hours, the same as my previous Append run.

I'm planning to come up with a work-around....somehow...TBD. Will keep you aware of my findings.

ThomasEmge commented 6 years ago

As a quick question: how many nodes are in the Europe extract?

lchristianzumstein commented 6 years ago

Hey Thomas, the Load Only tool counted 1,985,379,609 nodes.

mboeringa commented 6 years ago

@ThomasEmge and @lchristianzumstein

An alternative "workaround" that might be implementable in the OSM File Loader (Load Only) tool would be to not store the "intermediate" supporting nodes in the File or Enterprise Geodatabase, but use some optimized indexed binary file format similar to osm2pgsql's flat-nodes option.

Especially since dropping supporting nodes has become the default (which I also recommended), storing them in the geodatabase awaiting final deletion at the end of processing - just because they are needed to construct way and relation elements - is a pretty huge demand on the database. This might circumvent the need to Append feature classes with billions of points, the majority of which will be deleted in the end.

https://wiki.openstreetmap.org/wiki/Osm2pgsql https://github.com/openstreetmap/osm2pgsql/issues/126#issuecomment-38635677

Of course, this option would require some serious development... and it doesn't solve the potential issue with the Append tool itself and that tool's scalability.
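
Purely as an illustration of the flat-nodes idea, and not of anything in the toolset: a fixed-width binary file indexed directly by node ID, sketched in plain Python:

```python
import struct

# 8 bytes per node: lat/lon stored as fixed-point 1e7 integers. The node ID
# doubles as the record index, so no separate index structure is needed.
RECORD = struct.Struct("<ii")

def write_node(f, node_id, lat, lon):
    f.seek(node_id * RECORD.size)
    f.write(RECORD.pack(int(lat * 1e7), int(lon * 1e7)))

def read_node(f, node_id):
    f.seek(node_id * RECORD.size)
    lat, lon = RECORD.unpack(f.read(RECORD.size))
    return lat / 1e7, lon / 1e7

with open(r"D:\flat.nodes", "w+b") as f:
    write_node(f, 4242, 52.3676, 4.9041)
    print(read_node(f, 4242))
```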

mboeringa commented 6 years ago

> Hey Thomas, the Load Only tool counted 1,985,379,609 nodes.

@lchristianzumstein and @ThomasEmge

That value starts to get awfully close to the maximum limit of ObjectIDs in ArcGIS (2,147,483,647), which, according to this Help page, are still based on 32-bit integer values.

Clearly, it is no longer possible to load the entire planet into a File or Enterprise Geodatabase with all supporting nodes added. Dropping the supporting (untagged) nodes before inserting the remaining tagged nodes into the geodatabase will alleviate this issue, but not definitively solve it.

ESRI should really start working on 64-bit integer ObjectIDs for File and Enterprise Geodatabases... The Help page is somewhat ambiguous here: for a database with an existing table that has a "qualifying field" (integer, unique) to be registered with the enterprise geodatabase, it doesn't state whether that "qualifying field" may be of a field type that can store 64-bit integers or not.
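
The remaining headroom is easy to check:

```python
# 32-bit ObjectID ceiling minus the node count reported above for Europe:
print(2147483647 - 1985379609)  # 162104038 IDs to spare
```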

lchristianzumstein commented 6 years ago

Figured I'd loop back on this since it's been about a month. I was able to successfully download, unzip, and run the OSM (Load Only) tool on Europe data. I ended up having to do it by Country since doing so by Continent was just too heavy of a lift for ArcGIS.

I downloaded each country's .bz2 from the Geofabrik Europe page http://download.geofabrik.de/europe.html, except Russia because it's included in the Asia data I've already processed, and extracted them to the same directory. [screenshot: directory listing]

Authored a Python script that loops through a given directory, finds each '.osm' file within the subfolders, runs the OSM (Load Only) tool on it with the provided node, ln, and ply key parameters, and writes the resulting pt, ln, and ply feature classes to a geodatabase named for that country in the parent "country.osm" folder, as shown above.

Ran another script that loops through all the geodatabases in that directory, re-sources the pt, ln, and ply feature classes to pre-queried layers in an MXD based upon geometry type, then exports the results as feature classes to a Country_Queried.gdb (e.g. Highways, Populated_Places, Rail_Station, etc.).

Created another script that loops through all the '_Queried.gdb's in the root directory, adds each matching feature class's path from each one to a list, then uses that list as the input parameter for a Merge to bring all the Highways, Populated_Places, Rail_Station, etc. data together into one overarching geodatabase called Europe_Queried.
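
A condensed sketch of that final merge step (folder and feature class names are placeholders matching the workflow described, not the actual script):

```python
import os
import arcpy

root = r"E:\europe"
out_gdb = os.path.join(root, "Europe_Queried.gdb")

# Collect the same-named FC from every country's *_Queried.gdb, then Merge.
for fc_name in ("Highways", "Populated_Places", "Rail_Station"):
    inputs = []
    for entry in os.listdir(root):
        if entry.endswith("_Queried.gdb"):
            fc = os.path.join(root, entry, fc_name)
            if arcpy.Exists(fc):
                inputs.append(fc)
    arcpy.Merge_management(inputs, os.path.join(out_gdb, fc_name))
```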

You can then run Attribute Selector tool functionality on the resulting feature classes if required. Goes much faster than trying to do so on each country individually. Ask me how I know.

So ultimately I was able to process the Europe data, but I had to start at the country level and break it down a bit so I could work back up to combining it into the data I needed. The uncompressed OSM files and their individual gdbs are about 800GB altogether. The Europe_Queried.gdb containing only the data I require is about 80GB.

mboeringa commented 6 years ago

@lchristianzumstein

Thanks for the "loop back". Interesting exercise you went through with that Europe data. I am actually amazed you managed to bring it back to just 80 GB...

My France rendering, which creates a File Geodatabase with quite a number of dedicated Feature Classes, but also uses the base pt, ln and ply FC you describe, is some 75 GB.

Admittedly, I am using a rich data model, with many attribute fields based on wiki-defined OSM keys extracted using the OSM Attribute Selector. In addition, and this may be a major cause of the large size, the File Geodatabase is fully indexed for keys used in SQL statements like a layer's Definition Query or label classes' SQL queries. Indexes can certainly add a lot of extra GBs as well.

So I think it is the combination of the rich data model with many attributes and indexes, plus the use of the base pt, ln, and ply feature classes, that causes the large physical size of the FGDB.

lchristianzumstein commented 6 years ago

Yeah, it was a bit of an effort, but my small set of keys/tags for the roughly 12 feature classes I create probably helps keep the size of my final Queried.gdb small. My intermediary gdbs can be large (France, for example, is about 65GB prior to querying). It just takes a lot of time to ferret out the info I need.

elizasth commented 5 years ago

Hello

I am trying to download OSM data for Castellón, Spain, but I could not download it: although Castellón is a small city, it has more than 50,000 nodes. So I tried to download the data from Geofabrik instead. I could not download a small region there either, so I had to download the data for the whole of Spain and tried to use the OSM File Loader, but I got the following error.

Exception from HRESULT: 0x80040301
   at ESRI.ArcGIS.Geoprocessing.GPUtilitiesClass.IGPUtilities3_OpenDatasetFromLocation(String catalogPath)
   at ESRI.ArcGIS.OSM.GeoProcessing.OSMGPFileLoader.Execute(IArray paramvalues, ITrackCancel TrackCancel, IGPEnvironmentManager envMgr, IGPMessages message)
Failed to execute (OSMGPFileLoader).

My target is to create a network dataset, so I cannot use the "OSM File Loader (Load Only)" tool.

If you have any solutions, I would be grateful.

ThomasEmge commented 5 years ago

@elizasth Make sure that your target feature classes are inside a file geodatabase. Shapefiles are not supported.

elizasth commented 5 years ago

@ThomasEmge Hey. I was not loading the feature classes into a file geodatabase; that's why it was not working. I now keep my target inside a file geodatabase, so it's working.

Also, Geofabrik didn't have subregions for Spain, so I downloaded the OSM data from BBBike.

Thank you for replying.

ThomasEmge commented 5 years ago

@elizasth Can you please post your full geoprocessing messages? In the geoprocessing results window, please right-click on the messages node and then select the Copy option from the context menu.

ThomasEmge commented 5 years ago

If this is still an issue, please reopen.