apache / incubator-baremaps

Create custom vector tiles from OpenStreetMap and other data sources with PostGIS and Java.
baremaps.apache.org
Apache License 2.0
516 stars 62 forks

Handle the Out of Memory Errors #681

Closed devdattaT closed 12 months ago

devdattaT commented 1 year ago

I am trying to import an OSM Extract of a large Metropolitan area, that I already have downloaded, and ingesting it into my PostGIS Database, using the following workflow file:

{
  "steps": [
    {
      "id": "import",
      "needs": [],
      "tasks": [
        {
          "type": "ImportOpenStreetMap",
          "file": "kol.osm.pbf",
          "database": "jdbc:postgresql://localhost:5432/basemaps?&user=DevdattaTengshe",
          "databaseSrid": 3857
        }
      ]
    },
    {
      "id": "index",
      "needs": ["import"],
      "tasks": [
        {
          "type": "ExecuteSql",
          "file": "indexes.sql",
          "database": "jdbc:postgresql://localhost:5432/basemaps?&user=DevdattaTengshe"
        }
      ]
    }
  ]
}

When I try to import this file into my database with the command baremaps workflow execute --file kol.json, I get the following error:

[INFO ] 2023-05-29 14:46:44.729 [main] Execute - Executing the workflow kol.json
[INFO ] 2023-05-29 14:46:44.842 [pool-2-thread-1] ImportOpenStreetMap - Importing kol.osm.pbf into jdbc:postgresql://localhost:5432/basemaps?&user=DevdattaTengshe
[INFO ] 2023-05-29 14:46:44.847 [pool-2-thread-1] HikariDataSource - HikariPool-1 - Starting...
[INFO ] 2023-05-29 14:46:44.898 [pool-2-thread-1] HikariPool - HikariPool-1 - Added connection org.postgresql.jdbc.PgConnection@25fd651a
[INFO ] 2023-05-29 14:46:44.899 [pool-2-thread-1] HikariDataSource - HikariPool-1 - Start completed.
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at org.apache.baremaps.cli.workflow.Execute.call(Execute.java:48)
    at org.apache.baremaps.cli.workflow.Execute.call(Execute.java:29)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.apache.baremaps.cli.Baremaps.main(Baremaps.java:83)
Caused by: java.lang.OutOfMemoryError
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:484)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:765)
    at org.apache.baremaps.workflow.tasks.ImportOpenStreetMap.execute(ImportOpenStreetMap.java:141)
    at org.apache.baremaps.workflow.tasks.ImportOpenStreetMap.execute(ImportOpenStreetMap.java:100)
    at org.apache.baremaps.workflow.WorkflowExecutor.lambda$initStep$3(WorkflowExecutor.java:97)
    at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:787)
    at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.lang.OutOfMemoryError: Cannot reserve 1048576 bytes of direct buffer memory (allocated: 8589580449, limit: 8589934592)
    at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:128)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:342)
    at org.apache.baremaps.collection.memory.OffHeapMemory.allocate(OffHeapMemory.java:40)
    at org.apache.baremaps.collection.memory.Memory.allocate(Memory.java:94)
    at org.apache.baremaps.collection.memory.Memory.segment(Memory.java:78)
    at org.apache.baremaps.collection.MemoryAlignedDataList.write(MemoryAlignedDataList.java:82)
    at org.apache.baremaps.collection.MemoryAlignedDataList.addIndexed(MemoryAlignedDataList.java:89)
    at org.apache.baremaps.collection.DataList.add(DataList.java:40)
    at org.apache.baremaps.collection.MonotonicDataMap.put(MonotonicDataMap.java:87)
    at org.apache.baremaps.collection.MonotonicDataMap.put(MonotonicDataMap.java:34)
    at org.apache.baremaps.openstreetmap.function.CoordinateMapBuilder.accept(CoordinateMapBuilder.java:41)
    at org.apache.baremaps.openstreetmap.function.CoordinateMapBuilder.accept(CoordinateMapBuilder.java:24)
    at java.base/java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
    at java.base/java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
    at java.base/java.util.function.Consumer.lambda$andThen$0(Consumer.java:65)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at org.apache.baremaps.openstreetmap.function.BlockEntitiesHandler.accept(BlockEntitiesHandler.java:44)
    at org.apache.baremaps.openstreetmap.function.BlockEntitiesHandler.accept(BlockEntitiesHandler.java:25)
    at org.apache.baremaps.stream.ConsumerUtils.lambda$consumeThenReturn$1(ConsumerUtils.java:46)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at org.apache.baremaps.stream.BufferedSpliterator.tryAdvance(BufferedSpliterator.java:74)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
    at org.apache.baremaps.stream.BatchedSpliterator.tryAdvance(BatchedSpliterator.java:48)
    at org.apache.baremaps.stream.BatchedSpliterator.trySplit(BatchedSpliterator.java:59)
    at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:289)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
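Editor's note: the figures in the final OutOfMemoryError message show that this is the JVM's direct buffer memory cap being hit (not the Java heap), and that the cap here is exactly 8 GiB. A quick check, using only the numbers from the message above:

```python
# Figures taken verbatim from the OutOfMemoryError message above.
limit = 8_589_934_592      # direct buffer memory limit
allocated = 8_589_580_449  # direct memory already allocated at failure time
request = 1_048_576        # the 1 MiB segment that could not be reserved

print(limit == 8 * 1024**3)         # the cap is exactly 8 GiB
print(allocated + request > limit)  # the 1 MiB request no longer fits
```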

bchapuis commented 1 year ago

I haven't encountered this error before. Does the kol.osm.pbf file come from Geofabrik? Baremaps uses large memory-mapped files to construct the geometries, and it looks like this workflow hits a memory allocation limit. Could you also provide more information about your environment (OS, available RAM, etc.)?

devdattaT commented 1 year ago

This is a custom file, created by taking the extract from Geofabrik and adding additional data (buildings and POIs) from proprietary sources.

A possible cause of the error could be that these additional elements have large IDs. Running osmium's fileinfo gives the following:

File:
  Name: kol.osm.pbf
  Format: PBF
  Compression: none
  Size: 42727714
Header:
  Bounding boxes:
  With history: no
  Options:
    generator=osmium/1.15.0
    pbf_dense_nodes=true
    pbf_optional_feature_0=Sort.Type_then_ID
    sorting=Type_then_ID
Data:
  Bounding box: (86.5196907,21.7443002,88.9361347,25.0160464)
  Timestamps:
    First:
    Last: 2023-05-22T16:39:43Z
  Objects ordered (by type and id): yes
  Multiple versions of same object: no
  CRC32: not calculated (use --crc/-c to enable)
  Number of changesets: 0
  Number of nodes: 6056375
  Number of ways: 1188524
  Number of relations: 1667
  Smallest changeset ID: 0
  Smallest node ID: 239677149
  Smallest way ID: 22328118
  Smallest relation ID: 49602
  Largest changeset ID: 0
  Largest node ID: 5000329540503
  Largest way ID: 3000040486463
  Largest relation ID: 3000000002836
  Number of buffers: 9140 (avg 792 objects per buffer)
  Sum of buffer sizes: 584050072 (0.556 GB)
  Sum of buffer capacities: 599457792 (0.571 GB, 97% full)
Metadata:
  All objects have following metadata attributes: none
  Some objects have following metadata attributes: version+timestamp

Do note that this file works with other tools such as osm2pgsql, tilemaker, and planetiler.
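Editor's note: the "large IDs" hypothesis can be quantified from the fileinfo output above. If a store indexes entries directly by OSM ID, its occupancy is at most the node count divided by the size of the key space, which here is vanishingly small (the arithmetic below is illustrative only):

```python
# Counts taken from the osmium fileinfo output above.
nodes = 6_056_375
largest_node_id = 5_000_329_540_503

# Occupancy of a hypothetical store indexed directly by node ID:
occupancy = nodes / largest_node_id
print(f"{occupancy:.2e}")  # roughly 1.2e-06, i.e. about one entry per million key slots
```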

bchapuis commented 1 year ago

Interesting use case. I knew that PBF files were used for data exchange beyond OpenStreetMap, but I have never experimented with this personally. Does the file weigh around 42 MB? Do you know where I could find a similar file? If you don't mind sharing this file privately, being able to reproduce the error would be of great help.

devdattaT commented 1 year ago

I can share a similar file with you, which I created just now, containing data from Geofabrik with buildings added from Microsoft Buildings. I have checked, and I get a similar out-of-memory error with this file. The new file is about 62 MB, and I can share it with you privately as a Google Drive link. I have shared it at the Gmail address on your profile.

bchapuis commented 1 year ago

This is great, thank you!

bchapuis commented 1 year ago

I think I spotted the origin of the issue. The keys of the additional objects start at a much greater index than the usual objects. The MonotonicDataMap that stores coordinates partitions the key space into chunks, and the number of chunks between the last OSM key and the first additional object key does not fit into memory. We will probably need to devise a new data structure to address this issue.

last OSM key: 10935151878
first additional object key: 5000028797831
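Editor's note: a simplified model of the failure mode described above. This is not Baremaps' actual code; the segment size matches the 1 MiB allocations in the OOM message, but the entry size is an assumption for illustration. The point is that a layout indexed directly by key must cover every chunk up to the largest key, so the jump to the additional objects' key range multiplies the allocation roughly 457-fold:

```python
# Hypothetical model of a memory-aligned, key-indexed coordinate store.
SEGMENT_BYTES = 1 << 20                          # 1 MiB segments, as in the OOM message
ENTRY_BYTES = 8                                  # assumed bytes per coordinate entry
ENTRIES_PER_SEGMENT = SEGMENT_BYTES // ENTRY_BYTES

def segments_needed(max_key: int) -> int:
    # A key-indexed layout must allocate every segment up to the largest
    # key seen, even if the intermediate keys never occur in the data.
    return max_key // ENTRIES_PER_SEGMENT + 1

last_osm_key = 10_935_151_878
first_additional_key = 5_000_028_797_831

blowup = segments_needed(first_additional_key) / segments_needed(last_osm_key)
print(f"{blowup:.0f}x more segments once the additional keys appear")
```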