gis-ops / docker-valhalla

This is our flexible Docker repository for the Valhalla routing engine
MIT License
240 stars 70 forks source link

Error building tiles for whole North America #30

Closed ITninja04 closed 2 years ago

ITninja04 commented 3 years ago

I'm having the same issue trying to build the tiles using the latest container image.

= No valid bounding box or elevation parameter set. Skipping elevation! =
=========================================================================
Valhalla tiles not found!
Hash not found for: /custom_files/north-america-latest.osm.pbf!
Either valhalla_tiles.tar couldn't be found or new files or file constellations were detected. Rebuilding files:  /custom_files/north-america-latest.osm.pbf
PBF hashes: 0
PBF file Counter: 1
������=========================
= Build the config file =
=========================
������==========================
= Build the admin regions =
===========================
2021/08/29 21:59:25.187330 [INFO] Parsing files: /custom_files/north-america-latest.osm.pbf
2021/08/29 21:59:25.187393 [INFO] Parsing relations...
2021/08/29 22:04:46.976419 [INFO] Finished with 122 admin polygons comprised of 24308 ways
2021/08/29 22:04:46.976919 [INFO] Parsing ways...
2021/08/29 22:11:41.295182 [INFO] Finished with 9541 ways comprised of 879641 nodes
2021/08/29 22:11:41.295461 [INFO] Parsing nodes...
2021/08/29 22:13:28.595932 [INFO] Processed 500000 nodes on ways
2021/08/29 22:17:59.584373 [INFO] Finished with 869993 nodes
2021/08/29 22:17:59.594408 [INFO] SpatiaLite loaded as an extension
2021/08/29 22:17:59.690911 [INFO] Created admin access table.
2021/08/29 22:17:59.798115 [INFO] Created admin table.
2021/08/29 22:18:22.596814 [INFO] Inserted 108 admin areas
2021/08/29 22:18:22.786845 [INFO] Created spatial index
2021/08/29 22:18:22.801157 [INFO] Created Level index
2021/08/29 22:18:22.814766 [INFO] Created Drive On Right index
2021/08/29 22:18:22.828791 [INFO] Created allow intersection names index
2021/08/29 22:18:22.850177 [INFO] Done updating drive on right column.
2021/08/29 22:18:22.850496 [INFO] Done updating allow intersection names column.
2021/08/29 22:18:34.518128 [INFO] Done updating Parent admin
2021/08/29 22:18:34.518480 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for United States
2021/08/29 22:18:34.518528 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for United Kingdom
2021/08/29 22:18:34.518569 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Switzerland
2021/08/29 22:18:34.518610 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Russia
2021/08/29 22:18:34.518650 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Romania
2021/08/29 22:18:34.518690 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Poland
2021/08/29 22:18:34.518729 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Norway
2021/08/29 22:18:34.518769 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Spain
2021/08/29 22:18:34.518810 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Netherlands
2021/08/29 22:18:34.518851 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Hungary
2021/08/29 22:18:34.518891 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Slovakia
2021/08/29 22:18:34.518931 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for France
2021/08/29 22:18:34.518978 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Finland
2021/08/29 22:18:34.519019 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Denmark
2021/08/29 22:18:34.519059 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Brazil
2021/08/29 22:18:34.519099 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Turkey
2021/08/29 22:18:34.519139 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Belgium
2021/08/29 22:18:34.519178 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Belarus
2021/08/29 22:18:34.519218 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Austria
2021/08/29 22:18:34.519259 [ERROR] sqlite3_step() error: NOT NULL constraint failed: admin_access.admin_id.  Ignore if not using a planet extract or check if there was a name change for Australia
2021/08/29 22:18:34.520151 [INFO] Finished.
������==========================
= Build the timezone data =
===========================
downloading timezone polygon file.
Archive:  ./timezones-with-oceans.shapefile.zip
  inflating: dist/combined-shapefile-with-oceans.dbf  
  inflating: dist/combined-shapefile-with-oceans.prj  
  inflating: dist/combined-shapefile-with-oceans.shp  
  inflating: dist/combined-shapefile-with-oceans.shx  
SQLite version: 3.31.1
SpatiaLite version: 4.3.0a
Inserted 439 rows into 'tz_world' from './dist/combined-shapefile-with-oceans.shp'
1
������=========================
= Build the tile files. =
=========================
Running build tiles with: /custom_files/valhalla.json  /custom_files/north-america-latest.osm.pbf
2021/08/29 22:18:39.782649 [INFO] Start stage = initialize End stage = cleanup
2021/08/29 22:18:39.784511 [INFO] Parsing files for ways: /custom_files/north-america-latest.osm.pbf
2021/08/29 22:18:39.784656 [INFO] Parsing ways...
2021/08/29 22:31:46.640267 [WARN] Exceeded max speed for way id: 41007448
2021/08/29 22:31:46.640772 [WARN] Exceeded max speed for way id: 41007448
2021/08/29 22:32:14.803076 [WARN] Error during parsing of `layer` tag on the way 49396739: stoi
2021/08/29 22:39:04.764529 [WARN] Error during parsing of `layer` tag on the way 311307434: stoi
2021/08/29 22:40:11.851400 [INFO] invalid_argument thrown for condition (Jun 15-Labor Day)
2021/08/29 22:41:09.476634 [INFO] invalid_argument thrown for condition (Jun 15-Labor Day)
2021/08/29 22:43:30.993131 [WARN] Error during parsing of `layer` tag on the way 501009530: stoi
2021/08/29 22:43:30.993458 [WARN] Error during parsing of `layer` tag on the way 501009531: stoi
2021/08/29 22:43:30.993504 [WARN] Error during parsing of `layer` tag on the way 501009532: stoi
2021/08/29 22:43:31.005342 [WARN] Error during parsing of `layer` tag on the way 501017170: stoi
2021/08/29 22:43:31.005398 [WARN] Error during parsing of `layer` tag on the way 501017172: stoi
2021/08/29 22:43:31.005440 [WARN] Error during parsing of `layer` tag on the way 501017174: stoi
2021/08/29 22:43:31.005482 [WARN] Error during parsing of `layer` tag on the way 501017176: stoi
2021/08/29 22:43:31.005573 [WARN] Error during parsing of `layer` tag on the way 501017182: stoi
2021/08/29 22:43:31.005614 [WARN] Error during parsing of `layer` tag on the way 501017183: stoi
2021/08/29 22:43:31.005655 [WARN] Error during parsing of `layer` tag on the way 501017186: stoi
2021/08/29 22:43:31.013769 [WARN] Error during parsing of `layer` tag on the way 501024428: stoi
2021/08/29 22:43:31.013823 [WARN] Error during parsing of `layer` tag on the way 501024429: stoi
2021/08/29 22:46:00.763324 [WARN] Error during parsing of `layer` tag on the way 613045980: stoi
2021/08/29 22:46:00.763409 [WARN] Error during parsing of `layer` tag on the way 613045981: stoi
2021/08/29 22:46:26.760720 [WARN] Error during parsing of `layer` tag on the way 629286036: stoi
2021/08/29 22:48:07.142029 [WARN] Error during parsing of `layer` tag on the way 693084466: stoi
2021/08/29 22:49:22.314243 [WARN] Exceeded max speed for way id: 729106992
2021/08/29 22:49:22.314266 [WARN] Exceeded max speed for way id: 729106992
2021/08/29 22:50:17.191997 [WARN] Error during parsing of `layer` tag on the way 750672019: stoi
2021/08/29 22:51:15.375347 [WARN] Error during parsing of `layer` tag on the way 769745481: stoi
2021/08/29 22:52:34.757558 [WARN] Error during parsing of `layer` tag on the way 802339171: stoi
2021/08/29 22:52:48.353797 [WARN] Error during parsing of `layer` tag on the way 809560520: stoi
2021/08/29 22:53:51.645523 [WARN] Error during parsing of `layer` tag on the way 835943474: stoi
2021/08/29 22:56:17.766061 [WARN] Error during parsing of `layer` tag on the way 892668527: stoi
2021/08/29 22:56:17.766471 [WARN] Error during parsing of `layer` tag on the way 892668531: stoi
2021/08/29 23:00:09.512863 [INFO] Added 196511 culdesac roundabouts from 267346 candidates.
2021/08/29 23:00:09.513364 [INFO] Finished with 43595902 routable ways containing 464024464 nodes
2021/08/29 23:00:10.077051 [INFO] Sorting osm access tags by way id...
2021/08/29 23:00:10.241627 [INFO] Finished
2021/08/29 23:00:11.563508 [INFO] Parsing files for relations: /custom_files/north-america-latest.osm.pbf
2021/08/29 23:00:11.568703 [INFO] Parsing relations...
2021/08/29 23:08:15.696698 [INFO] invalid_argument thrown for condition  Jul 1- Aug 31 off
2021/08/29 23:08:19.674609 [INFO] Finished with 354779 simple restrictions
2021/08/29 23:08:19.674634 [INFO] Finished with 0 lane connections
2021/08/29 23:08:19.684642 [INFO] Sorting complex restrictions by from id...
2021/08/29 23:08:19.715298 [INFO] Sorting complex restrictions by to id...
2021/08/29 23:08:19.742803 [INFO] Finished
2021/08/29 23:08:19.745017 [INFO] Parsing files for nodes: /custom_files/north-america-latest.osm.pbf
2021/08/29 23:08:19.745122 [INFO] Sorting osm way node references by node id...
2021/08/29 23:10:44.892644 [INFO] Parsing nodes...
2021/08/29 23:36:23.108729 [INFO] Finished with 398942241 nodes contained in routable ways
2021/08/29 23:36:23.109249 [INFO] Sorting osm way node references by way index and node shape index...
2021/08/29 23:38:59.504767 [INFO] Finished: max_osm_id 9038921244
2021/08/29 23:38:59.504788 [INFO] Number of nodes with refs (exits) = 51297
2021/08/29 23:38:59.504792 [INFO] Number of nodes with exit_to = 13
2021/08/29 23:38:59.504795 [INFO] Number of nodes with names = 4469
2021/08/29 23:38:59.504798 [INFO] Number of way refs = 262817
2021/08/29 23:38:59.504801 [INFO] Number of reverse way refs = 22395
2021/08/29 23:38:59.504804 [INFO] Unique Node Strings (names, refs, etc.) = 5878
2021/08/29 23:38:59.504807 [INFO] Unique Strings (names, refs, etc.) = 2872622
2021/08/29 23:38:59.505477 [INFO] Creating graph edges from ways...
2021/08/29 23:39:07.171286 [WARN] Node 1600491595 in way 146813404 has not had coordinates initialized
2021/08/29 23:39:07.171310 [WARN] Do not add edge!
2021/08/29 23:39:09.598456 [WARN] Node 3732714612 in way 369491617 has not had coordinates initialized
2021/08/29 23:39:09.598480 [WARN] Do not add edge!
2021/08/29 23:39:18.652001 [INFO] Finished with 98302855 graph edges
2021/08/29 23:39:19.406563 [INFO] Sorting graph...
2021/08/29 23:40:16.940580 [INFO] Nodes processed. Sorting begin and end nodes by edge.
2021/08/29 23:41:00.195761 [INFO] Sorting begin and end nodes done. Populating edges...
2021/08/29 23:41:02.381244 [INFO] Finished with 76816658 graph nodes
2021/08/29 23:41:02.756483 [INFO] Writing tile manifest to /valhalla/scripts/valhalla_tiles/tile_manifest.json
2021/08/29 23:41:02.758975 [INFO] Reclassifying_V2 link graph edges...
2021/08/29 23:41:19.654862 [INFO] Class: 0 exit count = 68163
2021/08/29 23:41:19.654888 [INFO] Class: 1 exit count = 69201
2021/08/29 23:41:19.654891 [INFO] Class: 2 exit count = 104139
2021/08/29 23:41:19.654894 [INFO] Class: 3 exit count = 127869
2021/08/29 23:41:19.654897 [INFO] Class: 4 exit count = 73808
2021/08/29 23:41:19.654900 [INFO] Class: 5 exit count = 7071
2021/08/29 23:41:19.654902 [INFO] Class: 6 exit count = 18085
2021/08/29 23:41:19.654905 [INFO] Class: 7 exit count = 0

It's been on the last line above for around an hour and will eventually crash due to running out of RAM on the host.

My server has 128 GB of RAM and 8 cores, the container is allowed to use almost all of the RAM.

Here is my docker run command:

docker run -i -d -e use_tiles_ignore_pbf=True -e force_rebuild=False -e force_rebuild_elevation=False -e build_elevation=True -e build_admins=True -e build_time_zones=True -v /mnt/storage/valhalla:/custom_files --replace --privileged --name valhallabuilder -m 120g -p 8002:8002  gisops/valhalla:latest

I pulled the latest docker image before starting this run, so everything should be up to date.

Any idea on how to fix this issue?

nilsnolde commented 3 years ago

What OS are you on? Did you monitor your RAM usage while Valhalla is working? I routinely build Europe/the planet and don't have those issues on a 32 gb instance.

this is also a project wrapping another project, so it's often worth looking upstream as well, e.g. https://github.com/valhalla/valhalla/discussions/3288

j-bbr commented 3 years ago

I have the same issue with the latest docker image. Before it worked (4 cpu, 8 gb azure container instance, germany pbf file) and then kept crashing on startup both with building tiles and using pre-built tiles). Dropping down to 3.1.2 again fixed it for me.

nilsnolde commented 3 years ago

8 gb might be too little by now, there's a couple of tile building stages that consume quite a lot of memory. even 16 gb might not be enough to process the planet, 32 gb should be. generally I'd recommend 16 gb, even for "smaller" extracts like germany.

did anyone here record an actual error message from valhalla_build_tiles? so far I've only seen healthy logs

j-bbr commented 3 years ago

I can redeploy a bigger instance later and get the logs but the latest container also crashed when using prebuilt tiles (force_rebuild = false and use_tiles_ignore_pbf= true)

nilsnolde commented 3 years ago

what does "crashed" mean exactly?

j-bbr commented 3 years ago

I am not quite sure how Azure determines that a container has stopped responding and last thing in the logs was "core dumped...". Unfortunately I wasn't paying too much attention to it at the time, since what I was doing worked on the previous latest version so I just dropped down to that. I've started a container with 16 gb, "new" latest version and my previous setup that is building tiles now and once that's done I'll share the logs here

nilsnolde commented 3 years ago

ok cool thanks. if the host doesn't somehow destroy the failed containers, you can still run docker logs <container_name> to see the logs.

j-bbr commented 3 years ago

It puts the container in a restart loop and I automatically deleted the deployment a few days ago. I am sure the information is somewhere but seemed easier to retry with one possible error source eliminated

j-bbr commented 3 years ago

So here is what's happening in the log (4 cpu, 16 gb ram container now):

2021/08/30 11:10:22.475249 [INFO] --Reverse restrictions added: 6332
2021/08/30 11:10:22.483888 [INFO] Finished
2021/08/30 11:10:22.483942 [INFO] Validating, finishing and binning tiles...
2021/08/30 11:11:56.684896 [INFO] Finished
2021/08/30 11:11:56.684931 [INFO] Binning inter-tile edges...
2021/08/30 11:12:55.041802 [INFO] Finished
2021/08/30 11:12:55.041867 [WARN] Possible duplicates at level: 0 = 20
2021/08/30 11:12:55.041879 [WARN] Possible duplicates at level: 1 = 58
2021/08/30 11:12:55.041887 [WARN] Possible duplicates at level: 2 = 0
2021/08/30 11:12:55.082453 [INFO] Cleaning up temporary *.bin files within /valhalla/scripts/valhalla_tiles/
tar: Removing leading `/' from member names
Successfully built files:  /custom_files/germany-latest.osm.pbf
Hashing files:  /custom_files/germany-latest.osm.pbf
Hashing file: /custom_files/germany-latest.osm.pbf
2d171e8b75f3f4afb77671f8ebd572c794c55d87ec2a9852e535c328908826ea
Found config file. Starting valhalla service!
2021/08/30 11:14:15.473757 [INFO] Tile extract successfully loaded with tile count: 1127
2021/08/30 11:14:15.475480 [WARN] (stat):  No such file or directory
2021/08/30 11:14:15.475498 [WARN] Traffic tile extract could not be loaded
terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::ptree_bad_path>'
  what():  No such node (service_limits.max_avoid_locations.max_matrix_distance)
/valhalla/scripts/run.sh: line 54:    59 Aborted                 (core dumped) valhalla_service ${CUSTOM_CONFIG} ${server_threads}
nilsnolde commented 3 years ago
terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::ptree_bad_path>'
  what():  No such node (service_limits.max_avoid_locations.max_matrix_distance)

you'll need to remove your valhalla.json.. you can ask in the upstream repository to come up with a more explicit error telling you that your valhalla.json is not up-to-date (it does happen very frequently that new mandatory keys are introduced..)

note, that has nothing to do with the OP's problem. and the lesson being, always check logs!

ITninja04 commented 3 years ago

What OS are you on? Did you monitor your RAM usage while Valhalla is working? I routinely build Europe/the planet and don't have those issues on a 32 gb instance.

this is also a project wrapping another project, so it's often worth looking upstream as well, e.g. valhalla/valhalla#3288

CentOS 8 and MacOS 11.5 on the CentOS box I did watch and the RAM would creep to the 120 GB limit then it'd crash.

When I say crashing I mean the container terminates because it runs out of RAM, however the logs show this at the end:

2021/08/30 03:27:35.189607 [INFO] Class: 7 exit count = 0
/valhalla/scripts/configure_valhalla.sh: line 272:    19 Killed                  valhalla_build_tiles -c ${config_file} ${files}

There are tiles created in /valhalla/scripts/valhalla_tiles

-rw-r--r--. 1 root root    28864336 Aug 29 20:52 access.bin
-rw-r--r--. 1 root root    12599712 Aug 29 20:52 complex_from_restrictions.bin
-rw-r--r--. 1 root root    11892960 Aug 29 20:52 complex_to_restrictions.bin
-rw-r--r--. 1 root root  2359268520 Aug 29 20:52 edges.bin
-rw-r--r--. 1 root root  7946331400 Aug 29 20:52 nodes.bin
-rw-r--r--. 1 root root     2291891 Aug 29 20:52 tile_manifest.json
-rw-r--r--. 1 root root 22273174272 Aug 29 20:52 way_nodes.bin
-rw-r--r--. 1 root root  5580275456 Aug 29 20:52 ways.bin

But they are never copied to the custom_files directory, so when the container crashes that is usually lost.

image

The image above is htop output from the currently running docker container on my mac. As it progresses I can update with more screenshots.

nilsnolde commented 3 years ago

the RAM would creep to the 120 GB

and you're actually talking about RAM used, not the virtual RAM reserved right? not sure if docker doesn't get confused with the -m option maybe? can you try without?

There are tiles created in /valhalla/scripts/valhalla_tiles

there are no tiles yet. in fact, where it exits is the part it needs to do just before building actual tiles. I'm not super familiar with how the .bin files are loaded or scanned, but they are always mmap'd, meaning your virtual RAM can get > 120GB, but physical RAM never exceeds a few 10s of GBs. you're not working with some custom OSM PBF are you? (added/deleted/edited nodes/edges?)

lastly, I can only encourage you to build valhalla from source on centos (8 builds totally fine) and if you see the same issue there we know it's an upstream problem.

ITninja04 commented 3 years ago

and you're actually talking about RAM used, not the virtual RAM reserved right? not sure if docker doesn't get confused with the -m option maybe? can you try without?

The -m was there for podman (docker replacement for Centos) and that allows the container to use up to 120GB of physical RAM. I checked the logs and indeed the box was using that much physical RAM.

you're not working with some custom OSM PBF are you? (added/deleted/edited nodes/edges?)

correct, no custom PBF, just using the latest download for North America.

lastly, I can only encourage you to build valhalla from source on centos (8 builds totally fine) and if you see the same issue there we know it's an upstream problem.

That is what I am working on now, will keep you all posted.

nilsnolde commented 3 years ago

That is what I am working on now, will keep you all posted.

great thanks! hope that'll work.

somewhat sounds like a problem with podman to me.. never heard of those issues using docker so far

ITninja04 commented 3 years ago

somewhat sounds like a problem with podman to me.. never heard of those issues using docker so far

The same behavior happens on my mac running docker desktop with everything set as follows image

The docker run command I am using on my mac is:

docker run -i -d -e use_tiles_ignore_pbf=True -e force_rebuild=False -e force_rebuild_elevation=False -e build_elevation=True -e build_admins=True -e build_time_zones=True -v /Users/altiview/Desktop/valhalla_files:/custom_files --privileged --name valhallabuilder -m 47g -p 8002:8002 gisops/valhalla:latest
ITninja04 commented 3 years ago

The issue is still occurring when using north-america-latest.osm.pbf but works when using the united states only pbf.

The issue occurs while using the docker image you all have uploaded, along with building Valhalla from valhalla/valhalla so I can only assume it's something to do with older code OR the pbf is corrupt.

nilsnolde commented 3 years ago

thanks for the info!

The issue occurs while using the docker image you all have uploaded, along with building Valhalla from valhalla/valhalla

are you saying that both, our image and the current valhalla master, are showing the same behavior of OOM killed? what you describe makes me wonder if it has smth to do with either:

I think if you take care to not include edges/nodes that are crossing the antimeridian (like the US extract apparently) you should be fine. until further notice that is, I'm sure someone will very soon squash that bug!

I'll leave this open until it's resolved upstream

nilsnolde commented 3 years ago

@ITninja04 did you get this resolved?

nilsnolde commented 2 years ago

closing as stale, feel free to reopen

nilsnolde commented 2 years ago

Sorry, of course that’s probably still an issue, it’s not resolved upstream yet..

nilsnolde commented 2 years ago

according to #53 north america seems to build fine again