Overv / openstreetmap-tile-server

Docker file for a minimal effort OpenStreetMap tile server
Apache License 2.0

How can I instruct the container to only generate the tiles and not import any POI data into the DB? If possible. #397

Closed — ghevge closed this issue 11 months ago

ghevge commented 11 months ago

Is it possible to instruct the container to only generate the tiles and not import any POI data into the DB? No routing is needed either. I just want a server that will serve tiles. Nothing else.

Istador commented 11 months ago

The POI data in the DB is used to generate the tiles. Everything that is imported into the DB is there only for generating tiles. There is no routing service included in this project.

ghevge commented 11 months ago

@Istador thanks for the explanation. I'm new to the OpenStreetMap infrastructure realm, but in the short time I've spent dealing with its backend components, I've been surprised by how slow the import process is. Monitoring the hardware, this slowness doesn't seem to be caused by hardware bottlenecks but rather by a lack of parallelization.

Processing: Node(8646182k 1100.0k/s) Way(25045k 11.64k/s) Relation(0 0.0/s)

That is, processing nodes at ~1.1 M/s and ways at ~11 k/s while my CPU is only at ~6% load and disk I/O is at barely 10 MB/s.

Do you know why parallelization is not utilized in this process, in order to make full use of the available hardware? If there are technical or physical limitations that prevent parallelization during the import, why doesn't the organization instead provide a tarball with all the DBs already populated and the resources already compiled (at least for the biggest maps, e.g. the whole world), which could be mapped directly into the Docker container? Downloading a huge tar would be much faster on the majority of systems than waiting to build everything from scratch.

Istador commented 11 months ago

Afaik the data contained in the PBF file is ALL the data that OSM has (for that region). The import process imports only the subset of that data that is going to be used by the osm-carto style into a relational table structure that is specific to and optimized for that style. There are other styles that need different data and different table structures, and the structure might even change between versions of the same style. So there is no one-size-fits-all pre-built solution.
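
For illustration, a default osm2pgsql import for osm-carto ends up in a handful of style-specific rendering tables. This is only a sketch: the database name gis is this image's default, the table names are the osm2pgsql defaults, and the command is assumed to be run inside the container's database environment.

    # List the rendering tables created by the import.
    psql -d gis -c '\dt planet_osm_*'
    # Typically: planet_osm_point, planet_osm_line, planet_osm_polygon, planet_osm_roads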


What organization are you talking about?

If you see a need for these pre-populated tar files that have everything already imported, then please feel free to generate them, regularly update them for the new data, provide them for all the different styles in all the different versions, pay for the hosting costs, provide differential updates for them, etc.

(If you decide to do this, please first check whether you are allowed to "just" do this, e.g. read the open source database licences. This is not legal advice.)


osm2pgsql is used to import the data from the PBF file into the PostgreSQL database. It uses parallelization for the parts that are parallelizable, and the THREADS environment variable is passed through to it (note: you can't set it too high, because the required memory scales with it, and you won't gain much by it), with the default being 4. The import is a complicated process with high demands on storage and memory and not so much on the CPU, as far as I understand it.
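
As a rough sketch of how that looks in practice (image name, volume layout and mount path follow a recent README of this project; older image versions use different paths, and the THREADS value is only an example, not a tuning recommendation):

    # Import a region with 8 osm2pgsql threads instead of the default 4.
    docker run \
        -v /absolute/path/to/region.osm.pbf:/data/region.osm.pbf \
        -v osm-data:/data/database/ \
        -e THREADS=8 \
        overv/openstreetmap-tile-server \
        import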

I think the best you can do to improve the import speed is to use flat nodes for very large regions or increase the cache (but not both).
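
For reference, both knobs are exposed by this image through environment variables. A sketch, assuming the FLAT_NODES and OSM2PGSQL_EXTRA_ARGS variables described in the README; the cache size is just an example value:

    # Option A: planet-scale import with flat nodes; disable the in-memory node cache.
    docker run \
        -v /absolute/path/to/planet.osm.pbf:/data/region.osm.pbf \
        -v osm-data:/data/database/ \
        -e FLAT_NODES=enabled \
        -e "OSM2PGSQL_EXTRA_ARGS=-C 0" \
        overv/openstreetmap-tile-server \
        import

    # Option B: smaller extract without flat nodes; give osm2pgsql a bigger cache (MB).
    docker run \
        -v /absolute/path/to/region.osm.pbf:/data/region.osm.pbf \
        -v osm-data:/data/database/ \
        -e "OSM2PGSQL_EXTRA_ARGS=-C 4096" \
        overv/openstreetmap-tile-server \
        import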

I'm no expert in this. Please read the documentation or ask someone else.

ghevge commented 11 months ago

I'm setting both flat nodes and cache. In all the benchmarks I've seen on the net, they also set both flat nodes and cache. Why do you say that only one of the two should be set?

https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks#What_affects_import_time?

As for creating the tarball, I guess it could be any of those you mentioned. On my side, I have not even been able to import planet.osm.pbf yet. I will see if I can tar it once I am done with the import. As for maintaining it, wouldn't it be enough to set "-e UPDATES=enabled" while pointing to an old DB, for it to be updated automatically? The tarball approach would just be a way to bootstrap an OSM system much faster. Let me know what you think!
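
If anyone wants to try that, one way to capture the populated volume is a throwaway container that tars the database directory. A sketch only: the volume name osm-data and its /data/database mount point follow this image's README, and the output filename is arbitrary.

    # Archive the populated Postgres data volume (run while the tile server is stopped).
    docker run --rm \
        -v osm-data:/data/database \
        -v "$(pwd)":/backup \
        alpine \
        tar czf /backup/osm-data.tar.gz -C /data/database .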

Istador commented 11 months ago

I'm setting both flat nodes and cache. In all the benchmarks I've seen on the net, they also set both flat nodes and cache. Why do you say that only one of the two should be set?

https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks#What_affects_import_time?

Because:

You may also set --cache to 0 to disable caching completely to save memory. If you use a flat node store you should disable the cache, it will usually not help in that situation.

And:

When importing with a flatnode file (option --flat-nodes), it is best to disable the node cache completely (--cache=0) and leave the memory for the system cache to speed up accessing the flatnode file.

Afaik both are a cache for the same thing: --cache keeps the nodes in memory, and --flat-nodes stores them in a file (>80 GB). When using --flat-nodes, reserving memory for a cache that isn't used takes away memory that could be used for other things during the import.
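
In plain osm2pgsql terms the two quotes boil down to something like this (a sketch; the paths, database name and style file are placeholders, only the --flat-nodes/--cache relationship matters here):

    # With a flat node store, disable the in-memory node cache entirely.
    osm2pgsql --create --slim \
        --flat-nodes /data/flat_nodes.bin \
        --cache 0 \
        --number-processes 4 \
        --hstore --style /path/to/openstreetmap-carto.style \
        -d gis planet.osm.pbf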


As for creating the tarball, I guess it could be any of those you mentioned. On my side, I have not even been able to import planet.osm.pbf yet. I will see if I can tar it once I am done with the import. As for maintaining it, wouldn't it be enough to set "-e UPDATES=enabled" while pointing to an old DB, for it to be updated automatically? The tarball approach would just be a way to bootstrap an OSM system much faster. Let me know what you think!

Updating (--append) takes orders of magnitude longer than importing (--create), so IMHO with that approach after a few months the update process might take longer than doing the full import.
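
For completeness, this is roughly what the automatic-update setup mentioned above looks like (a sketch based on the README, which notes that UPDATES=enabled should be passed during the initial import as well as the run):

    # Serve tiles and keep the database updated with OSM replication diffs.
    docker run \
        -p 8080:80 \
        -v osm-data:/data/database/ \
        -e UPDATES=enabled \
        -d overv/openstreetmap-tile-server \
        run

Whether keeping an old dump updated this way beats re-importing from a fresh PBF after months of drift is exactly the trade-off described above.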