Overv / openstreetmap-tile-server

Docker file for a minimal effort OpenStreetMap tile server
Apache License 2.0

Extremely slow first rendering #194

Closed: Reygok closed this issue 3 years ago

Reygok commented 3 years ago

Hi, so I followed the README and imported Germany, Belgium and Thailand, which took a few hours, but now I feel the pre-rendering takes way too long... To test this, I opened http://localhost/tile/14/8500/5500.png

Rendering this single tile takes 40 seconds. Seems kinda extreme to me. My specs: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 16 GB RAM, 450 GB SSD.

Is this normal or is there something wrong with my config?

Thanks in advance!
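
To see where a tile such as 14/8500/5500 lies on the map, the usual OSM slippy-map formula converts z/x/y back to longitude/latitude. A minimal Python sketch (the function name is ours, not part of the tile server); for this tile it gives roughly 6.77° E, 50.79° N, i.e. western Germany, so the tile falls inside the imported extract:

```python
import math


def tile_to_lonlat(z: int, x: int, y: int) -> tuple[float, float]:
    """Return (lon, lat) in degrees of the north-west corner of slippy-map tile z/x/y."""
    n = 2 ** z  # number of tiles along each axis at this zoom
    lon = x / n * 360.0 - 180.0
    # Inverse Web Mercator projection: y = 0 is the northern edge of the map.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lon, lat


if __name__ == "__main__":
    print(tile_to_lonlat(14, 8500, 5500))  # roughly (6.77, 50.79)
```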

finisher0 commented 3 years ago

I am running into a similar (or the same) issue. I set up the tile server with a tiny OSM file (Andorra, 1.7 MB). In that configuration, tiles loaded fairly quickly, with some lag at high zoom levels. Now I have the entire planet imported, but I can't load anything beyond zoom level 7. E.g. http://localhost:8080/tile/7/0/0.png works, but http://localhost:8080/tile/8/0/0.png fails. The request simply times out.

finisher0 commented 3 years ago

Update: Increasing the --shm-size seems to have helped (see the README). I can now load up to zoom level 9. Ten is iffy. I am using a super high shared memory value (--shm-size=10240) since I have ~60 GB of RAM and 16 CPUs.

I also edited the apache.conf values to increase the timeout interval:

    Timeout 300 -> 3000
    MaxKeepAliveRequests 100 -> 1000
    KeepAliveTimeout 5 -> 50

I did this manually to avoid starting the planet import from scratch: I stopped the run, copied /etc/apache2/apache2.conf from my container to the host, made the changes, copied it back to the container, and then launched the run again. I'm not sure this did much, because I still cannot load zoom levels 10+ reliably.

Any insight from others is welcomed!

Thanks

Istador commented 3 years ago

Any insight from others is welcomed!

Double-check that the database is on an SSD and not on an HDD (named volumes, like most other Docker data, are located under /var/lib/docker/ on Linux).
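
One way to check this on Linux: find the disk behind /var/lib/docker (for example with `df /var/lib/docker`, then strip the partition suffix) and read the kernel's rotational flag for it; `lsblk -d -o NAME,ROTA` shows the same flag. A minimal Python sketch of the flag lookup, assuming a standard sysfs layout (the function name and `sysfs_root` parameter are ours):

```python
from pathlib import Path


def is_rotational(device: str, sysfs_root: str = "/sys") -> bool:
    """Report whether a Linux block device (a whole disk such as "sda" or
    "nvme0n1", not a partition) is flagged as rotational, i.e. usually an HDD.

    The kernel exposes this as /sys/block/<device>/queue/rotational
    ("1" = rotational, "0" = non-rotational). Virtual disks in VMs and some
    RAID controllers may report a misleading value.
    """
    flag_file = Path(sysfs_root) / "block" / device / "queue" / "rotational"
    return flag_file.read_text().strip() == "1"
```

Usage: `is_rotational("sda")` returns `True` for a disk the kernel considers spinning.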

You might want to tune the postgresql config too.

The current defaults are meant for smaller computers, so that the server runs without errors there, but they aren't optimized for having so many threads and so much memory available.

Isn't --shm-size=10240 only 10 KiB? "If you omit the unit, the system uses bytes." Try --shm-size=128M or something even higher (the default is 64m, but that's not big enough for the default postgresql config being used).
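
The unit rule can be made concrete with a small Python sketch. It is simplified and hypothetical: it only accepts the units listed in Docker's documentation (b, k, m, g, case-insensitive, binary multiples, bytes when no unit is given), while Docker's own parser accepts a few more spellings such as "128mb":

```python
import re

# Binary multiples, as Docker uses for memory sizes; no unit means bytes.
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def docker_size_to_bytes(value: str) -> int:
    """Convert a --shm-size style value ("10240", "128m", "10g") to bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


if __name__ == "__main__":
    print(docker_size_to_bytes("10240"))   # 10240 bytes, i.e. 10 KiB
    print(docker_size_to_bytes("128M"))    # 134217728 (128 MiB)
    print(docker_size_to_bytes("10240m"))  # 10737418240 (10 GiB)
```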

finisher0 commented 3 years ago

Thanks for the insight, @Istador! And you're right, I actually used --shm-size="10240m".

I should probably give a bit of background on my scenario (although it may not help @Reygok as much). My end goal is simply to generate a PNG tile database for a mobile application, so I am not too concerned about how fast the tiles are rendered and served. I just want to get the tiles! (Although faster is preferred, since I will be doing this routinely.) I am using curl to send the requests (e.g. http://localhost:8080/tile/<zoom>/<x>/<y>.png). However, for high zoom levels the tile request fails (no image is returned). For the same tiles, the browser shows a "Not Found: The requested URL was not found on this server." error after trying to load for several seconds. I assume this is a timeout issue, so I've manually (to avoid re-importing the entire planet) increased the Timeout, MaxKeepAliveRequests, and KeepAliveTimeout parameters in the apache.conf file, as mentioned above. Perhaps I am still missing something. Does the apache.conf file (stored in this repo) do more than what I mentioned if the following settings are modified?

    ModTileRequestTimeout 0
    ModTileMissingRequestTimeout 30

Any insight on this would be very appreciated! Right now I can only generate about 1/6 of my database due to failed tile requests.
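
For a batch harvester like this, a client that waits long and retries can recover some failed tiles, on the assumption that renderd keeps working on a queued metatile after mod_tile has stopped waiting for it, so a later request may find the tile already rendered. A minimal Python sketch using only the standard library (the function name, defaults and retry policy are hypothetical, not part of the tile server):

```python
import time
import urllib.request


def fetch_tile(url: str, timeout: float = 300, retries: int = 3,
               delay: float = 10.0, opener=urllib.request.urlopen) -> bytes:
    """Download one tile, retrying on HTTP errors (e.g. the "Not Found" page
    seen when a render does not finish in time) and on connection timeouts.

    `opener` is injectable so the function can be tested without a server.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as err:  # URLError, HTTPError and socket timeouts are all OSErrors
            last_error = err
            if attempt < retries:
                time.sleep(delay)  # give renderd time to finish the metatile
    raise last_error
```

Usage: `png = fetch_tile("http://localhost:8080/tile/10/530/340.png")`, where the tile coordinates are only an example.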

To answer your other questions/comments, my storage is an HDD. Perhaps this is causing issues? Additional reasons for increasing the timeouts?

I made the following changes to the postgresql config file based on the links referenced, but I'm not super familiar with these configs, so others may have some helpful insight!

    # Suggested minimal settings from
    # https://ircama.github.io/osm-carto-tutorials/tile-server-ubuntu/
    shared_buffers = 2GB
    min_wal_size = 1GB
    max_wal_size = 2GB            # overridden by the later max_wal_size entry
    maintenance_work_mem = 1GB
    # Suggested settings from
    # https://github.com/openstreetmap/chef/blob/master/roles/tile.rb#L38-L45
    max_connections = 250
    temp_buffers = 32MB
    work_mem = 256MB
    wal_buffers = 1024kB
    wal_writer_delay = 500ms
    commit_delay = 10000
    # checkpoint_segments = 60    # unrecognized: removed in PostgreSQL 9.5
    max_wal_size = 2880MB         # the last entry wins, so this value is used
    random_page_cost = 1.1
    track_activity_query_size = 16384
    autovacuum_vacuum_scale_factor = 0.05
    autovacuum_analyze_scale_factor = 0.02

    listen_addresses = '*'
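
The config above sets max_wal_size twice; in postgresql.conf the last entry for a parameter wins, so 2880MB is what takes effect. A small Python sketch (hypothetical helper, naive parser that only handles `name = value` lines and ignores `#` inside quotes) that reports the effective values and any duplicates:

```python
def parse_pg_conf(text: str) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return (effective settings, duplicated settings) for postgresql.conf-style text.

    Later entries override earlier ones. Only `name = value` lines are handled,
    although postgresql.conf also allows omitting the `=`.
    """
    settings: dict[str, str] = {}
    seen: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and commented-out lines
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()  # parameter names are case-insensitive
        seen.setdefault(key, []).append(value)
        settings[key] = value
    duplicates = {k: v for k, v in seen.items() if len(v) > 1}
    return settings, duplicates
```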

I wonder if I need to increase the shared_buffers to match my --shm-size?

Thanks again!

finisher0 commented 3 years ago

Based on gravitystorm's response here, it looks like I need to increase ModTileMissingRequestTimeout, which is set in apache.conf. I was hoping that modifying apache2.conf manually, as stated previously, would do the same thing, but it appears that the mod_tile timeout is separate from the Apache timeout. I'll test this. It may take a day or so to re-import the planet with the updated configs.

Istador commented 3 years ago

To answer your other questions/comments, my storage is an HDD. Perhaps this is causing issues? Additional reasons for increasing the timeouts?

That's likely your biggest slowdown. But it doesn't explain @Reygok's problem.

https://help.openstreetmap.org/questions/64925/hardware-configuration-for-a-production-tile-server-with-high-usage-and-serving-multiple-styles/64954 :

"Rendering speed depends mostly on how fast your database disks are and how well engineered the style is. You cannot work without SSDs, and for a high end server you should consult with your vendor about what setup gives you the lowest latency and fastest random I/O access. A standard server with 64 GB RAM, 4 quad-core CPUs and fast SSDs with the standard OSM Carto style will give you a rendering performance in the general range of 1-5 metatiles per second (64-320 tiles per second). On lower zoom levels this performance will be much worse, and it is not unheard of for certain metatiles on z6 or z7 to take 120 seconds and longer to render, even if fast disks are available."

Which is why pre-rendering on lower zoom levels is important (I'd say 0-10), even when using an SSD.
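
To put numbers on pre-rendering zoom 0-10 for the whole planet, a Python sketch of the metatile arithmetic. It assumes 8 x 8 metatiles (consistent with the "64-320 tiles per second" figure) and a constant rendering rate, which is optimistic given the slow low-zoom metatiles mentioned in the quote:

```python
import math

METATILE = 8  # a metatile is 8 x 8 = 64 tiles, matching the 64-320 tiles/s figure above


def metatiles_at_zoom(z: int) -> int:
    """Metatiles needed to cover the whole world at zoom level z."""
    per_axis = math.ceil(2 ** z / METATILE)  # zooms 0-3 fit in a single metatile
    return per_axis * per_axis


def world_render_hours(max_zoom: int, metatiles_per_second: float) -> float:
    """Hours to pre-render zooms 0..max_zoom at a constant (optimistic) rate."""
    total = sum(metatiles_at_zoom(z) for z in range(max_zoom + 1))
    return total / metatiles_per_second / 3600


if __name__ == "__main__":
    print(sum(metatiles_at_zoom(z) for z in range(11)))  # 21848 metatiles for z0-z10
    print(round(world_render_hours(10, 1.0), 1))       # about 6.1 hours at 1 metatile/s
```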


I've manually (to avoid re-importing the entire planet) increased the Timeout, MaxKeepAliveRequests, and KeepAliveTimeout parameters in the apache.conf file, as mentioned above.

Apache's Timeout and KeepAliveTimeout values should be irrelevant, because these timeouts are for client communication only.


I wonder if I need to increase the shared_buffers to match my --shm-size?

--shm-size sets an upper limit on the shared memory available in the container. As long as shared_buffers is below that limit, you should be fine (otherwise you'll get "No space left on device" error messages).

(Note that the amount of shared memory PostgreSQL uses isn't determined solely by shared_buffers, and processes other than PostgreSQL use shared memory too, so it's good to stay somewhat below the limit set by --shm-size.)


What THREADS value are you running with?

You increased work_mem from 128MB to 256MB, so keep in mind that the amount of memory used scales with the number of active connections (max: 250), and the number of active connections should scale with THREADS.

So, as long as THREADS is less than or equal to 32, you should be fine in theory (60 GB / 256 MB / 7).
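
The estimate above can be written out as a Python sketch. The divisor 7 is taken as given from the comment; reading it as "work_mem-sized allocations budgeted per render thread" is our assumption. The result is only a rough bound, since a single query can run several sorts or hashes that each use up to work_mem, and shared_buffers, the OS and Apache/renderd are ignored:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def max_threads(total_ram: int, work_mem: int, factor: int = 7) -> int:
    """Largest THREADS value for which THREADS * factor * work_mem still fits in total_ram.

    `factor` is the divisor from the estimate above, interpreted (as an
    assumption) as work_mem allocations per render thread.
    """
    return total_ram // (work_mem * factor)


if __name__ == "__main__":
    print(max_threads(60 * GiB, 256 * MiB))  # 34, so THREADS=16 fits the budget
```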

Istador commented 3 years ago

I'll test this. It may take a day or so to re-import the planet with the updated configs.

Changing Apache, mod_tile or even PostgreSQL settings for performance should never require redoing an import that already succeeded. If you already imported the whole planet, you don't need to import it again (unless you forgot to put the database into a volume and threw away the container, or found out that the import was incorrect).

Istador commented 3 years ago

It may take a day or so to re-import the planet with the updated configs.

A day or so for an import of the whole planet on an HDD seems really fast.

There was a report here for an import of Europe-only on an HDD that took 27 days.

finisher0 commented 3 years ago

Thank you for the feedback @Istador. Very helpful!

Here's my current run command:

    docker run \
        -p 8080:80 \
        -e THREADS=16 \
        -e "OSM2PGSQL_EXTRA_ARGS=-C 30720" \
        --shm-size="10240m" \
        -v openstreetmap-data:/var/lib/postgresql/12/main \
        -d overv/openstreetmap-tile-server \
        run

With 16 threads, based on the numbers above, I should be safe with the memory.

My first run (default configurations, no flat-nodes) took ~5 days. The second run (with 16 threads and 30GB RAM, using flat-nodes, which should slow things) took ~39 hours. These were both full-planet imports.

Unfortunately, I didn't create the volume ("docker volume create openstreetmap-data") this time, so I will likely need to do one more import (I had planned to anyway, for testing purposes).

In the meantime, any suggestions on how to update ModTileRequestTimeout and restart the run? I've made a few attempts by stopping the run command, changing the config, and re-running, but it still times out at 30 seconds (the default).

Thanks again

finisher0 commented 3 years ago

Okay, I set the following in apache.conf and then rebuilt the tile server from scratch:

    ModTileRequestTimeout 600
    ModTileMissingRequestTimeout 600

I expected this to give me a 10-minute render timeout (I know, super long; I would like to know how long I have to wait to ensure the tile gets rendered), but it did not: the timeout is still 30 seconds. Is my expectation inaccurate? Since it is its own topic, I'll create a separate issue addressing this.

FYI, this planet import took slightly over 39 hours, similar to the last.

Reygok commented 3 years ago

I increased my server's RAM to 32 GB and the cores to 8, and continued the pre-rendering. I am at zoom level 16 now, and one metatile takes 600+ seconds. I render only a specific area with render_list_geo, and I did the math on how long level 16 will take... 700 days. Is this normal, and is there maybe an alternative? Maybe this project is not suited to my needs?
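
A back-of-the-envelope estimate like this can be reproduced with a Python sketch: convert a lon/lat bounding box to tile ranges with the standard slippy-map formula, count the 8 x 8 metatiles it touches, and divide the work by the number of render threads. The bounding box, timing and thread count are inputs you supply; the sketch assumes a constant time per metatile and perfectly parallel threads, and the function names are hypothetical:

```python
import math

METATILE = 8  # metatile edge length in tiles (8 x 8 = 64 tiles)


def lonlat_to_tile(lon: float, lat: float, z: int) -> tuple[int, int]:
    """Slippy-map tile x/y containing (lon, lat) at zoom z, clamped to the valid range."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def metatiles_in_bbox(min_lon: float, min_lat: float,
                      max_lon: float, max_lat: float, z: int) -> int:
    """Number of metatiles touched by a lon/lat bounding box at zoom z."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, z)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, z)  # south-east corner
    return (x1 // METATILE - x0 // METATILE + 1) * (y1 // METATILE - y0 // METATILE + 1)


def render_days(metatiles: int, seconds_per_metatile: float, threads: int) -> float:
    """Wall-clock days, assuming all threads render in parallel at a constant speed."""
    return metatiles * seconds_per_metatile / threads / 86400
```

Usage: `render_days(metatiles_in_bbox(min_lon, min_lat, max_lon, max_lat, 16), 600, threads)` with your own bounding box and thread count.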