go-spatial / tegola-osm

Various scripts for importing and running a mirror of OSM with tegola
https://demo.tegola.io
MIT License

importing planet.pbf postgres optimisations n specs #38

Closed peldhose closed 6 years ago

peldhose commented 6 years ago

Hi, can somebody explain the performance of tegola-osm when importing planet.pbf into a Postgres database? How much time might it take to push the planet data into Postgres?

ARolek commented 6 years ago

In my experience, with a pretty beefy AWS compute-optimized instance (I don't recall how big; we should document this) and a high-IOPS drive (the process is disk-write intensive), we were able to get the data from the planet.pbf file into PostGIS, with the generalized tables and the indexes, in about 12-14 hours. It does depend on your config file, but that's using the one we have documented in this repo.
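For reference, a minimal sketch of what that import invocation looks like with imposm (the binary may be named imposm3 in older releases; the mapping file name, paths, and connection string below are placeholders, not the exact values we used):

```bash
# Sketch of a full-planet import with imposm (v3).
#   -read   parses the PBF into imposm's on-disk cache (disk-write heavy)
#   -write  loads that cache into PostGIS
#   -deployproduction moves the tables out of the import schema when done
# Mapping file, paths, and credentials are placeholders; adjust to your setup.
imposm import \
  -mapping imposm3.json \
  -read planet-latest.osm.pbf \
  -write \
  -connection "postgis://tegola:password@localhost/osm" \
  -cachedir /fast-ssd/imposm-cache \
  -optimize \
  -deployproduction
```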

peldhose commented 6 years ago

Thank you @ARolek for the info.

peldhose commented 6 years ago

Hey @ARolek, we imported the planet OSM data in around 11 hours 30 minutes with a Google Cloud server (spec: 8 vCPU, 30 GB RAM, and a 1 TB disk, of which only about 400 GB was used). Awesome performance. Thank you for a simple, awesome open-source tile server.

peldhose commented 6 years ago

Is there any Redis support for better caching of tiles? Or any workaround to implement it?

ARolek commented 6 years ago

@peldhose nice! Thanks for the performance numbers. I'm going to add those to the README.

tegola does not currently have Redis support. For caching, tegola currently supports S3 and writing to a filesystem. Implementing a Redis cache would not be too difficult, though. You can open an issue on the tegola repo for the request. If you're interested in tackling the implementation, I can give you a tour of how the Cacher interface works; it's fairly straightforward.
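For context, the cache backend is selected in tegola's TOML config; a rough sketch of the two supported options (basepath, bucket, and region values are placeholders):

```toml
# File-backed tile cache (sketch; basepath is a placeholder)
[cache]
type = "file"
basepath = "/var/cache/tegola"

# Or an S3-backed tile cache (bucket and region are placeholders)
# [cache]
# type = "s3"
# bucket = "my-tile-cache"
# region = "us-east-1"
```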

peldhose commented 6 years ago

Wow, thanks! Awesome. Yes, of course, I'll track the Redis caching implementation (#40). I'd also like to mention that tegola performs far better than Mapzen (Tilezen) in every way, and I like the way layers are customized with the TOML config file. Mapzen took around 2.5 days with 8 vCPU and 64 GB RAM to push the entire dataset into the database, whereas tegola-osm took only about 11 hours 30 minutes with 8 vCPU and 30 GB RAM to push the same data. Mapzen is also too slow at delivering tiles, even with the above server spec. They tried to solve this with two levels of caching (tilequeue S3 and Redis), but I don't think that's the right solution.

ARolek commented 6 years ago

@peldhose thanks for the positive feedback! We have plans to make tegola even faster. v0.6.0 is close to release, and it comes with several rendering improvements. You can watch the tegola repo as new versions are released. We're going to need some help testing the pre-release if you're interested. ;-)

Thanks for chiming in with your import results.

adamakhtar commented 4 years ago

Hi @ARolek

As part of my PR to enhance the docs (https://github.com/go-spatial/tegola-osm/pull/60), I wanted to expand on the "How long does it take to import" section. I found this issue and noticed your comment above:

In my experience, with a pretty beefy AWS compute-optimized instance (I don't recall how big; we should document this) and a high-IOPS drive (the process is disk-write intensive), we were able to get the data from the planet.pbf file into PostGIS, with the generalized tables and the indexes, in about 12-14 hours. It does depend on your config file, but that's using the one we have documented in this repo.

What do you consider to be high IOPS on AWS? Can you give a figure? Would high IOPS be required for both the RDS instance and the server's main volume? Typically how much space does the DB need? And how much does the main volume need (for Imposm3 to prepare the data)?

If you can remember I'll update the PR.

ARolek commented 4 years ago

@adamakhtar I don't recall how high the IOPS were; I just remember provisioning them. Ideally, you would have a high-IOPS volume for your database as well.

"Typically how much space does the DB need?" The database will need about 160 GB if you don't use the import schema and just deploy to production. If you want to use the import schema, then you will need around 320 GB, so I would suggest around 400 GB to provide some padding.

"And how much does the main volume need (for Imposm3 to prepare the data)?" I don't recall this either; it's less than the database requirement, though, since it doesn't include the generalized tables or the indexes.
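To make the schema numbers concrete, here's a sketch of the steps that free the extra copy once an import has been verified, using imposm's deploy/backup commands (mapping file name and connection string are placeholders):

```bash
# Sketch: after -read/-write, the data lives in the import schema. Deploying to
# production moves it, and dropping the backup schema removes the leftover copy,
# which is why disk usage roughly doubles while the import schema is in use.
imposm import -mapping imposm3.json \
  -connection "postgis://tegola:password@localhost/osm" -deployproduction

imposm import -mapping imposm3.json \
  -connection "postgis://tegola:password@localhost/osm" -removebackup
```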

adamakhtar commented 2 years ago

I just tried to do an import with the suggestions made in the above comments and it went really slowly. I'm troubleshooting now and will share what I've found so far, in case anybody else is looking for ideal specs.

Imposm3's README states the following:

It's recommended that the memory size of the server is roughly twice the size of the PBF extract you are importing. For example: You should have 64GB RAM or more for a current (2017) 36GB planet file, 8GB for a 4GB regional extract, etc. Imports without SSDs will take longer.

So it seems that if you choose a server with too little RAM relative to your PBF's file size, you are going to be bottlenecked by IO. A full planet PBF is now around 53 GB, about 60-70% bigger than it was at the time the above comments were made, so roughly 100 GB of RAM now seems to be the recommendation.

I'll try again with more memory in a few days.
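As a quick back-of-the-envelope check of that rule of thumb on a Linux box (the file name is a placeholder; the 2x factor is Imposm3's recommendation quoted above):

```bash
# Compare available RAM to the PBF size (Linux: GNU stat and /proc/meminfo).
pbf=planet-latest.osm.pbf   # placeholder file name
pbf_gb=$(( $(stat -c %s "$pbf") / 1024 / 1024 / 1024 ))
ram_gb=$(( $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024 / 1024 ))
echo "PBF: ${pbf_gb} GB, RAM: ${ram_gb} GB, recommended RAM: >= $(( pbf_gb * 2 )) GB"
```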

ARolek commented 2 years ago

@adamakhtar how slow was the planet import for you?

adamakhtar commented 2 years ago

@ARolek 12 hours later I was only 2% into the read phase. I assumed at that rate I was looking at at least 48 hours to complete, so I aborted.

You can see my full server spec, htop output, and imposm output here: https://gis.stackexchange.com/questions/427821/steps-to-troubleshoot-slow-imposm-performance

Unfortunately I didn't consider IO to be the bottleneck at the time, so I never checked it, but I'm assuming that was the problem.

I'll try again closer to the weekend, but this time I'll go for a 16 vCPU, 124 GB RAM server.
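For anyone else troubleshooting a slow read phase, a quick way to confirm whether disk IO is the bottleneck while imposm is running (standard Linux tools; device names and output will vary):

```bash
# Per-device utilization and throughput, refreshed every 5 seconds.
# High %util with low MB/s usually means the volume's IOPS are the limit.
iostat -xm 5

# Per-process IO, to confirm imposm is the process waiting on disk.
sudo iotop -o
```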

ARolek commented 2 years ago

12 hours later I was only 2% into the read phase. I assumed at that rate I was looking at at least 48 hours to complete, so I aborted.

Wow, that's insanely slow. Is this on the M1 Mac? I wonder if Rosetta is being used. In my experience, x86 virtualization on the M1 is very slow.

adamakhtar commented 2 years ago

@ARolek no, this was on an EC2 Intel Xeon instance (c6i.4xlarge) with 16 vCPU, 32 GB of memory, and 1000 GB of SSD storage.

I can only assume the 32 GB of RAM was not enough for the 53 GB planet file, so IO became the bottleneck. I'll try again in a couple of days and let you know how the rerun goes.

ARolek commented 2 years ago

@adamakhtar OK, wow, that seems really odd. I'll give this a run soon too. What version of imposm are you using?

adamakhtar commented 2 years ago

@ARolek I'm using version 0.11.1.