Closed. Rub21 closed this issue 10 months ago.
Regarding staging planet files: can you make them available for download somewhere, maybe?
I suspect it might contain some fairly large objects that cause Overpass to fail with a memory allocation error; the ".. killed" message would appear under exactly those conditions. A non-empty database directory is also a frequent cause of issues with initial loads.
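To rule out the non-empty-directory case, a minimal pre-import check could look like the sketch below (the directory path is a placeholder; adjust to your deployment):

```shell
# Sketch: make sure the Overpass database directory is empty before a
# fresh initial load. DB_DIR is an assumed path, not the actual config.
DB_DIR=/opt/overpass/db

if [ -d "$DB_DIR" ] && [ -n "$(ls -A "$DB_DIR" 2>/dev/null)" ]; then
  echo "$DB_DIR is not empty; clearing it before the initial load"
  # ${DB_DIR:?} aborts instead of expanding to "/" if the variable is unset
  rm -rf "${DB_DIR:?}"/*
fi
```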
It’s easier to get a precise error message for the Overpass import when it isn’t reading data through a pipe. Very large object ids could be an issue, as could files that are not properly sorted by type (node/way/relation) and increasing object id. The latter two may not be relevant in your case, since the process worked before, but I would still recommend validating the planet file with the osmium tools to rule out such issues.
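A minimal validation sketch with osmium (file name is a placeholder; the checks are only run if osmium and the file are actually present):

```shell
# Sketch: sanity-check a planet file before feeding it to Overpass.
PLANET=planet.osm.pbf   # placeholder path

if command -v osmium >/dev/null 2>&1 && [ -f "$PLANET" ]; then
  # Extended fileinfo decodes the whole file and reports, among other
  # things, whether objects are ordered by type and id.
  osmium fileinfo -e "$PLANET"

  # Referential integrity: report ways/relations pointing at missing objects.
  osmium check-refs "$PLANET"

  # Only needed if fileinfo reports the file as unsorted:
  osmium sort -o sorted.osm.pbf "$PLANET"
else
  echo "osmium or $PLANET not available; skipping validation"
fi
```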
Which Overpass version are you using at this time?
Regarding the stuck osmosis process: have you tried triggering some stack traces? In OSM production we use tools other than osmosis, both for planet generation and for minutely diffs.
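Since osmosis runs on the JVM, thread stacks can be captured with jstack (or SIGQUIT as a fallback, which writes the dump to the process's stdout/log). A rough sketch, assuming the process can be found by name:

```shell
# Sketch: take a few thread dumps from a stuck osmosis process and
# compare them. If the same threads are parked on the same lock or I/O
# call in every dump, the process is genuinely stuck, not just slow.
take_dumps() {
  pid=$1
  for i in 1 2 3; do
    # jstack must run as the same user (or root); fall back to SIGQUIT.
    jstack "$pid" > "osmosis-stack-$i.txt" || kill -QUIT "$pid"
    sleep 10
  done
}

pid=$(pgrep -f osmosis | head -n1)
if [ -n "$pid" ]; then
  take_dumps "$pid"
else
  echo "no osmosis process found"
fi
```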
Thank you @mmd-osm
So, we seem to have two problems:
On staging, Overpass is not updating / doing a clean import. I'm fairly certain we can figure out debugging that; it's also possible that we just have much smaller resource allocations set up for the staging Overpass and now need to bump them up. So, next actions here:
On production, generating the full planet is failing. @Rub21 to confirm: on prod there are no issues with minutely diffs or Overpass updates, it's just generating the full planet using osmosis that's failing? Here, it sounds to me like it might make sense to move to planet-dump-ng to generate the full planet and history. This is probably a bit of work: as I understand it, planet-dump-ng does not talk directly to the database but works from a db dump file, which does seem better, but it will involve some changes in how we generate the planet. The OSM prod chef config is here.
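For reference, the dump-based flow could look roughly like this (database name is a placeholder, and the planet-dump-ng flag names are as I remember them from its README; verify locally with `planet-dump-ng --help` before wiring this into chef):

```shell
# Sketch of a planet-dump-ng based pipeline. The key property: the
# long-running export reads a dump file, so it never holds a database
# connection open and cannot get stuck on a dropped connection.
DB=openhistoricalmap                 # assumed database name
DUMP=osm-$(date +%y%m%d).dump

if command -v planet-dump-ng >/dev/null 2>&1; then
  # 1. Custom-format dump; this is the only step that touches the DB.
  pg_dump -F c -f "$DUMP" "$DB"

  # 2. Current planet and full history from the same dump file.
  planet-dump-ng -f "$DUMP" --pbf=planet.osm.pbf --history-pbf=history.osm.pbf
else
  echo "planet-dump-ng not installed"
fi
```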
@mmd-osm do you see any red flags with moving to planet-dump-ng to create the planet and history dumps? If not, I think I'd prefer going that route rather than trying to debug or upgrade osmosis.
+cc @geohacker
> do you see any red flags with moving to planet-dump-ng to create the planet and history dumps?
planet-dump-ng used to have some issues with very large relations that happened to have lots of versions (https://github.com/zerebubuth/planet-dump-ng/issues/25). I cannot completely rule out that, given the way objects are modeled in OHM, some other previously unknown issues with block size calculations might be triggered. I'd recommend closely monitoring planet-dump-ng runs for a while and reporting any issues upstream.
I think the issue with planet replication in production has been solved. It looks like the process got stuck when the connection to the database went down and never recovered. Currently, we can access the recent planet files, for example: https://s3.amazonaws.com/planet.openhistoricalmap.org/planet/planet-240102_0000.osm.pbf, https://planet.openhistoricalmap.org/?prefix=planet/
Noted that the minutely files never stopped, but the daily full Planet replication was failing. Will make a new ticket for alerting on those.
Planet replication files have not been generated for a month in production, and there seems to be an issue with the replication of planet files in staging. This is why Overpass cannot complete the import process.
This issue seems to have arisen with changes to how cgimap saves changesets and/or recent updates to the API; I'm not entirely sure:
A few weeks ago, we imported a production backup into staging to run some performance tests on the database; perhaps that is the problem with staging. In the case of production, however, it's still not clear. Maybe it just needs an update to the osmosis version. I have opened a ticket on osm-seed to upgrade osmosis: https://github.com/developmentseed/osm-seed/issues/306
cc @danrademacher @batpad