Open-Historical-Map-Labs / openhistoricaltiles

First iteration of vector tiles from OHM Planet data
BSD 2-Clause "Simplified" License

Determine if vector tiles properly consuming changes #56

Closed danrademacher closed 5 years ago

danrademacher commented 5 years ago

Recently added items in the water here: https://www.openhistoricalmap.org/#map=17/47.60175/-122.33600&layers=H [screenshot]

AWS dev site: http://openhistoricalmap-1936150974.us-east-1.elb.amazonaws.com/#map=17/47.60134/-122.33781&layers=O It's a bit hard to tell what is happening here, since input data needs proper start and end dates to show up, so the features could be there but not rendering. I tried scrolling through the whole time period, though, and never got anything: [screenshot]

And the "xray" view of tiles doesn't show these buildings either: https://vtiles.openhistoricalmap.org/#17/47.60133/-122.33506 [screenshot]

A quick scan of the diff directory suggests they are being written properly: http://planet.openhistoricalmap.org/replication/minute/000/115/
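
For reference, each diff there can be decompressed and skimmed for the edited features. NNN below is a placeholder for one of the sequence files in that directory:

curl -s http://planet.openhistoricalmap.org/replication/minute/000/115/NNN.osc.gz | gzip -dc | less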

daelba commented 5 years ago

This problem with recent changes not showing up already arose in June. See #53.

danrademacher commented 5 years ago

Thanks for resurfacing that @daelba. I was away when you filed that and didn't realize we hadn't resolved it! Here's a good example of a feature with both start and end dates, https://www.openhistoricalmap.org/way/198283059, that is still not in our vector tiles: [screenshot]

https://vtiles.openhistoricalmap.org/#18/49.54814/17.72894

So this is clearly a longstanding problem, though we have seen changes render successfully in the past.

danrademacher commented 5 years ago

Let's have @gregallensworth and @mojodna take a look and see whether the diff consumption has stopped for some reason.

gregallensworth commented 5 years ago

Was this due to some change made in the borders consumption some months back?

Log file: /home/ubuntu/OPENMAPTILES/openmaptiles/data/ohmdiffs_service.log

Starting June 11, it seems to have stopped working due to a missing table.

[Jun 11 05:01:35] [INFO] [diff] Processing /import/diffs/000/091/952.osc.gz took: 21.78202ms
[Jun 11 05:01:35] [INFO] importing #91952 till 2019-06-11 05:00:02 +0000 UTC took: 22.205772ms
[Jun 11 05:01:35] [ERR] SQL Error: pq: relation "public.osm_border_polygon" does not exist in query INSERT INTO "public"."osm_border_polygon" ("osm_id", "geometry", "area", "admin_level", "name", "start_date", "end_date", "disputed", "maritime", "tags") VALUES ($1, $2::Geometry, $3, $4, $5, $6, $7, $8, $9, $10)
[Jun 11 05:01:35] [INFO] retrying in 2s

I also see a file dated June 10, /home/ubuntu/OPENMAPTILES/openmaptiles/ohm_20190610.pgdump, which may indicate that the database was being reloaded around this time. Was this when border updates were being refactored?

Maybe the imposm3 instance that is running updates was not restarted, so it is still running off an old openmaptiles.yaml? I tried restarting the diffs consumer:

/home/ubuntu/OPENMAPTILES/openhistoricaltiles/ohmdiffs_service.sh stop
/home/ubuntu/OPENMAPTILES/openhistoricaltiles/ohmdiffs_service.sh start

However, this seems not to be working. I do see imposm3 running, but it is sleeping. The ohmdiffs_service.log logfile reads:

[Sep 3 15:34:57] osm cache: IO error: lock /cache/coords/LOCK: Resource temporarily unavailable

I am digging through Google to find out what this means and how to fix it.

mojodna commented 5 years ago

Dang, that sounds about right. I was pretty sure that I checked whether replication was still working when I left off. osm_border_polygon should have been (was?) created when I did a re-import of the OHM planet with the updated OpenMapTiles schema. Is that a persistent error, or a one-time thing that happened on 6/11? (If the latter, it would have been part of setting things up.) That, or it's a typo and something needs to get pluralized somewhere...
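
For what it's worth, a quick way to check whether that relation exists (the exact psql invocation depends on how the openmaptiles Postgres container is wired up, so treat this as a sketch):

psql -c "SELECT to_regclass('public.osm_border_polygon');"

It returns NULL if the relation is missing.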

@gregallensworth my guess is that it got killed indiscriminately and the LOCK file should be manually removed to clean things up.

gregallensworth commented 5 years ago

@mojodna

The error message about osm_border_polygon being absent recurs a few times on June 11 over a few minutes, and that is the last log message. I saved the old log file as ohmdiffs_service.log.old.

I do see that osm_border_polygon exists now, however. So maybe imposm got hung up somehow during the process and just quit working, but without exiting?

I found and deleted the lockfile, but so far no change in behavior:

/home/ubuntu/OPENMAPTILES/openhistoricaltiles/ohmdiffs_service.sh stop
sudo rm /var/lib/docker/volumes/openmaptiles_cache/_data/coords/LOCK
/home/ubuntu/OPENMAPTILES/openhistoricaltiles/ohmdiffs_service.sh start

I see that an imposm3 process is running, but no output is showing up in ohmdiffs_service.log, and data/diffs/last.state.txt still shows 2019-06-11.

mojodna commented 5 years ago

The container's last.state.txt and its config.json disagree about where it's replicating from:

$ docker exec -it openmaptiles_update-osm_run_1 bash
root@6ddb600c734a:/usr/src/app# cat /import/diffs/last.state.txt
timestamp=2019-06-11T05\:42\:27Z
sequenceNumber=3534082
replicationUrl=http://planet.openstreetmap.org/replication/minute/
root@6ddb600c734a:/usr/src/app# cat config.json
{
    "replication_url": "http://dump.openhistoricalmap.org/replication/minute/",
    "replication_interval": "1h"
}

The sequence number in last.state.txt is from OSM; OHM is currently at 116032. This suggests that we can resume without reloading once we're able to point to the correct replication endpoint.
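
For reference, one way to confirm the current OHM sequence number is to read the replication directory's top-level state file; this assumes the OHM endpoint follows the standard OSM replication layout with a state.txt at the root:

curl -s https://dump.openhistoricalmap.org/replication/minute/state.txt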

danrademacher commented 5 years ago

From Seth on another channel:

good news: it looks like we should just be able to resume from 6/11 with some config changes. i need to run out now but will take a deeper look when i get back.

mojodna commented 5 years ago

We are replicating again.

last.state.txt was incorrectly initialized after running imposm3 on a 6/10/19 snapshot. It fetched the OSM state corresponding to its stop-point (why? not sure, might've been PBF metadata, might've been some sort of default).

To fix this, I stopped the diff service:

/home/ubuntu/OPENMAPTILES/openhistoricaltiles/ohmdiffs_service.sh stop

Then I poked around https://dump.openhistoricalmap.org/replication/minute/ looking for a diff that contained data from around the same time period. Fortunately, there was only one. Next, I skimmed through it looking for something I could identify in the database:

curl https://dump.openhistoricalmap.org/replication/minute/000/091/714.osc.gz | gzip -dc | less

I used way 198280994 as a reference point and ran

SELECT * FROM osm_all_buildings WHERE osm_id = 198280994;

to see whether the data there matched what was in the diff. end_date differed, which allowed me to conclude that that sequence had not yet been applied.

I then replaced diffs/last.state.txt with the contents of https://dump.openhistoricalmap.org/replication/minute/000/091/713.state.txt (which is approximately what it should have been populated with had I initialized things properly).
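
In practice that boiled down to something like the following (the host-side path is the same file I check below; adjust if your layout differs):

curl -s https://dump.openhistoricalmap.org/replication/minute/000/091/713.state.txt > ~/OPENMAPTILES/openmaptiles/data/diffs/last.state.txt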

Once that was in place, I restarted the diff service and started tailing the logs:

/home/ubuntu/OPENMAPTILES/openhistoricaltiles/ohmdiffs_service.sh start
tail -f ~/OPENMAPTILES/openmaptiles/data/ohmdiffs_service.log

I periodically checked ~/OPENMAPTILES/openmaptiles/data/diffs/last.state.txt to monitor progress.
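
If you'd rather keep that in one terminal, something along these lines works too (just a convenience, not necessarily what I ran):

watch -n 60 cat ~/OPENMAPTILES/openmaptiles/data/diffs/last.state.txt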

As of now, it's caught up to https://dump.openhistoricalmap.org/replication/minute/000/092/905.osc.gz (6/14/19)

I'll keep an eye on things, but it looks like everything is behaving normally once again.

mojodna commented 5 years ago

Update: I checked the PBF metadata with:

$ osmium fileinfo ohm_planet_2019-06-10.pbf
File:
  Name: ohm_planet_2019-06-10.pbf
  Format: PBF
  Compression: none
  Size: 772302054
Header:
  Bounding boxes:
    (-180,-90,180,90)
  With history: no
  Options:
    generator=osmium/1.10.0
    pbf_dense_nodes=true

Since the header contains no replication timestamp, this leads me to believe that imposm3 used some sort of default to pick a corresponding sequence number rather than relying on the PBF metadata.

danrademacher commented 5 years ago

Visibly looking caught up on the tiles:

[screenshot]

and

[screenshot]

danrademacher commented 5 years ago

Looks like we're all back in good shape.