TimSC / pycrocosm

OSM API v0.6 implemented in Django/Python
7 stars 4 forks source link

Duplicate nodes appearing in planet dump #21

Closed TimSC closed 5 years ago

TimSC commented 5 years ago

From a recent dump fosm-planet_20181016020000.o5m.gz provided by @4x4falcon running this query:

SELECT * FROM planet2_static_livenodes WHERE id=1000593496579;

gives me this:

      id       | changeset  | changeset_index | username | uid  | timestamp  | version | tags | geom
---------------+------------+-----------------+----------+------+------------+---------+------+----------------------------------------------------
 1000593496579 | 1000071674 |                 | Rosscoe | 4467 | 1457757963 |       1 | {}   | 0101000020E610000032F499FDCBEB624059750C7ED15440C0
 1000593496579 | 2000001771 |                 | Rosscoe | 4467 | 1539733222 |       2 | {}   | 0101000020E61000009241EE22CCEB6240B22A6794D35440C0
(2 rows)

I suspect a problem with the dump tool...

TimSC commented 5 years ago
SELECT * FROM planet_mod_nodeids WHERE id=1000593496579;

      id
---------------
 1000593496579
(1 row)

postgres is 9.5. This seems to limit the problem to a few SQL queries:

SELECT "planet2_static_livenodes".*, ST_X(geom) as lon, ST_Y(geom) AS lat FROM "planet2_static_livenodes" LEFT JOIN "planet2_mod_nodeids" ON "planet2_static_livenodes".id = "planet2_mod_nodeids".id WHERE "planet2_mod_nodeids".id IS NULL ORDER BY "planet2_static_livenodes".id;
SELECT "planet2_mod_livenodes".*, ST_X(geom) as lon, ST_Y(geom) AS lat FROM "planet2_mod_livenodes" ORDER BY "planet2_mod_livenodes".id;

Each statement is apparently producing one row. Only the second actually should be.

TimSC commented 5 years ago

Not all relevant tables were being locked. Attempted fix in f8ce0f4b91cfc3c13740ea05a9fd005bbc950d97

TimSC commented 5 years ago

Latest version still has problems

ALTER TABLE "planet2_static_oldnodes" ADD PRIMARY KEY (id, version);
ALTER TABLE "planet2_static_oldways" ADD PRIMARY KEY (id, version);
ALTER TABLE "planet2_static_oldrelations" ADD PRIMARY KEY (id, version);
ALTER TABLE "planet2_static_livenodes" ADD PRIMARY KEY (id);
ERROR:  could not create unique index "planet2_static_livenodes_pkey"
DETAIL:  Key (id)=(1000593506563) is duplicated.

Timestamp of errant node seems to be about the time the dump was made.

TimSC commented 5 years ago

Error probably fixed by updating to latest pgmap. Reopen if problem occurs again.

TimSC commented 5 years ago

Fix is here: https://github.com/TimSC/pgmap/issues/4

Dump fosm-planet_20190501174901.o5m.gz worked without any duplicates :)