OvertureMaps / data

Overture Maps Data
https://docs.overturemaps.org

Duplicate region entities #187

Open stepps00 opened 1 month ago

stepps00 commented 1 month ago

Reviewing division_area entities with the region subtype in the most recent release, I see a handful of duplicates. It looks like data from both geoBoundaries and OpenStreetMap is being used to populate the region subtype in Tuvalu, which is the only country affected by this issue.

Running the following query:

SELECT id, names.primary as name, sources[1].dataset as source
  FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=divisions/type=division_area/*', filename=true, hive_partitioning=1)
  WHERE country = 'TV'
  AND subtype = 'region';

results in this list of entities:

┌──────────────────────────────────┬────────────┬───────────────┐
│                id                │    name    │    source     │
│             varchar              │  varchar   │    varchar    │
├──────────────────────────────────┼────────────┼───────────────┤
│ 085978a5bfffffff01ab63e5e4ccdeb6 │ Nukulaelae │ OpenStreetMap │
│ 0857a0477fffffff0143126ec50b5e49 │ Nukulaelae │ geoBoundaries │
│ 085d0818bfffffff015b09c4aeec0c5d │ Funafuti   │ OpenStreetMap │
│ 085d0816bfffffff0126b9cad7c51550 │ Funafuti   │ geoBoundaries │
│ 085e0a253fffffff01ae173137d8feae │ Nui        │ OpenStreetMap │
│ 085e0a6bbfffffff013ca4206d51c799 │ Nui        │ geoBoundaries │
│ 0856a8367fffffff0185d44be6397214 │ Nanumanga  │ OpenStreetMap │
│ 0856a8367fffffff013dd2848619d4f1 │ Nanumanga  │ geoBoundaries │
│ 085384113fffffff01a76b63451086e0 │ Nanumea    │ OpenStreetMap │
│ 085384c6bfffffff019582a3c743358d │ Nanumea    │ geoBoundaries │
│ 085ee225bfffffff01a43b95413c55e8 │ Nukufetau  │ OpenStreetMap │
│ 085ee26abfffffff01f0958de4a57f5d │ Nukufetau  │ geoBoundaries │
│ 085ec268bfffffff0138762785931dbe │ Vaitupu    │ OpenStreetMap │
│ 085ec268ffffffff01e8804d9e827960 │ Vaitupu    │ geoBoundaries │
│ 08561ac2ffffffff01fa933a87f594e5 │ Niutao     │ OpenStreetMap │
│ 08561ac2ffffffff019c40939062a640 │ Niutao     │ geoBoundaries │
├──────────────────────────────────┴────────────┴───────────────┤
│ 16 rows                                             3 columns │
└───────────────────────────────────────────────────────────────┘
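For anyone who wants to check a result like this without eyeballing it, here is a minimal Python sketch that groups the rows by name and reports every name that appears more than once. The `rows` list is copied by hand from the table above, and `find_duplicate_names` is a hypothetical helper written for this comment, not part of any Overture tooling.

```python
from collections import defaultdict

# (name, source) pairs copied by hand from the query result above.
rows = [
    ("Nukulaelae", "OpenStreetMap"), ("Nukulaelae", "geoBoundaries"),
    ("Funafuti", "OpenStreetMap"), ("Funafuti", "geoBoundaries"),
    ("Nui", "OpenStreetMap"), ("Nui", "geoBoundaries"),
    ("Nanumanga", "OpenStreetMap"), ("Nanumanga", "geoBoundaries"),
    ("Nanumea", "OpenStreetMap"), ("Nanumea", "geoBoundaries"),
    ("Nukufetau", "OpenStreetMap"), ("Nukufetau", "geoBoundaries"),
    ("Vaitupu", "OpenStreetMap"), ("Vaitupu", "geoBoundaries"),
    ("Niutao", "OpenStreetMap"), ("Niutao", "geoBoundaries"),
]


def find_duplicate_names(rows):
    """Return {name: sorted list of sources} for names with more than one row.

    Hypothetical helper for illustration only.
    """
    sources_by_name = defaultdict(list)
    for name, source in rows:
        sources_by_name[name].append(source)
    return {
        name: sorted(sources)
        for name, sources in sources_by_name.items()
        if len(sources) > 1
    }


duplicates = find_duplicate_names(rows)
for name, sources in sorted(duplicates.items()):
    print(f"{name}: {', '.join(sources)}")
```

Matching on the primary name alone is an assumption that happens to work for this small result set; a more robust check would probably also compare geometries or parent divisions.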

Geometries of the geoBoundaries-sourced entities are clipped to land; the OpenStreetMap geometries are not. Nanumea as an example:

geoBoundaries

[Screenshot: Nanumea geometry from geoBoundaries]

OpenStreetMap

[Screenshot: Nanumea geometry from OpenStreetMap]

stepps00 commented 1 month ago

And here are the corresponding division (Point) entities, which show the same duplication:

SELECT id, names.primary as name, sources[1].dataset as source
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=divisions/type=division/*', filename=true, hive_partitioning=1)
WHERE country = 'TV'
AND subtype = 'region';

Resulting in:

┌──────────────────────────────────┬────────────┬───────────────┐
│                id                │    name    │    source     │
│             varchar              │  varchar   │    varchar    │
├──────────────────────────────────┼────────────┼───────────────┤
│ 085978a5bfffffff01645a3d2a102f75 │ Nukulaelae │ OpenStreetMap │
│ 0857a0477fffffff018f88efd093aa72 │ Nukulaelae │ geoBoundaries │
│ 085d0818bfffffff01dee2f423eb079f │ Funafuti   │ OpenStreetMap │
│ 085d0816bfffffff01d05fa0d988cf80 │ Funafuti   │ geoBoundaries │
│ 085e0a253fffffff0169964eb8ab2902 │ Nui        │ OpenStreetMap │
│ 085e0a6bbfffffff01495198508f1004 │ Nui        │ geoBoundaries │
│ 0856a8367fffffff018095e009f8359e │ Nanumanga  │ OpenStreetMap │
│ 0856a8367fffffff01b64b7a314a50f9 │ Nanumanga  │ geoBoundaries │
│ 085384113fffffff018ab4531be983b6 │ Nanumea    │ OpenStreetMap │
│ 085384c6bfffffff0117c950e08b5581 │ Nanumea    │ geoBoundaries │
│ 085ee225bfffffff01d6f104552c846d │ Nukufetau  │ OpenStreetMap │
│ 085ee26abfffffff011d61409e7a2a60 │ Nukufetau  │ geoBoundaries │
│ 085ec268bfffffff01127c0a83d5a010 │ Vaitupu    │ OpenStreetMap │
│ 085ec268ffffffff01a6a0a36fb14d45 │ Vaitupu    │ geoBoundaries │
│ 08561ac2ffffffff01b3e5fdad89ef1e │ Niutao     │ OpenStreetMap │
│ 08561ac2ffffffff01524164d56dc361 │ Niutao     │ geoBoundaries │
├──────────────────────────────────┴────────────┴───────────────┤
│ 16 rows                                             3 columns │
└───────────────────────────────────────────────────────────────┘