Open skmoore opened 3 months ago
Thanks for the examples @skmoore.
Today, multiple place tags from OpenStreetMap - the local_type values you're seeing - are used to generate locality entities in the divisions theme. So for the Bratislava example, because those places are represented through multiple features in OSM with suburb and city place tags, multiple entities are generated in Overture. This is not ideal and is causing the duplicate and overlap issues you're seeing, so some changes are being planned to the ingestion pipeline as a fix.
The Kingston examples are actually legitimate entities - one in Jamaica, one in Tasmania, and one in Norfolk Island, hence they have unique capital_of_divisions values.
The issue with the Levkosia example is slightly different, as multiple entities were generated even though they all share the same place tags / local_type value. Running this query in DuckDB
SELECT
    id,
    sources[1].dataset AS dataset,
    sources[1].record_id AS concordance_id
FROM
    read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=divisions/type=*/*', filename=true, hive_partitioning=1)
WHERE
    id IN (
        '085b2cd0ffffffff01ec1a70e2268d2a',
        '085b2cd0ffffffff01aa2be6f778df73',
        '085b2cd0ffffffff01861b0866ec3f98',
        '085b2cd0ffffffff012a65eb36fb45c1'
    );
you'll see four unique OSM features
┌──────────────────────────────────┬───────────────┬────────────────┐
│                id                │    dataset    │ concordance_id │
│             varchar              │    varchar    │    varchar     │
├──────────────────────────────────┼───────────────┼────────────────┤
│ 085b2cd0ffffffff01ec1a70e2268d2a │ OpenStreetMap │ R16283715      │
│ 085b2cd0ffffffff01861b0866ec3f98 │ OpenStreetMap │ R2628520       │
│ 085b2cd0ffffffff01aa2be6f778df73 │ OpenStreetMap │ N1893015330    │
│ 085b2cd0ffffffff012a65eb36fb45c1 │ OpenStreetMap │ R2628521       │
└──────────────────────────────────┴───────────────┴────────────────┘
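As a rough aid for reading the result above, here is a minimal Python sketch that tallies the four rows by the leading letter of each concordance_id. It assumes the prefix encodes the OSM element type (N = node, W = way, R = relation); that reading is an assumption for illustration, and the helper name osm_element_type is hypothetical rather than part of any Overture tooling.

```python
# Sketch only: decode the leading letter of each OSM-derived record_id.
# Assumption (not confirmed in this thread): N = node, W = way, R = relation.
from collections import Counter

OSM_ELEMENT_TYPES = {"N": "node", "W": "way", "R": "relation"}

# (id, concordance_id) pairs copied from the query result above.
rows = [
    ("085b2cd0ffffffff01ec1a70e2268d2a", "R16283715"),
    ("085b2cd0ffffffff01861b0866ec3f98", "R2628520"),
    ("085b2cd0ffffffff01aa2be6f778df73", "N1893015330"),
    ("085b2cd0ffffffff012a65eb36fb45c1", "R2628521"),
]


def osm_element_type(record_id):
    """Map a record_id prefix to an element type, or 'unknown' if unrecognized."""
    return OSM_ELEMENT_TYPES.get(record_id[:1], "unknown")


# Count how many source features fall under each assumed element type.
element_counts = Counter(osm_element_type(record_id) for _, record_id in rows)

for overture_id, record_id in rows:
    print(overture_id, record_id, osm_element_type(record_id))
print(dict(element_counts))
```

Under that assumption, the four entities trace back to three relations and one node, which is consistent with separate OSM features each producing their own entity.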
Ideally, a single entity would be maintained on Overture's end for this locality.
Both of these issues are related and similar to a discussion around localities here. There is no timeline for a fix yet, but once some action is taken, we can share a progress update. We're hoping to make some pipeline updates soon, so this should be corrected in one of the upcoming releases.
Feel free to add additional examples; they're very helpful.
@stepps00 Thanks for the info
I'm seeing duplicate division features in the July release. There are a few patterns, some of which may be expected.
The value for local_type is different, so perhaps this is expected? In this example the capital_of_divisions column is identical for both features, but the values are too long to include here.
Another example where local_type and capital_of_divisions have different values for each duplicate.
Others are basically exact matches of each other.