drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

Timeline misses versions with same timestamp #686

Closed nrenner closed 1 year ago

nrenner commented 1 year ago

Versions 3 and 4 of node 3202699546 (history) share the same timestamp 2021-12-16T16:01:38Z; see also History Browser, API.

A timeline query for version 3 does not return a result:

[out:csv(refversion, created)];
timeline(node, 3202699546, 3);
out;

https://overpass-turbo.eu/s/1tcO

Actual                               Expected

refversion  created                  refversion  created             
                                     3           2021-12-16T16:01:38Z

Iterating over all versions also misses version 3:

[out:csv(refversion, created)];
timeline(node, 3202699546);
out;

https://overpass-turbo.eu/s/1tcR

Actual                               Expected

refversion  created                  refversion  created
1           2014-11-24T08:29:08Z     1           2014-11-24T08:29:08Z
2           2021-12-07T08:17:41Z     2           2021-12-07T08:17:41Z
                                     3           2021-12-16T16:01:38Z
4           2021-12-16T16:01:38Z     4           2021-12-16T16:01:38Z

drolbr commented 1 year ago

This is the intended behaviour. To make sense of the OSM data and its oddities, in particular the coordinates of ways in OpenStreetMap, one must define precise model assumptions.

Overpass API relies on the timestamps of the objects, in particular to resolve way coordinates. This avoids having to deal with states that existed only after a minute diff on the replicated side because an object update appears in the minute updates later than it was performed on the primary database server (delays of up to 20 minutes have been observed).

This also means that object versions with zero lifetime do not exist in the Overpass API model. In all investigated cases where two versions of the same object and same timestamp exist, they have the exact same data. So there is no information loss.
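
The effect of this rule on the node from the report can be sketched in a few lines of Python. This is only an illustration of the model as described here, not the actual Overpass API implementation; the function names and the tuple layout are assumptions made for the example. A version is treated as living from its own timestamp until the timestamp of the next version, so a version whose successor carries the same timestamp has zero lifetime and drops out.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # OSM timestamps look like 2021-12-16T16:01:38Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def visible_versions(versions):
    """Return the versions that have a non-zero lifetime.

    versions: list of (version_number, timestamp) sorted by version number.
    A version is kept if it is the last one, or if the next version has a
    strictly later timestamp.
    """
    kept = []
    for i, (number, ts) in enumerate(versions):
        is_last = i == len(versions) - 1
        if is_last or parse_ts(versions[i + 1][1]) > parse_ts(ts):
            kept.append((number, ts))
    return kept


# Versions and timestamps of node 3202699546 as listed in the report.
history = [
    (1, "2014-11-24T08:29:08Z"),
    (2, "2021-12-07T08:17:41Z"),
    (3, "2021-12-16T16:01:38Z"),
    (4, "2021-12-16T16:01:38Z"),
]

for number, ts in visible_versions(history):
    print(number, ts)
```

With the data from the report, version 3 is filtered out and versions 1, 2 and 4 remain, which matches the "Actual" column of the timeline output.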

mmd-osm commented 1 year ago

In all investigated cases where two versions of the same object and same timestamp exist, they have the exact same data

I doubt this is correct in general. Let's take one concrete example I just found in the log files:

Version 15 has a later or equal timestamp (2023-04-02T18:25:50Z) than version 16 (2023-04-02T18:25:50Z) of Way 26434131
Version 16 has a later or equal timestamp (2023-04-02T18:25:50Z) than version 17 (2023-04-02T18:25:50Z) of Way 26434131

As we can see in the OSM Deep History for this way, tagging and metadata differ across object versions: https://osmlab.github.io/osm-deep-history/#/way/26434131

Similar messages can be observed for nodes (and to a much lesser extent also relations):

Version 12 has a later or equal timestamp (2023-04-02T15:42:07Z) than version 13 (2023-04-02T15:42:07Z) of Node 2125500002
Version 13 has a later or equal timestamp (2023-04-02T15:42:07Z) than version 14 (2023-04-02T15:42:07Z) of Node 2125500002

-> https://osmlab.github.io/osm-deep-history/#/node/2125500002

StreetComplete is pretty famous for triggering these kinds of messages. Now that the API is much faster than pre-2019 (thanks to CGImap) and StreetComplete updates the same object in different changesets at the same time, this happens much more frequently in Overpass API nowadays. I reported it 2 years ago already: https://github.com/streetcomplete/StreetComplete/issues/2318#issuecomment-868975126

NB: Since OSMCha relies on Overpass API, we're also seeing some related screw ups there, in particular for StreetComplete changesets.

nrenner commented 1 year ago

@drolbr Thanks for the answer and explanation, I already thought so.

I get that resolving historic node refs needs timestamps and that the attic database is built around timestamps.

Object history and changeset tools want to consider all versions for tagged objects, though. For example in this case, PeWu history viewer correctly lists all versions and changed tags (using main OSM API), while OSMCha claims that the opening_hours and wheelchair tags were added in both changeset 115015637 and changeset 115015638.

This is because Overpass augmented diffs generally only contain a delta and not intermediate versions. The idea now was to query those missing versions with timeline (see also osmcha-frontend#604), but that wouldn't work in this case.

So this just confirms my personal conclusion that the Overpass API is not the right tool for those use cases.

mmd-osm commented 1 year ago

@nrenner: if you feel strongly about this, you could try to reach out to StreetComplete and discuss with them whether they could introduce some additional delay of 1s between uploads for the same object ids. I wanted to propose something similar in the past, but it didn't seem like a huge issue back then. YMMV.

nrenner commented 1 year ago

@mmd-osm StreetComplete does unusual things, but nothing wrong, so downstream tools just have to deal with it. OSMCha could request those edge cases from the OSM API, but that would be yet another workaround for a workaround.
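
A minimal sketch of what such a fallback might look like, assuming version numbers are contiguous starting at 1 and that the highest version is always present in the timeline result. The helper names are hypothetical and not part of OSMCha or Overpass API; the URL follows the OSM API 0.6 pattern for fetching a single object version. No request is sent here, the code only works out which versions would have to be fetched.

```python
def missing_versions(timeline_versions):
    """Return version numbers absent from a timeline result.

    Assumes versions are numbered 1..max without gaps and that the
    latest version always appears in the timeline.
    """
    if not timeline_versions:
        return []
    present = set(timeline_versions)
    return [v for v in range(1, max(present) + 1) if v not in present]


def version_url(obj_type, obj_id, version,
                base="https://www.openstreetmap.org/api/0.6"):
    # OSM API 0.6: /{node|way|relation}/{id}/{version}
    return f"{base}/{obj_type}/{obj_id}/{version}"


# refversion column returned by timeline(node, 3202699546) in the report
timeline = [1, 2, 4]

urls = [version_url("node", 3202699546, v) for v in missing_versions(timeline)]
print(urls)
```

For the node in this issue, only version 3 would need to be requested from the main API.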

mmd-osm commented 1 year ago

The easiest way would probably be to introduce some rate limiting for the API upload endpoint (+ single object create/update/delete). Nominatim already has a limit of 1 request/s, as defined in the Nominatim usage policy.

Since this would result in a massive slowdown for people uploading one object at a time, I have some doubts anyone would be willing to implement such rate limiting. Still, I would assume most users wouldn't notice it at all.

mmd-osm commented 1 year ago

Since this would result in a massive slowdown for people uploading one object at a time

Now that this is getting worse (e.g. https://www.openstreetmap.org/api/0.6/way/1180426639/history), it's time to double check this assumption. I think it would be sufficient to check when an object version was last updated, and if that's less than a second ago (or some configurable parameter in milliseconds), the OSM API could simply wait a bit. Most likely, StreetComplete would be the only app that's slowed down a bit by such a change. I don't see an impact for the single object uploaders.
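
Roughly, the check could look like the sketch below. It is not code from the OSM API or CGImap; the function name and the `min_interval_ms` parameter are made up for illustration, and times are plain millisecond counters. If at least 1000 ms pass between two updates of the same object, their timestamps truncated to whole seconds can no longer be equal.

```python
def required_wait_ms(last_update_ms, now_ms, min_interval_ms=1000):
    """Return how long an update should be delayed, in milliseconds.

    last_update_ms: time the object's current version was written
    now_ms:         time the new update arrives
    min_interval_ms: configurable minimum spacing between versions
    """
    elapsed = now_ms - last_update_ms
    # No wait is needed once the minimum spacing has already passed.
    return max(0, min_interval_ms - elapsed)


# Second update arrives 250 ms after the previous one: delay it by 750 ms.
print(required_wait_ms(10_000, 10_250))

# Update arrives 1.5 s after the previous one: no delay.
print(required_wait_ms(10_000, 11_500))
```

An uploader who touches each object only once per changeset would always fall into the second case and would not be delayed.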