drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

Overpass databases include redacted data #729

Open SomeoneElseOSM opened 1 month ago

SomeoneElseOSM commented 1 month ago

Prompted by a post on the OSM community forum (https://community.openstreetmap.org/t/cyber-attacks-in-the-osm-space/113618/49), I decided to check whether Overpass queries, e.g. those run at overpass-turbo.eu, return data that they should not.

https://www.openstreetmap.org/way/56688705/history is a simple example of a redaction: v1 dates from 14 years ago, v2 from 16 May 2024, 22:35, and v3 from 16 May 2024, 22:49.

An Overpass query for the state at a past point in time (https://overpass-turbo.eu/s/1M2E) unfortunately returns the "hidden" data.
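For readers unfamiliar with time-based Overpass queries: Overpass QL has a global `[date:"..."]` setting that makes the server answer from its historical ("attic") data instead of the current state. The sketch below only builds such a query and its request URL; it is not the query behind the overpass-turbo link above. The `/api/interpreter` endpoint with a `data` parameter is the standard Overpass API interface, while the timestamp (chosen between v2 and v3, assuming the history page shows UTC) is purely illustrative.

```python
from urllib.parse import urlencode

# Standard Overpass API interpreter endpoint on the public instance.
ENDPOINT = "https://overpass-api.de/api/interpreter"

def attic_query(way_id: int, timestamp: str) -> str:
    """Build an Overpass QL query for a way as it existed at `timestamp`.

    The global [date:...] setting tells Overpass to evaluate the query
    against attic (historical) data rather than the current database state.
    """
    return f'[date:"{timestamp}"];\nway({way_id});\nout geom;'

def query_url(ql: str) -> str:
    # The interpreter takes the query text in the `data` parameter.
    return ENDPOINT + "?" + urlencode({"data": ql})

# Illustrative timestamp between v2 (22:35) and v3 (22:49), assumed to be UTC.
ql = attic_query(56688705, "2024-05-16T22:40:00Z")
print(ql)
print(query_url(ql))
```

If the server's attic data still holds a version that has since been redacted upstream, a query like this one is exactly how it would resurface.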

A maintainer of an Overpass database would presumably not want to include data like that. I'd suggest discussing this with the people who look after the feeds that this software consumes; I don't know whether there is one for "recently redacted objects", or how hard one would be to implement.

SomeoneElseOSM commented 1 month ago

See also https://github.com/drolbr/Overpass-API/issues/652#issuecomment-1030784121

drolbr commented 1 month ago

Thank you for reporting this.

The problem is that redactions are not replicated, and I am not aware of any other machine-readable channel through which they are published.

This likely affects most or all services that keep a database of historical data in sync.
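To see why replication cannot carry a redaction: the replication diffs are osmChange files, which, as far as I know, only contain `<create>`, `<modify>` and `<delete>` blocks, each holding a new element version. A redaction hides an existing old version without creating a new one, so there is no element for a diff to carry. The minimal sketch below lists what an osmChange document reports; the sample document is hand-written for illustration, not taken from a real diff.

```python
import xml.etree.ElementTree as ET

# Hand-written osmChange sample (illustrative only, not a real diff).
SAMPLE_OSC = """<osmChange version="0.6">
  <modify>
    <way id="56688705" version="3" timestamp="2024-05-16T22:49:00Z">
      <nd ref="1"/><nd ref="2"/>
    </way>
  </modify>
</osmChange>"""

def list_changes(osc_xml: str) -> list[tuple[str, str, int, int]]:
    """Return (action, element type, id, version) for each element in an osmChange document."""
    root = ET.fromstring(osc_xml)
    changes = []
    for block in root:      # <create>, <modify> or <delete>
        for elem in block:  # <node>, <way> or <relation>
            changes.append(
                (block.tag, elem.tag, int(elem.get("id")), int(elem.get("version")))
            )
    return changes

print(list_changes(SAMPLE_OSC))
# [('modify', 'way', 56688705, 3)]
```

Every entry is a new version being added; nothing in this shape says "version 2 must now be hidden".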

SomeoneElseOSM commented 1 month ago

Agreed. You may need to work with the team responsible for the feeds you're using so that you can obtain the data you need.

mmd-osm commented 1 month ago

Overpass API consumes minutely diffs from https://planet.openstreetmap.org/replication/minute/
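For context on that feed: each minutely diff is identified by a sequence number, which is zero-padded to nine digits and split into three directory levels, with a `.osc.gz` diff and a `.state.txt` file per sequence. The sketch below maps a sequence number to both URLs; the sequence number used is an arbitrary example.

```python
BASE = "https://planet.openstreetmap.org/replication/minute/"

def diff_paths(sequence: int) -> tuple[str, str]:
    """Map a replication sequence number to its diff and state file URLs.

    The sequence number is zero-padded to nine digits and split into
    three three-digit path components, e.g. 6123456 -> 006/123/456.
    """
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    digits = f"{sequence:09d}"
    stem = BASE + f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"
    return stem + ".osc.gz", stem + ".state.txt"

osc_url, state_url = diff_paths(6123456)
print(osc_url)
print(state_url)
```

A consumer such as Overpass walks these files in sequence order, which is why anything not expressed as an osmChange entry never reaches it.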

The OSMF Operations Working Group (OWG) doesn't publish any lists of redacted object versions. To do so, an export tool would first have to be implemented and an appropriate data format defined.
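Purely to illustrate what such a data format might need to carry, here is a sketch of a hypothetical line-based list of redacted versions (`<type> <id> <version>`, with `#` comments) and how a consumer could drop those versions from an in-memory version history. The format, the `Redaction` record, and both functions are invented for this example; nothing like them is published today, and a real Overpass implementation would also have to handle how its attic storage works.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Redaction:
    """One redacted object version (hypothetical record format)."""
    element_type: str  # "node", "way" or "relation"
    element_id: int
    version: int

def parse_redactions(text: str) -> list[Redaction]:
    """Parse the hypothetical '<type> <id> <version>' list, skipping blanks and '#' comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, ident, version = line.split()
        records.append(Redaction(kind, int(ident), int(version)))
    return records

def apply_redactions(
    history: dict[tuple[str, int], dict[int, dict]], redactions: list[Redaction]
) -> int:
    """Remove redacted versions from a (type, id) -> {version: data} map; return the count removed."""
    removed = 0
    for r in redactions:
        versions = history.get((r.element_type, r.element_id))
        if versions is not None and versions.pop(r.version, None) is not None:
            removed += 1
    return removed

# Made-up history and redaction list for illustration.
history = {("way", 123): {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}}
n = apply_redactions(history, parse_redactions("# type id version\nway 123 2\n"))
print(n, sorted(history[("way", 123)]))  # 1 [1, 3]
```

Keying each record by type, id and version mirrors how OSM itself identifies an object version, so a consumer would not need any other context to act on it.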

osmdbt might be the best bet for this task, since it already has parts of the logic implemented.

Deleted user IDs are a related issue: here, only full dumps are published rather than delta files. Overpass API might have to process these dumps as well at some point.
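Since only full dumps are published, a consumer could derive its own delta by comparing consecutive snapshots. The sketch below assumes a simple layout of one numeric user ID per line with optional `#` comment lines; that layout is an assumption for illustration, not a description of the actual dump format.

```python
def read_user_ids(dump: str) -> set[int]:
    """Read user IDs from a dump, assuming one numeric ID per line and '#' comments."""
    ids = set()
    for line in dump.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(int(line))
    return ids

def newly_deleted(previous_dump: str, current_dump: str) -> set[int]:
    """IDs present in the current full dump but not in the previous one."""
    return read_user_ids(current_dump) - read_user_ids(previous_dump)

print(sorted(newly_deleted("# deleted users\n17\n42\n", "# deleted users\n17\n42\n99\n")))
# [99]
```

Only the newly listed IDs would then need processing, instead of re-scanning the database for every ID in each full dump.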