digidem / osm-p2p-db

Peer-to-peer database for OpenStreetMap data
BSD 2-Clause "Simplified" License

Surface document deletions through the API. #44

Closed. hackergrrl closed this 7 years ago

hackergrrl commented 7 years ago

What?

To expose deleted OSM documents through the API in a way that corresponds to the underlying structure of the data model as closely as possible.

Why?

osm-p2p-db makes no assumption about providing a single, "true" answer. The underlying data model is messy: forks and multiple versions of documents are both possible and valid states.

The OSM API itself assumes a linear history, which complicates things. A modular approach, with a clear division of responsibilities, makes sense:

This PR makes that possible: it exposes forks and deletions explicitly, so that downstream modules get a more honest view of the forking data they're working with.

Changes?

Look at the diff for README.md; deleted documents are presented alongside non-deleted documents, via hyperkv.
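
To make the "alongside" part concrete, here is a minimal sketch of how a downstream consumer might separate deleted heads from live ones. It does not call osm-p2p-db or hyperkv. The result shape (an object mapping version hashes to documents), the `deleted: true` marker, and the helper name `partitionHeads` are assumptions for illustration, so check the README diff for the actual format.

```javascript
// Hypothetical result of looking up one OSM id: version hash -> document.
// The `deleted: true` marker is an assumed shape, not taken from the README.
const heads = {
  'hash-a': { type: 'node', lat: 64.5, lon: -147.3 },
  'hash-b': { deleted: true }
}

// Split the heads of a document into live values and deletions,
// keeping both so that no fork is silently dropped.
function partitionHeads (docs) {
  const live = {}
  const deleted = {}
  for (const [version, doc] of Object.entries(docs)) {
    if (doc && doc.deleted === true) deleted[version] = doc
    else live[version] = doc
  }
  return { live, deleted }
}

const result = partitionHeads(heads)
console.log('live heads:', Object.keys(result.live))
console.log('deleted heads:', Object.keys(result.deleted))
```

A document with one live head and one deleted head is a valid fork here; deciding which one "wins" is left to whoever consumes the result.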

These changes are breaking, and will require a major version bump.

What's Next?

Start breaking the non-server logic in osm-p2p-server that does data linearization out into a new module, osm-p2p-api.

From there, ripple these API changes out to osm-p2p-api, which can be solely responsible for making sense of non-linear data and explicit deletions, and express them to further downstream modules, like osm-p2p-server.
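
As a rough illustration of what "making sense of non-linear data" could mean in osm-p2p-api, the sketch below collapses a set of forked heads into a single answer. The rule it uses (latest `timestamp` wins, ties broken by version hash) is purely an assumption for the example, as are the `timestamp` field, the `deleted` marker, and the function name `linearize`. Nothing in this PR commits to that strategy.

```javascript
// Pick one "winner" among forked heads of a document.
// Assumed rule: newest numeric timestamp first, then lowest version hash.
// This is an illustrative strategy, not the one osm-p2p-server uses.
function linearize (docs) {
  const entries = Object.entries(docs)
  if (entries.length === 0) return null
  entries.sort(([hashA, a], [hashB, b]) => {
    const ta = a.timestamp || 0
    const tb = b.timestamp || 0
    if (ta !== tb) return tb - ta // newer first
    return hashA < hashB ? -1 : hashA > hashB ? 1 : 0 // deterministic tie-break
  })
  const [version, doc] = entries[0]
  // Report deletion explicitly instead of hiding the document.
  return { version, doc, deleted: doc.deleted === true }
}

const forks = {
  'hash-x': { type: 'node', lat: 1, lon: 2, timestamp: 1000 },
  'hash-y': { deleted: true, timestamp: 2000 }
}
console.log(linearize(forks))
```

The point of the split is that this kind of policy lives in one module, while osm-p2p-db keeps returning every head untouched.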

hackergrrl commented 7 years ago

Re unhandled errors: I added error reporting for the "easy" cases. For some of the others -- like in the indexer and during node collection for streaming kdb queries -- I still want to think a bit more deeply about the failure modes.

For the indexer, we can either skip on failure or fatally terminate the program. Neither is very appealing: potentially incorrect data vs. a program that won't run.

For the node collector, hard failure means the kdb queries are impossible if some data is bad, whereas skipping bad data might actually be acceptable.
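
To compare the two failure modes side by side, here is a small self-contained sketch of a collector that can either skip bad entries or fail hard. It is not the code in this PR: `collectNodes`, the `strict` option, and the callback-style `lookup(id, cb)` getter are all hypothetical names used only to show the trade-off.

```javascript
// Collect documents for a list of ids.
// opts.strict === true  -> first error aborts the whole collection.
// opts.strict === false -> bad entries are skipped and reported separately.
// `lookup` is a hypothetical getter with the signature lookup(id, cb(err, doc)).
function collectNodes (ids, lookup, opts, done) {
  const nodes = []
  const skipped = []
  let pending = ids.length
  let finished = false
  if (pending === 0) return done(null, nodes, skipped)

  ids.forEach((id) => {
    lookup(id, (err, doc) => {
      if (finished) return // already reported a hard failure
      if (err) {
        if (opts.strict) {
          finished = true
          return done(err)
        }
        skipped.push({ id, error: err }) // keep going, but remember what was lost
      } else {
        nodes.push(doc)
      }
      if (--pending === 0) {
        finished = true
        done(null, nodes, skipped)
      }
    })
  })
}

// Usage with an in-memory stand-in for the database.
const store = { n1: { id: 'n1' }, n3: { id: 'n3' } }
const lookup = (id, cb) => store[id] ? cb(null, store[id]) : cb(new Error('bad data: ' + id))

collectNodes(['n1', 'n2', 'n3'], lookup, { strict: false }, (err, nodes, skipped) => {
  console.log('collected', nodes.length, 'skipped', skipped.length)
})
```

Returning the skipped ids alongside the results is one way to keep the lenient mode honest: the query still completes, but the caller can see that some data was bad.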

hackergrrl commented 7 years ago

I think I've handled everything that we discussed on this PR. Another look-over would be appreciated: I'm eager to move on to the next big PR. ;)

hackergrrl commented 7 years ago

Revisions pushed.

Yes, let's do changeset_id on deletions in a separate (but very soon) PR.