digidem / osm-p2p-db

Peer-to-peer database for OpenStreetMap data
BSD 2-Clause "Simplified" License

Store version as well as id on way.nodes and relation.members #49

Open gmaclennan opened 7 years ago

gmaclennan commented 7 years ago

Moving from https://github.com/digidem/osm-p2p-db/issues/29#issuecomment-277378739 since this should be its own issue:

Ways and relations refer to their nodes/members only by OSM id. This is a big problem, not just for osm-p2p but for anybody dealing with historic OSM data. I know the Mapbox data team has hit this in their need to review data changes. I think their workaround is to use timestamps to reconstruct which versions of the nodes/members were referenced by a particular way/relation. This is obviously fragile and costly, especially in a p2p system.

I think we can add versions to the way/relations in a way that remains compatible with existing clients:

Internally we should store both the id and the version of each node within a way and each member within a relation. When doing a lookup in the index we should prefer the version, but fall back to the id. See also https://github.com/digidem/osm-p2p-db/issues/48
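A rough sketch of the "prefer version, fall back to id" lookup, to make the idea concrete. Everything here is hypothetical: `byVersion`, `headsById` and `lookupRef` are illustrative names over an in-memory stand-in, not osm-p2p-db's actual index or API.

```javascript
// Hypothetical in-memory index; NOT osm-p2p-db's real storage or API.
// byVersion maps a version key to a document; headsById maps an OSM id to
// the version keys of its current heads (more than one head means a fork).
const byVersion = new Map([
  ['v1', { id: 'n1', version: 'v1', lat: 1, lon: 1 }],
  ['v2', { id: 'n1', version: 'v2', lat: 2, lon: 2 }],
]);
const headsById = new Map([['n1', ['v2']]]);

// ref is { id, version? }. Prefer the exact version the way/relation stored;
// if there is none (or it is unknown locally), fall back to the heads for the id.
function lookupRef(ref) {
  if (ref.version !== undefined && byVersion.has(ref.version)) {
    return [byVersion.get(ref.version)];
  }
  const heads = headsById.get(ref.id) || [];
  return heads.map((v) => byVersion.get(v));
}
```

With a stored version, `lookupRef({ id: 'n1', version: 'v1' })` returns the node exactly as the way saw it; a ref written by an older client, `lookupRef({ id: 'n1' })`, still resolves via the id.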

When a non-p2p-aware client submits a change, use the versions of the nodes/members from the changeset where they are present in the changeset. Where they are not, set the version by selecting the most recent fork, using the same algorithm that would have been used to present the most recent fork to the client. Some issues we would hit:

  1. iD editor does not include any unchanged nodes in a changeset if you change the tags on a way. If you move a node in a way, the way itself is not included in the changeset.
  2. The version number of a way would need to change every time a node was changed. iD might need a patch to ensure it pulls down the updated way after only a node was updated.
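The version-filling rule described above (take the version from the changeset if the node is in it, otherwise use whichever fork would have been shown to the client) could be sketched like this. `resolveWayRefs`, `changesetVersions` and `pickHead` are hypothetical names; `pickHead` stands in for the fork-selection algorithm, which is not specified here.

```javascript
// Hypothetical helper; not part of osm-p2p-db.
// nodeIds: the way's node ids as submitted by a non-p2p-aware client.
// changesetVersions: Map of id -> version for elements included in the changeset.
// pickHead: function(id) returning the version that would have been presented
//           to the client for that id (assumed to exist elsewhere).
function resolveWayRefs(nodeIds, changesetVersions, pickHead) {
  return nodeIds.map((id) => ({
    id,
    version: changesetVersions.has(id)
      ? changesetVersions.get(id) // node was part of the changeset
      : pickHead(id), // unchanged node: use the fork the client was shown
  }));
}

// Example: only n2 was included in the changeset.
const exampleRefs = resolveWayRefs(
  ['n1', 'n2'],
  new Map([['n2', 'n2-new']]),
  (id) => id + '-head' // placeholder for the real fork selection
);
```

Issue 1 above is why the fallback matters: when only the tags of a way change, iD sends none of its nodes, so every version would come from `pickHead`.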

For relations, this would make it hard to avoid forks: if we store version numbers on relations, any update to a member would need to update the version of the relation. This would mean we would not be able to use relations for long rivers to avoid forks - any edit to any segment of the river would create a new version of the relation.
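To illustrate the fork problem for relations, here is a small sketch with made-up ids and version strings (`river`, `bumpMember` and the `r1-*`/`w*-*` versions are all hypothetical). If a relation stores member versions, two peers editing different segments each have to write a new relation document based on the same parent.

```javascript
// Hypothetical relation for a long river that stores each member's version.
const river = {
  id: 'r1',
  version: 'r1-a',
  members: [
    { type: 'way', id: 'w1', version: 'w1-a' },
    { type: 'way', id: 'w2', version: 'w2-a' },
  ],
};

// Returns a new relation document that points at the member's new version.
function bumpMember(relation, memberId, memberVersion, relationVersion) {
  return {
    ...relation,
    version: relationVersion,
    members: relation.members.map((m) =>
      m.id === memberId ? { ...m, version: memberVersion } : m
    ),
  };
}

// Two peers edit different segments offline, both starting from 'r1-a'.
const fromPeerA = bumpMember(river, 'w1', 'w1-b', 'r1-b');
const fromPeerB = bumpMember(river, 'w2', 'w2-b', 'r1-c');
// Both new documents descend from 'r1-a', so after sync the relation has two heads.
```

Without member versions on the relation, neither edit would have touched `r1` at all, which is what made relations useful for avoiding forks on long rivers.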