Open gmaclennan opened 7 years ago
One way (only way?) to solve this is by storing versions on document references e.g.
{
"type": "way",
"id": 1,
"nodes": [{
"id": 10,
"version": "abcde"
}, {
"id": 11,
"version": "fghijk"
}, {
"id": 12,
"version": "lmnopq"
}],
"tags": {
"highway": "tertiary",
"name": "Main Street"
}
}
We could only store versions if we like, since in osm-p2p-db versions are unique. This would mean that deletions could always happen without an "if-unused" check, since the deletion doc would be a new version, and therefore not used by any other document.
A side-effect of this given the case of the forked/deleted way about is that the non-deleted fork would be referencing nodes which are deleted at HEAD, how would they be surfaced in the kdb query and querystream()
?
Our current proposed way of surfacing deletions keeps deleted nodes in the kdb index for the locations of all the deleted node's parents. For deleted nodes returned from the kdb index we would need to find any ways that reference the id
, rather than reference the version
, and then follow that chain to get the nodes referenced by those ways based on the version
of that reference. This might require an additional lookup if we wanted to be strict about what appears in the bbox e.g.
in the case above, the node B
is not forked, and not modified by the deletion or modification of the way. If bbox query included where B
was before it was deleted, the kdb index should return the deleted record. We would then need to lookup in the join index to see which ways reference B
(the id B
not the version
of the deleted record). We would then get the two forks of the way A
- one deleted and one not - and we would need to lookup the nodes referred to by these ways by their version
(not their id
). There is an edge-case though where one of the forks of the way may contain nodes which are all outside of the bbox. We would need to do a bbox check on each of these lookups if we wanted to keep the API strict.
A side-effect of this given the case of the forked/deleted way about is that the non-deleted fork would be referencing nodes which are deleted at HEAD, how would they be surfaced in the kdb query and
querystream()
?
I wonder if the following is a good predicate for deciding if a node is deleted or not:
"A node is considered deleted iff there is a deletion tombstone for it that is a HEAD, and there are only deleted ways on HEAD that refer to it."
Thinking on how this would affect kdb indexing:
This would mean that downstream deforking would need to understand this definition of deleted, and apply it on nodes as well to decide which nodes are deleted or not, based on the way fork it chooses.
The OSM API does a check when you attempt to delete a node/way/relation to ensure that it is not referenced by any other documents, and if it is referenced it returns a 412 error: http://wiki.openstreetmap.org/wiki/API_v0.6#Delete:_DELETE_.2Fapi.2F0.6.2F.5Bnode.7Cway.7Crelation.5D.2F.23id
The changeset upload API has an option
if-unused
that silently ignores deletions of elements used by others: http://wiki.openstreetmap.org/wiki/API_v0.6#Diff_upload:_POST_.2Fapi.2F0.6.2Fchangeset.2F.23id.2FuploadIn a p2p distributed system a document could be unused on the current peer, but in-use on another peer.
E.g. Clients α and β have a way
A
with nodesB,C,D
. Client α deletes the way, and deletes the nodesB,C,D
since they are not used by any document in the db on α. Client β modifies a tag on the way, but does not modify any of the nodesB,C,D
. After replication way A is forked, but nodesB,C,D
have a single head which is a deletion. The undeleted fork of A now points to deleted nodes.