When deleting, the riak_kv_delete process will create a tombstone and push the tombstone across the preflist. It will then fetch the tombstone using riak_kv_get_fsm; if all vnodes return a tombstone, it is the riak_kv_get_fsm which prompts for those tombstones to be reaped.
The reaping uses the riak_kv_vnode:del/3 function, which confirms that the locally stored object is a tombstone and then, if the delete_mode requires either an immediate or a delayed reap, prompts the reap as appropriate.
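A minimal sketch of that decision, treating delete_mode as riak_kv's configuration does (keep, immediate, or a delay in milliseconds). This is illustrative only, not riak_kv_vnode's actual code; the maybe_reap/2 and reap/1 names are hypothetical:

```erlang
%% Illustrative sketch only -- not riak_kv_vnode's actual code.
%% maybe_reap/2 assumes the caller (per riak_kv_vnode:del/3) has
%% already confirmed that the locally stored object is a tombstone.
-module(reap_mode_sketch).
-export([maybe_reap/2, reap/1]).

-type delete_mode() :: keep | immediate | pos_integer().

-spec maybe_reap(term(), delete_mode()) -> ok.
maybe_reap(_Key, keep) ->
    %% keep: tombstones are retained indefinitely, never reaped.
    ok;
maybe_reap(Key, immediate) ->
    %% immediate: reap as soon as the tombstone fetch confirms it.
    reap(Key);
maybe_reap(Key, DelayMs) when is_integer(DelayMs), DelayMs > 0 ->
    %% integer: delayed reap after the configured interval.
    {ok, _TRef} = timer:apply_after(DelayMs, ?MODULE, reap, [Key]),
    ok.

reap(Key) ->
    io:format("reaping tombstone for ~p~n", [Key]),
    ok.
```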
The riak_kv_get_fsm has a safety check: if all primaries are not currently up, the reap is not prompted. This prevents a down or partitioned primary that holds an old object from resurrecting that object when the vnode reconnects.
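A sketch of the rule behind that check, assuming annotated preflist entries of the shape {{Index, Node}, primary | fallback} (the shape riak_core's annotated preflists take). When a primary is down, a fallback occupies its slot, so an all-primary preflist of size N implies every primary is up:

```erlang
%% Illustrative sketch only -- not riak_kv_get_fsm's actual check.
-module(reap_safety_sketch).
-export([reap_permitted/2]).

%% Reap only when all N slots in the preflist are held by primaries;
%% any fallback entry means a primary is down or partitioned.
-spec reap_permitted([{term(), primary | fallback}], pos_integer()) ->
          boolean().
reap_permitted(AnnotatedPreflist, N) ->
    Primaries = [E || {_IdxNode, primary} = E <- AnnotatedPreflist],
    length(Primaries) =:= N.
```

For example, reap_permitted([{{0, n1}, primary}, {{1, n2}, fallback}, {{2, n3}, primary}], 3) returns false, so the reap is skipped until the preflist is all-primary again.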
With replication, this will cause a natural delta to form between clusters when a node is down in one of them. For every object that is deleted, if the delete_mode is not keep and a failed node is within the perfect preflist, the cluster with the failure will retain a tombstone, whereas the cluster without the failure will have none.
This should be safe (i.e. a fresh write on the cluster where the tombstone has been reaped should form a sibling and not be dominated by the tombstone). The discrepancy should also be resolved through full-sync. However, resolution through full-sync may be time-consuming, and more interesting sync issues may be masked (and have their resolution delayed) by this tombstone discrepancy.
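The safety claim rests on vector clock causality: the fresh write carries no history of the reaped tombstone, so neither clock descends from the other, and a merge yields siblings rather than the tombstone dominating. A minimal illustration of the descends rule (not riak's actual vclock module):

```erlang
%% Illustrative sketch only -- not riak's vclock module.
%% A clock is a list of {Actor, Counter} pairs. A descends B when
%% A has seen at least as much history as B at every actor in B.
-module(vclock_sketch).
-export([merge_outcome/2]).

descends(A, B) ->
    lists:all(
      fun({Actor, CountB}) ->
              case lists:keyfind(Actor, 1, A) of
                  {Actor, CountA} -> CountA >= CountB;
                  false           -> false
              end
      end,
      B).

merge_outcome(ClockA, ClockB) ->
    case {descends(ClockA, ClockB), descends(ClockB, ClockA)} of
        {true, true}   -> equal;
        {true, false}  -> a_dominates;
        {false, true}  -> b_dominates;
        {false, false} -> siblings
    end.
```

With the surviving tombstone's clock as [{a, 3}] and the fresh write's clock as [{b, 1}], merge_outcome([{a, 3}], [{b, 1}]) returns siblings: when full-sync brings the tombstone back across, it cannot dominate the new value.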