Open walterddr opened 2 years ago
Good point. The retention manager doesn't actually check the deep store files; it scans the ZK metadata. So I think we should delete from deep store first, then remove the ZK metadata as the last step, so that the retention manager can always find the undeleted segments.
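To make the ordering argument concrete, here is a minimal sketch. It assumes two in-memory sets standing in for the ZK metadata and the deep store; the class `DeletionOrderSketch` and its methods are hypothetical and are not Pinot's actual API. The point it illustrates: if the process dies after the deep store delete but before the metadata delete, the segment is still listed in ZK, so anything that scans ZK metadata can still find it and re-drive the deletion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-ins for illustration only; not Pinot classes.
class DeletionOrderSketch {
  final Set<String> zkMetadata = new HashSet<>();
  final Set<String> deepStore = new HashSet<>();

  void addSegment(String name) {
    zkMetadata.add(name);
    deepStore.add(name);
  }

  // Deep store first, ZK metadata last.
  // crashAfterDeepStore simulates the service dying between the two steps.
  void deleteSegment(String name, boolean crashAfterDeepStore) {
    deepStore.remove(name); // idempotent: a retry on a missing file is a no-op
    if (crashAfterDeepStore) {
      return; // simulated crash: the metadata entry is left behind
    }
    zkMetadata.remove(name);
  }

  public static void main(String[] args) {
    DeletionOrderSketch store = new DeletionOrderSketch();
    store.addSegment("segA");
    store.deleteSegment("segA", true);
    // The file is gone, but the segment is still discoverable via metadata.
    System.out.println("listed in ZK after crash: " + store.zkMetadata.contains("segA"));
    store.deleteSegment("segA", false); // retry completes the deletion
    System.out.println("listed in ZK after retry: " + store.zkMetadata.contains("segA"));
  }
}
```

Because both steps are idempotent in this sketch, retrying the whole deletion is safe regardless of where the first attempt stopped.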
Exactly what I thought as well @Jackie-Jiang ^ implementing a fix
Some caveats for our current implementation:

Ideally, what we need here is to decouple the deep store deletion from the table/segment deletion:

- For table/segment deletion, as long as the ideal state has been removed from ZK, we consider them deleted.
- For deep store data, we rely on RetentionManager to do the cleanup.

However, this causes problems if a table or segment with the exact same name is re-created (for example, if one were to replace a corrupted segment with a newly ingested one). So the challenge here is to resolve operations like this one that ultimately require a "sync" deletion across ZK and deep store.
- Solution 1: have some sort of tombstone mechanism that marks the table/segment as deleted, and do not allow the same identifier to be recreated until the cleanup has completed.
- Solution 2: version tables/segments, always incrementing the version number when a new table/segment is created.
- Solution 3: ???
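A rough sketch of what solution 1 could look like, purely as an illustration. The class `TombstoneSketch`, its `State` enum, and the in-memory map and set are all assumptions made for this example; none of them exist in Pinot. Deletion only flips the entry to a tombstone, a later cleanup pass removes the deep store file and then the tombstone, and creation is rejected while any entry (live or tombstoned) exists for that name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a tombstone scheme; not Pinot code.
class TombstoneSketch {
  enum State { LIVE, TOMBSTONED }

  final Map<String, State> metadata = new HashMap<>();
  final Set<String> deepStore = new HashSet<>();

  // Rejects the name while a live or tombstoned entry still exists.
  boolean create(String name) {
    if (metadata.containsKey(name)) {
      return false;
    }
    metadata.put(name, State.LIVE);
    deepStore.add(name);
    return true;
  }

  // Synchronous part of deletion: only marks the entry.
  void markDeleted(String name) {
    if (metadata.get(name) == State.LIVE) {
      metadata.put(name, State.TOMBSTONED);
    }
  }

  // Asynchronous part: delete the file first, then drop the tombstone.
  void cleanUp() {
    metadata.entrySet().removeIf(entry -> {
      if (entry.getValue() != State.TOMBSTONED) {
        return false;
      }
      deepStore.remove(entry.getKey());
      return true;
    });
  }
}
```

With this shape, replacing a corrupted segment would have to wait for (or explicitly trigger) the cleanup pass before the same name can be reused, which is the trade-off of this approach compared with versioning.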
Currently `SegmentDeletionManager` has:

- `deleteSegments`: API that allows users to delete segments from property store and from deep store.
- `removeSegmentsFromStore`: API that allows users to delete only from deep store.

This creates confusion regarding failure recovery. When the component service dies in the middle of execution, the result can be either of the following:

- `deleteSegments`: property store could have deleted segment A, but the file still exists in deep store.
- `removeSegmentsFromStore`: segment A's file could have been deleted from deep store, but property store could still have segment A.

Which one should we go with as the failure recovery strategy? Given that we have a retention manager that periodically checks deep store for files to delete, I think we should always delete from property store first and then delete from deep store. Thoughts?
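For illustration, the property-store-first ordering relies on a periodic pass that can spot deep store files with no matching property store entry. The sketch below assumes such a pass exists and can list both sides; `OrphanSweepSketch` and `sweepOrphans` are hypothetical names, and the two sets are in-memory stand-ins rather than real Pinot components.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of an orphan sweep; not Pinot code.
class OrphanSweepSketch {
  final Set<String> propertyStore = new HashSet<>();
  final Set<String> deepStore = new HashSet<>();

  // Removes deep store files that have no property store entry
  // and returns the names that were removed.
  Set<String> sweepOrphans() {
    Set<String> orphans = new HashSet<>(deepStore);
    orphans.removeAll(propertyStore);
    deepStore.removeAll(orphans);
    return orphans;
  }
}
```

If `deleteSegments` dies after removing segment A from the property store, the next sweep in this sketch would treat A's file as an orphan and remove it. The recovery only works if the periodic pass really does compare against the deep store listing.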