This adds logic after an expire snapshots commit to delete stale manifest and data files. It also adds a delete function to the ExpireSnapshots API so that the caller can use alternative delete logic.
For each snapshot that is removed, any data files that were marked deleted in that snapshot will be deleted. Because manifests can be reused, a manifest is only deleted if it is not referenced by any of the snapshots that are not yet expired.
This could delete files that are deleted an a snapshot and added back in a later snapshot. This could be updated to check whether the file is referenced by the current snapshot, but this solution is expensive and probably not worth the check. Callers should not delete files and re-add them without changing the file name or expiring the snapshot where the file was removed.
@omalley, it would be great if you would review this. It implements manifest and file clean-up when snapshots are deleted.
This makes three choices I'd like your opinion on:
Data files are deleted without checking to see whether they are in the current snapshot. If they are deleted, they should not be in the current snapshot unless someone added them back. Is this a case we care enough about to make deletes more costly by adding a scan of the current snapshot as a fail-safe?
When a snapshot is expired, this assumes that anything deleted in that snapshot can be deleted. This introduces the potential to explicitly delete a snapshot but not its predecessors, then roll back the current to a prececessor that is missing data. What makes this slightly worse is that snapshots aren't required to form a strictly linear history, so you could delete, undo the delete, and get in this situation. Checking whether files are in the current snapshot would prevent corrupting the current snapshot and might be a good trade-off. I don't think we want to scan all snapshots.
Files are deleted when the snapshot in which they were logically deleted is expired, not when the manifest that contains the file's delete entry is removed. Manifests can be in multiple snapshots, so this cleans up data files that will not be read faster. But, the reused manifest files will still contain pointers to files that are gone. I think this is okay, but could use a sanity check.
This adds logic after an expire snapshots commit to delete stale manifest and data files. It also adds a delete function to the
ExpireSnapshots
API so that the caller can use alternative delete logic.For each snapshot that is removed, any data files that were marked deleted in that snapshot will be deleted. Because manifests can be reused, a manifest is only deleted if it is not referenced by any of the snapshots that are not yet expired.
This could delete files that are deleted an a snapshot and added back in a later snapshot. This could be updated to check whether the file is referenced by the current snapshot, but this solution is expensive and probably not worth the check. Callers should not delete files and re-add them without changing the file name or expiring the snapshot where the file was removed.