datalad / datalad-crawler

DataLad extension for tracking web resources as datasets
http://datalad.org

s3: does not detect complete removal of a key #74

Open yarikoptic opened 4 years ago

yarikoptic commented 4 years ago

Somewhat related to #73 -- in that case it is the content of older versions that was removed; here I am talking about completely removed keys, for which there is not even a DeleteMarker.

If the bucket is versioned, we assume that we would see a DeleteMarker whenever a file is gone. But if a key is removed entirely, along with all prior versions, there is no DeleteMarker.
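To illustrate the distinction, here is a minimal sketch, not the crawler's actual code. It assumes a listing shaped like the response of boto3's `list_object_versions` (a dict with `Versions` and `DeleteMarkers` lists whose entries carry `Key` and `IsLatest`), and a hypothetical `known_keys` set of keys recorded during an earlier crawl. The function name `classify_keys` is made up for this example.

```python
def classify_keys(known_keys, listing):
    """Split keys into live, delete-marked, and vanished (hypothetical helper).

    `listing` is assumed to look like a boto3 `list_object_versions`
    response; `known_keys` is assumed to come from a previous crawl.
    """
    versions = listing.get("Versions", [])
    markers = listing.get("DeleteMarkers", [])

    # Keys whose latest entry is a regular object version
    live = {v["Key"] for v in versions if v.get("IsLatest")}
    # Keys whose latest entry is a DeleteMarker (the case the crawler expects)
    delete_marker = {m["Key"] for m in markers if m.get("IsLatest")}
    # Every key the bucket still mentions in any form
    seen = {e["Key"] for e in versions + markers}
    # Keys we knew about that left no trace at all, not even a DeleteMarker
    vanished = set(known_keys) - seen

    return {"live": live, "delete_marker": delete_marker, "vanished": vanished}
```

Only the `vanished` set corresponds to the situation described here; detecting it requires some record of what was seen before, since the bucket itself no longer carries any hint.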

Because IIRC we traverse the bucket anyway -- we would either need to start doing what we do for the regular crawler (not s3) and keep the full list of downloaded items, or traverse the worktree and delete files no longer present on the S3 side. It should probably be optional, since it should be enabled with a clear idea that there should be no longer...
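The second option (traversing the worktree) could be sketched roughly as follows. This is an assumption-laden illustration, not existing crawler code: `find_stale_files` and its parameters are hypothetical, it assumes each S3 key maps one-to-one to a relative path in the worktree (no renaming by the pipeline), and it only reports candidates rather than deleting anything.

```python
import os


def find_stale_files(worktree, remote_keys, skip_dirs=(".git", ".datalad")):
    """Return worktree paths with no matching key on the S3 side.

    Hypothetical helper: assumes keys and relative paths are identical,
    and that `remote_keys` is the complete set of keys from the traversal.
    """
    remote = set(remote_keys)
    stale = []
    for dirpath, dirnames, filenames in os.walk(worktree):
        # Prune repository internals in place so os.walk does not descend into them
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), worktree)
            key = rel.replace(os.sep, "/")  # S3 keys use forward slashes
            if key not in remote:
                stale.append(key)
    return sorted(stale)
```

Returning a list instead of removing files directly keeps the destructive step separate, which fits the idea of making this behavior opt-in: a caller could inspect or log the candidates before deciding to remove them.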