dandi / dandisets

737 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

Deleted Data #388

Closed salimmj closed 1 month ago

salimmj commented 1 month ago

I recently made a mistake: I uploaded a dataset that was supposed to be embargoed, but I did it wrong, so it was publicly available for some time. Fortunately, I was able to delete it from the DANDI archive later, but I can still see a record of it here and in the dandisets org.

I tried downloading the data using DataLad (hoping it wouldn't work), and it looks like the raw data did not download successfully, but the metadata was still there. Is it possible to remove the dataset (including metadata) from the git record?

Also, is this something worth doing programmatically? I noticed other issues running into errors when trying to download deleted datasets that are still referenced here.

Is the raw data ever mirrored or is it just the metadata that's replicated?

Thank you!

kabilar commented 1 month ago

Thanks for the report, @salimmj. Which Dandiset number was this dataset?

Hi @yarikoptic @jwodder, could you please help @salimmj with this issue and file an issue if this is a bug with the cleanup of datasets after they have been deleted? Thank you.

yarikoptic commented 1 month ago

Ping @salimmj on the exact dandiset index. Am I right to assume it is 001043 (found by grepping dandiset.yaml)?

Meanwhile

@salimmj if you think there is more to be done for this particular dandiset - feel welcome to follow up/reopen this issue.

Note that, in principle, as soon as a dataset is publicly shared, copies of it could be downloaded and redistributed, since it was shared publicly under open license terms that allow (re)distribution and retention even if the original authors change their minds.

salimmj commented 1 month ago

Thank you, that should do it!