dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Garbage collection #177

Open yarikoptic opened 3 years ago

yarikoptic commented 3 years ago

I am adding this to the migration target since we had better iron it out before switching to dandi-api: bugs in GC can lead to data loss, so IMHO we first need to make sure (e.g. with extensive unit testing and user testing) that it works reliably before deploying it.

An initial design sketch for garbage collection is present in https://github.com/dandi/dandi-api/pull/150/files#diff-c96d4444d1714a52d5d08dd92d94919393a7db8ded038aa84f02ba1075d2c25eR37 but I think it is worth removing it from that PR and starting a new dedicated one.

I see the following targets for GC

Additional aspects:

waxlamp commented 3 years ago

Can you explain why this is a migration blocker?

satra commented 3 years ago

just a check here, i thought there is no more uploads - it goes straight into blobs.

yarikoptic commented 3 years ago

> Can you explain why this is a migration blocker?

because we do need GC sooner rather than later, in particular since large datasets are to come soon. It could be "later", but introducing it in production, where we cannot afford to lose data, would be trickier IMHO than implementing it before we "migrate", while we are still testing the platform and do not care much if we lose any data, since we would probably still rebootstrap a few times.

yarikoptic commented 3 years ago

> just a check here, i thought there is no more uploads - it goes straight into blobs.

yes. But what happens if an upload is never completed or validated? Aren't we ending up with (1) stale uploads, i.e. records in the DB, and (2) incomplete keys in the keystore?
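
To make the two kinds of leftovers concrete, here is a minimal sketch of how they could be detected. Everything in it is an assumption for illustration: `UploadRecord`, `find_stale_uploads`, `find_untracked_keys`, and the 7-day threshold are hypothetical and are not taken from the dandi-api code base or from the design sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UploadRecord:
    """Hypothetical stand-in for an upload row in the DB, not the real model."""

    upload_id: str
    created: datetime
    completed: bool


def find_stale_uploads(uploads, now, max_age=timedelta(days=7)):
    """Return records that were never completed and are older than max_age."""
    cutoff = now - max_age  # the 7-day default is an arbitrary assumption
    return [u for u in uploads if not u.completed and u.created < cutoff]


def find_untracked_keys(pending_store_ids, uploads):
    """Return upload ids that have partial data in the store but no DB record."""
    known = {u.upload_id for u in uploads}
    return sorted(set(pending_store_ids) - known)


now = datetime(2021, 3, 1, tzinfo=timezone.utc)
uploads = [
    UploadRecord("u1", now - timedelta(days=30), completed=False),  # stale
    UploadRecord("u2", now - timedelta(hours=1), completed=False),  # still fresh
    UploadRecord("u3", now - timedelta(days=30), completed=True),   # finished
]
stale = find_stale_uploads(uploads, now)
untracked = find_untracked_keys(["u1", "u2", "u9"], uploads)
```

Keeping detection separate from deletion would let the candidate lists be reviewed (or only logged) before anything is actually removed.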

waxlamp commented 2 years ago

Design doc is at #560.

dandibot commented 2 years ago

:rocket: Issue was released in v0.2.18 :rocket:

edit by @yarikoptic: the PR with the design doc incorrectly marked this issue as fixed; it was not

yarikoptic commented 1 year ago

Now we do have a well-aged (10 months) design doc in https://github.com/dandi/dandi-archive/blob/master/doc/design/garbage-collection-1.md . It would be great to reassess and implement it, in particular in light of https://github.com/dandi/dandi-archive/issues/1450, which might soon produce thousands of loose assets that would get replaced with ones with fresher metadata records.
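
As a rough illustration of what "loose" could mean here, the sketch below treats an asset as loose when no version references it, and defaults to a dry run so nothing is deleted by accident. The names `find_loose_assets` and `collect_loose_assets`, and the plain dict/set data shapes, are assumptions for this example; the actual rules are whatever the design doc specifies.

```python
def find_loose_assets(asset_ids, version_to_assets):
    """Return ids of assets that no version references (hypothetical helper)."""
    referenced = set()
    for ids in version_to_assets.values():
        referenced.update(ids)
    return sorted(set(asset_ids) - referenced)


def collect_loose_assets(asset_ids, version_to_assets, delete, dry_run=True):
    """Report loose assets; call delete(asset_id) only when dry_run is False."""
    loose = find_loose_assets(asset_ids, version_to_assets)
    if not dry_run:
        for asset_id in loose:
            delete(asset_id)
    return loose


# Example: "a2" was replaced by an asset with fresher metadata and is now loose.
versions = {"draft": {"a1", "a3"}, "0.210831.2033": {"a1"}}
deleted = []
reported = collect_loose_assets(["a1", "a2", "a3"], versions, deleted.append)
```

With the default `dry_run=True` the call only reports candidates, which fits the concern raised at the top of this issue that bugs in GC can lead to data loss.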

waxlamp commented 1 year ago

This probably depends on #524.