Open kabilar opened 2 months ago
Hi Alessandra, we don't currently have a way for end users to easily check/clear incomplete uploads on a dandiset (although we are trying to improve this). I've checked and can confirm I see the following in progress / incomplete uploads:
001172
: 8 incomplete uploads, between ~35gb and ~50gb in size.001170
: 20 incomplete uploads, between ~20gb and ~60gb in size.If you'd like, I could clear these uploads, in order for you to proceed. If there is some missing data, you could either hold off on un-embargo until you've finished uploading the data, or un-embargo right away, and finish uploading afterwards. There would be no functional difference.
Please let me know what you'd like to do.
Yes, please proceed with clearing those uploads. I will discuss the next steps with the authors. Do you have any suggestions on the uploading procedure to avoid incomplete uploads?
Yes, please proceed with clearing those uploads. I will discuss the next steps with the authors.
Okay, this has been done. I can confirm that both of those dandisets are now available for un-embargo.
Do you have any suggestions on the uploading procedure to avoid incomplete uploads?
At the moment, not particularly. My guess is that its due to the process running the CLI being killed during upload, which leaves those uploads orphaned. So my only advice would be to gracefully shutdown the CLI if an upload is in progress (if that's the case here). @yarikoptic any advice on this front?
What do you mean by gracefully shutting down the CLI?
What do you mean by gracefully shutting down the CLI?
This is where my lack of knowledge about the CLI comes into play, but I'm assuming that when a user cancels a running upload, the CLI attempts to handle that in a way that doesn't leave orphaned uploads. So if that process was hard killed, it wouldn't have a chance to do so. However, afaik we don't have endpoints to delete uploads in any way, so the only way the CLI might do that is finish any uploads in progress.
yes, though users will tend to run into this problem if they have slow or unreliable upload speeds and may not always be able to let an upload complete.
@jwodder, @yarikoptic do you think a reasonable solution to this might be to add a CLI function that clears any existing unfinished uploads?
That function would still need API support, which I think is something we should provide anyway for the web client.
Sounds good to me
... the only way the CLI might do that is finish any uploads in progress.
neat idea, filed
But overall it would not be a solution: we do need to improve API for the life cycle of uploads - both for regular assets or zarrs. Related existing issues for zarrs
if here an issue is about regular assets (I assume, didn't tripple check) -- IMHO we should
dandisets_uploads
endpoint (/dandiset/{dandiset_pk}/uploads/
) to list existing uploads for a dandiset
uploads_cancel
endpoint (/uploads/{upload_id}/cancel/
) to cancel an existing (blob) upload.With that any UI/client can implement querying/canceling of uploads, and then "Unembargo" button could check/query for confirmation/clean things up to proceed to /unembargo/
functionality. A complimentary alternative is to make /unembargo/
to add a bool flag cleanup_uploads
which would, if True, first automatically clean up all such incomplete uploads, while initial request error would provide information about such uploads. But I like that approach less.
This problem will largely be addressed by the upcoming garbage collection, but I agree we should add that functionality to the API. Namely, listing and cancelling of existing uploads.
As reported by a user on August 27 via email to
help@dandiarchive.org
, when umembargoing a Dandiset an error occurs which turns out to be a result of incomplete uploads. The error message displayed to the user upon pressingUNEMBARGO
is:@jjnesbitt found that the API was returning a 400 error when attempting to unembargo. To resolve, Jake manually removed the uploads after confirmation from the user that the uploads were not necessary. We should catch this case of incomplete/interrupted uploads, provide this feedback directly to the user, and provide instructions on how to resolve.
Thank you.