dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 28 forks source link

`dandi delete` doesn't get all assets to delete. #1426

Closed yarikoptic closed 7 months ago

yarikoptic commented 8 months ago

Needed to delete errorneously uploaded code/venvs. Got alarmed first when number of assets reported by delete was smaller than the ones I get from dandi ls:

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 22138 assets on server from Dandiset 000876? [y/N]: ^C
2024-03-30 15:19:43,325 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240330191859Z-261412.log

❯ dandi ls dandi://dandi/000876/code/ > /tmp/ls-out
2024-03-30 15:20:13,599 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240330191952Z-261644.log

❯ grep 'path: code/venvs' /tmp/ls-out | nl | tail
...
28262     path: code/venvs/reposit/share/man/man1/ttx.1

so we got 22138 offered to be removed out of 28262? Then I deleted a folder with 3 for a test:

❯ dandi delete dandi://dandi/000876/code/venvs/reposit/share/jupyter/nbextensions/jupyter-js-widgets/
Delete 3 assets on server from Dandiset 000876? [y/N]: y
PATH                                                                   STATUS     MESSAGE
.../reposit/share/jupyter/nbextensions/jupyter-js-widgets/extension.js Deleted           
...re/jupyter/nbextensions/jupyter-js-widgets/extension.js.LICENSE.txt Deleted           
...osit/share/jupyter/nbextensions/jupyter-js-widgets/extension.js.map Deleted           
Summary:                                                               3 Deleted         
2024-03-30 15:22:07,795 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240330192204Z-262295.log

and upon subsequent full delete request I got larger number of assets!

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 23025 assets on server from Dandiset 000876? [y/N]: 

There were changes in listing of assets on dandi-archive side, but I would not expect delete differ from ls where afaik they likely use the same end point and ls seems to reflect number of deleted files correctly going down:

❯ dandi ls dandi://dandi/000876/code/ > /tmp/ls-out2
2024-03-30 15:28:24,988 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240330192803Z-263470.log
❯ grep 'path: code/venvs' /tmp/ls-out | nl | tail -n 1
 28262    path: code/venvs/reposit/share/man/man1/ttx.1
❯ grep 'path: code/venvs' /tmp/ls-out2 | nl | tail -n 1
 28259    path: code/venvs/reposit/share/man/man1/ttx.1

Interesting the logs for delete seems to have some Caught exception (without detail) whenever ls didn't:

❯ grep 'Caught exception' /home/yoh/.local/state/dandi-cli/log/20240330*.log
/home/yoh/.local/state/dandi-cli/log/20240330191859Z-261412.log:2024-03-30T15:19:43-0400 [DEBUG   ] dandi 261412:139629642821696 Caught exception 
/home/yoh/.local/state/dandi-cli/log/20240330192136Z-262149.log:2024-03-30T15:21:38-0400 [DEBUG   ] dandi 262149:140064119992384 Caught exception Asset path 'code/__pycache__' points to a directory but lacks trailing /
/home/yoh/.local/state/dandi-cli/log/20240330192239Z-262435.log:2024-03-30T15:28:01-0400 [DEBUG   ] dandi 262435:139637496152128 Caught exception 

it seems that pagination is not sequential requests (or at least not necessarily arriving in the order sent), so I wonder if there is some race condition and some exception causes early exit from collation of paginated requests?

Attests to that differing number of assets it lists across invocations:

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 22560 assets on server from Dandiset 000876? [y/N]: n
2024-03-30 15:33:02,301 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240330193213Z-264477.log
dandi delete dandi://dandi/000876/code/venvs/  22.10s user 0.24s system 45% cpu 49.470 total
❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 22634 assets on server from Dandiset 000876? [y/N]: 

BTW that "Caught exception" is not always logged for delete -- another sign of some inconsistent execution seems to me.

I will not delete and exclude dataladification (does not work with .gitignore!) of this dandiset for now so we have working example of problematic case to ease troubleshooting...

jwodder commented 8 months ago

@yarikoptic My investigations found the following Archive bugs which are probably to blame:

jwodder commented 8 months ago

@yarikoptic The Archive bugs appear to be fixed now. Are you still having any problems deleting?

yarikoptic commented 8 months ago

it seems to look consistent and matches the number now:

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 28259 assets on server from Dandiset 000876? [y/N]: N
2024-04-01 12:31:50,822 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/20240401160956Z-640400.log

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 28259 assets on server from Dandiset 000876? [y/N]: 

I am running it now.

yarikoptic commented 8 months ago
Summary:                                                            28259 Deleted  

but I still see https://dandiarchive.org/dandiset/000876/draft/files?location=code%2Fvenvs&page=1 in the browser ... rerunning gives me

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 4145 assets on server from Dandiset 000876? [y/N]: 

so seems not working "fully". running this one again... but it seems that issue is not really resolved fully thus reopening

yarikoptic commented 8 months ago

and after that --

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 636 assets on server from Dandiset 000876? [y/N]: 

FTR the prior log was /home/yoh/.local/state/dandi-cli/log/20240401183902Z-667577.log

❯ grep 'Response: ' /home/yoh/.local/state/dandi-cli/log/20240401183902Z-667577.log | sed -e 's,.*:,,g' | sort | uniq -c
     44  200
   4145  204

so we got 4145 204 responses... and in previous

❯ grep 'Response: ' /home/yoh/.local/state/dandi-cli/log/20240401163152Z-648330.log | sed -e 's,.*:,,g' | sort | uniq -c
    285  200
  28259  204

may be not all just were removed on server side ?? I thought I would find duplicate DELETE requests between two runs but no:

❯ grep 'DELETE https://api.dandiarchive.org/api/dandisets/000876/versions/draft/' /home/yoh/.local/state/dandi-cli/log/20240401163152Z-648330.log /home/yoh/.local/state/dandi-cli/log/20240401183902Z-667577.log | sed -e 's,.*DELETE,,g' | sort | uniq -c | sort -n | nl | tail -n 3
 32402        1  https://api.dandiarchive.org/api/dandisets/000876/versions/draft/assets/fff6ae06-d8b1-4fb2-88c2-91231583fec5/
 32403        1  https://api.dandiarchive.org/api/dandisets/000876/versions/draft/assets/fffc0e83-b5ec-43f3-9851-c8ce83f18376/
 32404        1  https://api.dandiarchive.org/api/dandisets/000876/versions/draft/assets/ffff4674-212c-4e41-87ef-497e830fd0d0/

so it seems that we have unique asset ids between those two runs... odd. needs more troubleshooting -- I am not running another delete so you could try... let me know if you think those log files would come handy

jwodder commented 7 months ago

@yarikoptic If you run the dandi delete command now but don't confirm the deletion, how many assets does the command say it'll delete? I'm getting 7089, which appears to be the correct amount.

yarikoptic commented 7 months ago

yikes, it grew indeed:

❯ dandi delete dandi://dandi/000876/code/venvs/
Delete 7089 assets on server from Dandiset 000876? [y/N]: 

@TheChymera is your upload process still running by some chance?
@jwodder might be worth checking dandi-api logs under /mnt/backup/dandi/heroku-logs/dandi-api to possibly gain more insight. reminder: I will be traveling, please figure it out - I believe if @TheChymera adds you @jwodder as co-owner on the dandiset, you should be able to perform those actions too!

TheChymera commented 7 months ago

@yarikoptic yes, I started it again to see if I can reproduce the error, but if finished now, and yes, I get the same error.

jwodder commented 7 months ago

@yarikoptic I see logs of assets still being uploaded to 000876 on April 1 around 16:00 EDT, when you were running dandi delete, which seems to confirm that what happened was that @TheChymera was still uploading assets while you were deleting them.

yarikoptic commented 7 months ago

Uff... @TheChymera - can you then now remove those assets in one try?

yarikoptic commented 7 months ago

I meant your code/venvs, not entire dandiset

TheChymera commented 7 months ago

ok, deleting right now. Seems to be working thus far.

yarikoptic commented 7 months ago

and did it work out?

yarikoptic commented 7 months ago

ok, it seems that now lists 0 assets. let's assume resolved.

TheChymera commented 7 months ago

Yep, it worked.