Open yarikoptic opened 8 months ago
The client already batches Zarr entry deletions, 100 entries per request. I doubt doing multiple batches in parallel is going to result in faster turnaround from the server.
Why? Doesn't it handle requests in parallel?
@yarikoptic I can't find why the exact value of 100 was chosen, but I believe the point of the limit is to avoid making the server do too much work on a Zarr at once. Simultaneous requests would therefore mean too much work for the server.
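For context, a minimal sketch of what this serial, batched deletion could look like, assuming the `DELETE /api/zarr/{zarr_id}/files/` endpoint seen in the logs below accepts a JSON list of path objects (the actual dandi-cli implementation and request schema may differ):

```python
import requests

API_BASE = "https://api.dandiarchive.org/api"
BATCH_SIZE = 100  # current client-side limit mentioned above


def delete_zarr_entries(zarr_id: str, paths: list[str], token: str) -> None:
    """Delete remote Zarr entries one batch of BATCH_SIZE paths at a time."""
    with requests.Session() as session:
        session.headers["Authorization"] = f"token {token}"
        for i in range(0, len(paths), BATCH_SIZE):
            batch = paths[i : i + BATCH_SIZE]
            # Hypothetical request body; the schema dandi-cli actually sends may differ
            resp = session.delete(
                f"{API_BASE}/zarr/{zarr_id}/files/",
                json=[{"path": p} for p in batch],
            )
            resp.raise_for_status()
```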
@jjnesbitt @mvandenburgh et alii: Can you confirm or deny that there's no efficiency gain to be had from parallelizing batched Zarr entry deletion requests?
> @jjnesbitt @mvandenburgh et alii: Can you confirm or deny that there's no efficiency gain to be had from parallelizing batched Zarr entry deletion requests?
That's correct, there would be no efficiency gain. This is for two reasons:
Is the current performance of the zarr deletion endpoint causing problems elsewhere?
> Is the current performance of the zarr deletion endpoint causing problems elsewhere?
Somewhat. From the log of the issue referenced in the OP, https://github.com/dandi/dandi-cli/issues/1410:
```
❯ zgrep -e 'Deleting.*files' -e 'DELETE.*zarr' 20240223192140Z-306046.log.gz | head -n 2
2024-02-23T14:36:59-0500 [DEBUG ] dandi 306046:140253474395840 sub-randomzarrlike/sub-randomzarrlike_junk.zarr: Deleting 226053 files in remote Zarr not present locally
2024-02-23T14:36:59-0500 [DEBUG ] dandi 306046:140253474395840 DELETE https://api.dandiarchive.org/api/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/files/
❯ zgrep 'DELETE.*zarr.*files/$' 20240223192140Z-306046.log.gz | tail -n 1
2024-02-23T18:14:40-0500 [DEBUG ] dandi 306046:140253474395840 DELETE https://api.dandiarchive.org/api/zarr/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/files/
```
So I believe it took over 3 hours merely to delete (lots of) files in the Zarr after the upload of the other files had finished: 226053 entries at 100 per request is about 2261 serial DELETE calls over the ~3.6 hours between the first and last DELETE above, i.e. roughly 5–6 seconds per batch. Here such a drastic action was needed since I changed the "chunking" strategy for the zarr, so this would not be completely uncommon. So I thought it might be nice to make it speedier.
Related: dandi/dandi-cli#1410 describes the use-case. I think that removal is going very slowly primarily because we do it serially on groups of keys. Couldn't we parallelize (using the same jobs) and issue a bunch of requests (with retries if needed) to that API DELETE endpoint?
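A minimal sketch of what that proposal could look like, assuming the same `DELETE /api/zarr/{zarr_id}/files/` endpoint and the existing batch size of 100; the `N_JOBS` value, request body schema, and retry policy here are illustrative assumptions, not what dandi-cli actually does:

```python
from concurrent.futures import ThreadPoolExecutor

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

API_BASE = "https://api.dandiarchive.org/api"
BATCH_SIZE = 100  # keep the existing per-request batch size
N_JOBS = 5        # hypothetical degree of parallelism


def delete_zarr_entries_parallel(zarr_id: str, paths: list[str], token: str) -> None:
    """Issue the batched DELETE requests in parallel, retrying transient failures."""
    session = requests.Session()
    session.headers["Authorization"] = f"token {token}"
    # Retry 5xx responses and connection errors with exponential backoff
    retries = Retry(
        total=5,
        backoff_factor=1.0,
        status_forcelist=[500, 502, 503, 504],
        allowed_methods=["DELETE"],
    )
    session.mount("https://", HTTPAdapter(max_retries=retries))

    def delete_batch(batch: list[str]) -> None:
        # Hypothetical request body; the real schema may differ
        resp = session.delete(
            f"{API_BASE}/zarr/{zarr_id}/files/",
            json=[{"path": p} for p in batch],
        )
        resp.raise_for_status()

    batches = [paths[i : i + BATCH_SIZE] for i in range(0, len(paths), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=N_JOBS) as pool:
        # Consuming map() re-raises the first exception from any worker
        list(pool.map(delete_batch, batches))
```

Whether this actually helps depends on the server-side behavior discussed above, so any such change would need to be measured against the API rather than assumed to be faster.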