dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

Upload is inconsistent with NWB Zarr files #1520

Open mavaylon1 opened 2 hours ago

mavaylon1 commented 2 hours ago

I am able to upload a small Zarr file (~56 MB) to DANDI. However, when uploading a larger file (~34 GB), the upload reached 92% after about 36 hours and then failed with the error shown below. I assumed it was a timeout, so I ran it again: I was told that re-running the same command as before (`dandi upload --existing refresh`) should resume from the last checkpoint.

The second attempt failed with the same error, this time at 32%.

Questions:

  1. Is that 32% of the remaining data, or did the upload start over from the beginning?
  2. Why is this happening? Would reducing the number of files in the Zarr store (currently ~1.5 million) by increasing the chunk size help?
[Screenshot 2024-11-05 at 6:05:48 PM: upload error]

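A quick back-of-the-envelope check using the figures reported above (~34 GB across ~1.5 million objects) shows why question 2 is worth asking: the average object is only about 23 KB, and chunks in the MB range would cut the object count by orders of magnitude. This is a minimal sketch of the arithmetic only, assuming decimal units (1 GB = 1e9 bytes) and ignoring compression; it is not a measurement of the actual store.

```python
import math

# Figures reported in the issue; decimal units (1 GB = 1e9 bytes) are an assumption.
TOTAL_BYTES = 34e9         # ~34 GB Zarr store
N_OBJECTS = 1_500_000      # ~1.5 million files in the store


def average_object_size(total_bytes: float, n_objects: int) -> float:
    """Mean size of one stored object, in bytes."""
    return total_bytes / n_objects


def objects_needed(total_bytes: float, chunk_bytes: float) -> int:
    """Approximate object count if every chunk holds chunk_bytes (ignores compression)."""
    # Round up: a partially filled last chunk is still stored as one object.
    return math.ceil(total_bytes / chunk_bytes)


print(f"average object size: {average_object_size(TOTAL_BYTES, N_OBJECTS) / 1e3:.1f} KB")
for target_mb in (1, 10, 50):
    n = objects_needed(TOTAL_BYTES, target_mb * 1e6)
    print(f"{target_mb:>2} MB chunks -> ~{n:,} objects")
```

Running it reports an average of about 22.7 KB per object; at 10 MB per chunk the same data would need roughly 3,400 objects. Compressed chunks are smaller on disk, so real counts depend on the codec and the data.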
mavaylon1 commented 2 hours ago

@yarikoptic

satra commented 2 hours ago

@mavaylon1 - how many objects are in the Zarr file, and what is the rough size of each object? We generally recommend chunking the Zarr so that each chunk is in the MB range rather than the KB range.
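One way to act on the MB-range recommendation is to size chunks from a target byte count. The sketch below assumes a 2-D (time, channels) array chunked only along time; `rows_per_chunk`, the int16 dtype, the 384-channel count, and the ~10 MB target are all hypothetical choices for illustration, not values from this thread.

```python
# Hypothetical helper: choose how many rows (e.g. time samples) go into each chunk
# so that an uncompressed (rows x n_columns) chunk is close to, but not above, target_bytes.
def rows_per_chunk(n_columns: int, itemsize: int, target_bytes: int) -> int:
    bytes_per_row = n_columns * itemsize
    # At least one row per chunk, even if a single row exceeds the target.
    return max(1, target_bytes // bytes_per_row)


# Illustrative assumptions: int16 samples (2 bytes each), 384 channels, ~10 MB chunks.
N_CHANNELS = 384
ITEMSIZE = 2
TARGET_BYTES = 10_000_000

rows = rows_per_chunk(N_CHANNELS, ITEMSIZE, TARGET_BYTES)
chunk_shape = (rows, N_CHANNELS)          # full channel axis, chunked along time
chunk_bytes = rows * N_CHANNELS * ITEMSIZE
print(chunk_shape, f"{chunk_bytes / 1e6:.2f} MB per chunk (uncompressed)")
```

The resulting tuple is what you would pass as the `chunks` argument when creating the array with zarr-python. Whether to keep the channel axis whole depends on how the data will typically be read.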

As an FYI: if you use Google's TensorStore library instead of the zarr Python library, you should also be able to write sharded Zarr v3 stores.
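Sharding in Zarr v3 (the `sharding_indexed` codec) packs many inner chunks into a single stored shard object, so the number of files to upload is set by the shard shape rather than the chunk shape. The sketch below only counts objects for made-up shapes; it does not use TensorStore or zarr, and the array, chunk, and shard shapes are illustrative assumptions.

```python
import math


def n_objects(shape: tuple[int, ...], block: tuple[int, ...]) -> int:
    """Storage objects needed when an array of `shape` is split into `block`-sized pieces."""
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block))


# Illustrative shapes only (not taken from the issue): a (time, channels) array.
shape = (30_000_000, 384)
chunk = (10_000, 64)          # inner chunk: 10,000 x 64 int16 values, about 1.28 MB
shard = (1_000_000, 384)      # each shard bundles 100 x 6 = 600 inner chunks

# The sharding codec requires the shard shape to be a multiple of the inner chunk shape.
assert all(s % c == 0 for s, c in zip(shard, chunk))

print("without sharding:", n_objects(shape, chunk), "objects")  # one object per chunk
print("with sharding:   ", n_objects(shape, shard), "objects")  # one object per shard
```

With these shapes, the unsharded layout needs 18,000 objects, while the sharded one needs 30, even though readers still access data in the same 1.28 MB inner chunks.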

kabilar commented 1 hour ago

Hi @mavaylon1, can you also please provide the two log files? Thanks.