As we have discussed, one of the main goals of this project is to be able to:

- Efficiently load a remote DANDI NWB file
- Modify that file (for example, adding new neurodata objects) using either pynwb or h5py
- Upload the modified file to the cloud, with the relatively small .zarr.json stored separately from the larger binary chunks (the whole workflow is sketched below)
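For concreteness, here is a minimal sketch of that workflow as I understand it. The asset URL is a placeholder, the constructor name is an assumption based on the lindi README, and the upload step is only described in comments since that is exactly the part lindi-cloud prototypes:

```python
import lindi
import pynwb

# Placeholder URL for a remote DANDI NWB (HDF5) asset
url = "https://api.dandiarchive.org/api/assets/<asset-id>/download/"

# 1. Load: get an h5py-like client backed by the remote HDF5 file.
#    (Constructor name taken from the lindi README; older versions expose
#    from_reference_file_system instead, so treat this as an assumption.)
client = lindi.LindiH5pyFile.from_hdf5_file(url)

# 2. Modify: read it with pynwb exactly as if it were a local file; adding new
#    neurodata objects works the same way once the client is opened in a
#    writable/staging mode (not shown here).
with pynwb.NWBHDF5IO(file=client, mode="r") as io:
    nwbfile = io.read()
    print(nwbfile.units)

# 3. Upload: serialize the small .zarr.json (which references the original
#    chunks plus any newly written ones) and push it, together with the new
#    binary chunks, to cloud storage -- this is the step that lindi-cloud
#    prototypes, so no concrete API call is shown here.
```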
DANDI is not ready to accept this type of data (.zarr.json plus binary chunks), so I created this prototype project that shows how I envision this happening:
https://github.com/magland/lindi-cloud
The README explains how it works and the advantages of the approach, but I'll repeat some of that info here.
Here's an example that adds an autocorrelograms dataset to the Units table, uploads it to the LINDI cloud, and lets you view the result in neurosift: https://github.com/magland/lindi-cloud/blob/main/examples/example_add_autocorrelograms.py
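In condensed and partly hypothetical form, the core of that example looks roughly like the sketch below. The autocorrelogram computation is a simplified placeholder, the write path assumes an h5py-like writable handle (in the real example, a lindi.LindiH5pyFile backed by a local staging area), and a fully valid NWB column would also need the VectorData attributes and an updated colnames on the Units table, which I'm omitting; see the linked script for the actual code.

```python
import numpy as np

def compute_autocorrelogram(spike_times, bin_size_s=0.001, window_s=0.05):
    # Simplified placeholder: histogram of pairwise spike-time differences
    diffs = spike_times[:, None] - spike_times[None, :]
    diffs = diffs[(diffs != 0) & (np.abs(diffs) <= window_s)]
    nbins = int(2 * window_s / bin_size_s)
    counts, _ = np.histogram(diffs, bins=nbins, range=(-window_s, window_s))
    return counts

def add_autocorrelograms(f):
    # f is an h5py-like, writable handle to the NWB file
    units = f["units"]
    spike_times = units["spike_times"][:]
    index = units["spike_times_index"][:]  # ragged index: cumulative end offsets
    per_unit = np.split(spike_times, index[:-1])
    acgs = np.array([compute_autocorrelogram(st) for st in per_unit])
    # The new column is just another dataset under /units
    units.create_dataset("autocorrelograms", data=acgs)
```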
Here's the neurosift view: https://neurosift.app/?p=/nwb&dandisetId=000939&dandisetVersion=0.240327.2229&url=https://lindi.neurosift.org/zones/magland/testing/f/dandi/dandisets/000939/assets/56d875d6-a705-48d3-944c-53394a389c85/aug_autocorrelograms.nwb.zarr.json&st=lindi
Click on Units and then "autocorrelograms"
The chunk consolidation step described in the README to some extent obviates the need for zarr sharding as a way to reduce the number of files, although I think sharding will still be important for reducing the size of the .zarr.json file, since every chunk currently needs to be listed there.
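To make that tradeoff concrete, here is a simplified, made-up illustration of the kerchunk-style reference layout inside the .zarr.json (key -> [url, byte offset, byte length]): consolidation shrinks the number of cloud objects, but not the number of entries, which is what sharding would address.

```python
# Made-up URLs, keys, and sizes -- for illustration only.

# Without consolidation: one cloud object per chunk, one entry per chunk
refs_unconsolidated = {
    "acquisition/ElectricalSeries/data/0.0": ["https://.../chunk_0_0", 0, 1048576],
    "acquisition/ElectricalSeries/data/1.0": ["https://.../chunk_1_0", 0, 1048576],
    # ...one file and one entry for every chunk
}

# After chunk consolidation: many chunks packed into a few binary files at
# different offsets, so far fewer cloud objects -- but the .zarr.json still
# lists every chunk, so its size is essentially unchanged
refs_consolidated = {
    "acquisition/ElectricalSeries/data/0.0": ["https://.../consolidated_0.bin", 0, 1048576],
    "acquisition/ElectricalSeries/data/1.0": ["https://.../consolidated_0.bin", 1048576, 1048576],
    # ...still one entry per chunk
}

# Zarr sharding would bundle many chunks into one listed object, so the
# .zarr.json would only need one entry per shard rather than one per chunk.
```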
@rly @oruebel @bendichter @yarikoptic