NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License

Working example: augment a DANDI NWB file and upload to cloud #34

Closed: magland closed this issue 3 months ago

magland commented 3 months ago

As we have discussed, one of the main goals of this project is to be able to do the following (a rough sketch follows the list):

  1. Efficiently load a remote DANDI NWB file.
  2. Modify that file (for example, adding new neurodata objects) using either pynwb or h5py.
  3. Upload the modified file to the cloud, with the relatively small .zarr.json stored separately from the larger binary chunks.
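Here is a minimal sketch of that workflow, assuming the `LindiH5ZarrStore` / `LindiH5pyFile` style interfaces from the lindi README. The asset URL and the autocorrelograms shape are placeholders, and the assumption that writes made in "r+" mode show up in the reference file system is flagged in the comments; this is not the exact code from the example script linked below.

```python
# Rough sketch of the three-step workflow. API names follow the lindi README;
# the DANDI asset URL and the autocorrelograms shape are placeholders.
import json

import numpy as np
import lindi

# 1. Index the remote DANDI NWB (HDF5) file without downloading it
h5_url = "https://api.dandiarchive.org/api/assets/<asset-id>/download/"  # placeholder
store = lindi.LindiH5ZarrStore.from_file(h5_url)
rfs = store.to_reference_file_system()  # small dict referencing the remote chunks

# 2. Open an h5py-like view and add a new dataset
#    (assumes writes in "r+" mode are reflected back into rfs)
client = lindi.LindiH5pyFile.from_reference_file_system(rfs, mode="r+")
units = client["units"]
n_units = units["id"].shape[0]
units.create_dataset("autocorrelograms", data=np.zeros((n_units, 100)))

# 3. Save the augmented .zarr.json. It stays small because the original data
#    is still referenced at its DANDI URL; only the newly written chunks need
#    to be uploaded alongside it.
with open("aug.nwb.zarr.json", "w") as f:
    json.dump(rfs, f, indent=2)
```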

DANDI is not yet ready to accept this type of data (a .zarr.json plus binary chunks), so I created a prototype project that shows how I envision this working:

https://github.com/magland/lindi-cloud

The README explains how it works and the advantages of the approach, but I'll repeat some of that info here.

Here's an example that adds an autocorrelograms dataset to the Units table, uploads it to the LINDI cloud, and lets you view the result in Neurosift:
https://github.com/magland/lindi-cloud/blob/main/examples/example_add_autocorrelograms.py

Here's the Neurosift view:
https://neurosift.app/?p=/nwb&dandisetId=000939&dandisetVersion=0.240327.2229&url=https://lindi.neurosift.org/zones/magland/testing/f/dandi/dandisets/000939/assets/56d875d6-a705-48d3-944c-53394a389c85/aug_autocorrelograms.nwb.zarr.json&st=lindi

Click on "Units" and then "autocorrelograms".
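The same augmented file can also be read back in Python, which is roughly what Neurosift does under the hood. The sketch below assumes the lindi + pynwb usage shown in the lindi README; it is not the code Neurosift itself runs.

```python
# Sketch: read the hosted .zarr.json back through lindi + pynwb
# (API usage per the lindi README; not the exact code used by Neurosift).
import json
import urllib.request

import lindi
import pynwb

url = (
    "https://lindi.neurosift.org/zones/magland/testing/f/dandi/dandisets/000939/"
    "assets/56d875d6-a705-48d3-944c-53394a389c85/aug_autocorrelograms.nwb.zarr.json"
)

# Fetch the small .zarr.json and open it as an h5py-like file
with urllib.request.urlopen(url) as resp:
    rfs = json.load(resp)
client = lindi.LindiH5pyFile.from_reference_file_system(rfs)

# Read with pynwb; only the chunks that are actually accessed get downloaded
with pynwb.NWBHDF5IO(file=client, mode="r") as io:
    nwbfile = io.read()
    print(nwbfile.units.colnames)  # should include "autocorrelograms"
    print(nwbfile.units["autocorrelograms"][0][:10])
```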

The chunk consolidation step described in the README largely obviates the need to use Zarr sharding to reduce the number of files. I think sharding will still be important for keeping the .zarr.json small, though, since every chunk has to be listed there.
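To make that size concern concrete, here is a back-of-envelope estimate; the bytes-per-entry figure is an assumption for illustration, not a measured value.

```python
# Back-of-envelope: size of the .zarr.json as a function of chunk count.
# Each listed chunk needs a key plus a [url, offset, size] entry; the
# ~150 bytes/entry figure is an assumption, not a measurement.
bytes_per_entry = 150

for n_chunks in (1_000, 100_000, 10_000_000):
    approx_mb = n_chunks * bytes_per_entry / 1e6
    print(f"{n_chunks:>10,} chunks -> ~{approx_mb:,.1f} MB of .zarr.json")

# Roughly 0.2 MB at 1k chunks, 15 MB at 100k, and 1.5 GB at 10M chunks,
# which is why consolidation and/or sharding matter for keeping the
# .zarr.json manageable.
```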

@rly @oruebel @bendichter @yarikoptic