NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License

Example for changing metadata (e.g. subject id) in NWB file #33

Closed yarikoptic closed 7 months ago

yarikoptic commented 8 months ago

Would be great to see this, and also to show that only the LINDI file is changed. Will the diff be minimal, i.e. pointing only to the changed metadata? (Just a question of interest.)

magland commented 8 months ago

@yarikoptic I just made this example of modifying the subject age and producing a new lindi (.zarr.json) file. It seems that this kind of thing cannot be done with pynwb - you need to use h5py.

NOTE: This only works on the "write" branch of lindi.

https://github.com/NeurodataWithoutBorders/lindi/blob/write/examples/example_edit_nwb.py
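On the question of whether the diff stays minimal, here is a rough stdlib-only sketch. It is not the linked example script and does not use the lindi API. It assumes a heavily simplified reference file system in the style of the fsspec ReferenceFileSystem JSON, where small values are stored inline and large chunks are `[url, offset, length]` pointers. The keys, values, and URL are hypothetical, and the way LINDI actually encodes a string dataset may differ.

```python
import copy
import difflib
import json

# Hypothetical, simplified reference file system. Real LINDI files carry
# many more keys, and the inline encoding of scalar strings may differ.
original = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "general/subject/.zattrs": '{"namespace": "core", "neurodata_type": "Subject"}',
        # Small metadata value stored inline (placeholder encoding).
        "general/subject/age/0": "P30D",
        # Large array chunk: a pointer into the remote file, not the bytes.
        "acquisition/ts/data/0.0": ["https://example.org/file.nwb", 4096, 1048576],
    },
}

# Edit only the subject age; the pointer to the big chunk is left alone.
edited = copy.deepcopy(original)
edited["refs"]["general/subject/age/0"] = "P60D"


def changed_lines(before: dict, after: dict) -> list[str]:
    """Return the added/removed lines of a line-based diff of two JSON documents."""
    a = json.dumps(before, indent=2, sort_keys=True).splitlines()
    b = json.dumps(after, indent=2, sort_keys=True).splitlines()
    diff = difflib.unified_diff(a, b, lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


changes = changed_lines(original, edited)
for line in changes:
    print(line)
```

With stable key ordering and one key per line, only the line holding the edited value shows up in the diff. The binary data is never touched because the JSON only points at it.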

rly commented 8 months ago

A JSON metadata + binary format (or view) for NWB data would very much facilitate easier editing and version control of non-big-array NWB data, which is what most people want to edit. We are in many conversations around this right now with @bendichter and @oruebel, but one possible goal is that if the specially formatted JSON is present, it is the ground truth representation / primary view of the NWB dataset. For existing HDF5 NWB data, we could generate the JSON, so existing data is compatible with this new layer. We would have to update the APIs accordingly for read/write/edit.

For that reason and because the LINDI-enhanced ReferenceFileSystem JSON follows an existing spec used by fsspec and therefore also Zarr, I would prefer to use some form of this JSON for editing files instead of the custom JSON sidecar approach that we experimented with: https://github.com/hdmf-dev/hdmf/pull/677 . This JSON represents the current state of the data rather than the sum of all edits made to the data.

More thoughts on this soon!

yarikoptic commented 8 months ago

@magland Thank you for the example! In it you modify the metadata directly at the LINDI level, so to a degree it resembles modifying the HDF5 file directly without involving pynwb. I was initially wondering whether such a change is possible at the NWB (pynwb) level. Is it possible in general? Most likely not, and an entirely new .nwb file would have to be saved, right?

magland commented 8 months ago

@yarikoptic It seems at this time this type of modification cannot be done using pynwb... h5py is needed.

https://github.com/NeurodataWithoutBorders/pynwb/issues/1874

https://github.com/NeurodataWithoutBorders/pynwb/issues/1773

https://gist.github.com/rly/be7bee420b482b9ddcd084f57cc4115e

oruebel commented 8 months ago

> It seems at this time this type of modification cannot be done using pynwb... h5py is needed.

PyNWB currently allows editing of datasets, adding data, and editing of attributes of datasets. However, editing of Groups (e.g., changing names) and editing of attributes of Groups are not yet fully supported. Here is the corresponding tutorial:

https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/plot_editing.html