SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Add IO tutorial #3098

Open h-mayorquin opened 1 week ago

h-mayorquin commented 1 week ago

OK, I have been very busy but have been meaning to do this since #3053 and #2958.

We should add an IO tutorial where we explain the intended way to save spikeinterface objects.

This is a post to discuss the details.

My opinion:

Some of this information is probably already distributed across the modules' documentation. I will need to fish out what is already there and just add structure to it.
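
As a very rough sketch of the round trip I have in mind (the exact kwargs, and generate_recording as a stand-in recording, are from memory and may be slightly off):

```python
import spikeinterface as si

# toy recording, just for illustration
recording = si.generate_recording(durations=[10.0], num_channels=4)

# save in the "spikeinterface format": a folder with binary traces + JSON metadata
recording.save(folder="my_recording_folder", format="binary")

# ...and load it back later
recording_loaded = si.load_extractor("my_recording_folder")
```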

alejoe91 commented 1 week ago

Also the structure of the analyzer folders/paths!

JoeZiminski commented 1 week ago

This sounds great!! I am not so familiar with save_to_binary and save_to_zarr; where does recording.save() fit in? Are there any other saving functions?

h-mayorquin commented 1 week ago

save() is a convenience router that ends up in one or the other through a rather complicated path that I aim to document at some point : )
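
Roughly, as far as I remember (so take the kwargs below with a grain of salt), the routing looks like this:

```python
# `recording` is any recording object, e.g. from the sketch above

# format="binary" ends up writing a binary folder (the save_to_binary path)
recording.save(folder="rec_binary", format="binary")

# format="zarr" routes to the zarr writer (the save_to_zarr path)
recording.save(folder="rec_zarr", format="zarr")

# format="memory" keeps everything in RAM (the save_to_memory path)
recording_in_memory = recording.save(format="memory")
```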

zm711 commented 1 week ago

I do think (though I forget where this was stated, I think it was @JoeZiminski) that our docstring formatting injection really fails for save. Sometimes I try to remember what arguments I need for saving a sorting vs saving a recording and the docstring isn't perfect. So I really support an IO tutorial so that we at least lay it out! Thanks for writing this up @h-mayorquin!

JoeZiminski commented 4 days ago

Great so just to review, ATM there is:

1) si.write_binary_recording: writes the recording to a single .raw file with no spikeinterface metadata.
2) si.write_to_h5_dataset_format: similar to write_binary_recording but writes to an h5 file.
3) recording.save_to_memory(): I'm not so sure what this does but it looks very cool.
4) recording.save_to_binary(): saves to a folder with the data stored in binary plus some spikeinterface metadata.
5) recording.save_to_zarr(): same as above but with zarr.
6) the recording.save() frontend: a convenience function around the recording methods.
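
To make the distinction concrete, a quick sketch of items 1 and 4 (kwargs from memory, they may be slightly off):

```python
import spikeinterface as si

# `recording` is any recording object, e.g. from the earlier sketch

# 1) standalone binary file: just the raw traces, no spikeinterface metadata on the side;
#    reloading it means re-supplying dtype / num_channels yourself (e.g. via read_binary)
si.write_binary_recording(recording, file_paths=["traces.raw"], dtype="int16")

# 4) spikeinterface-format folder: binary traces + provenance/metadata files,
#    so the recording can be reloaded directly with load_extractor
recording.save(folder="si_recording_folder", format="binary")
```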

It's awesome that so many file writing methods are supported. I wonder if there is some room for API optimisation, although it is certainly not simple. It is complicated by the fact that there are different (all useful) ways to save the data, either as a standalone file (binary, h5) or in the "spikeinterface format", and that these functions all require different kwarg sets. Initially I thought it would be nice to route everything through recording.save() and make everything else private, but the differing kwarg sets make this impossible.

Some ways to streamline might be: make a distinction between spikeinterface-style saving (e.g. save_as_spikeinterface_format(format="binary"), with a better name) and the standalone write_binary_recording, since it is easy to get confused between these. It might also be worth moving write_binary_recording and write_to_h5_dataset_format onto the recording object so everything is in one place, and somehow incorporating these into the save() function (these could be the front-end interface for these functions discussed in #2958).
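
Just to illustrate, a purely hypothetical sketch (none of these method names exist today; save_as_spikeinterface_format is just the placeholder from above):

```python
# hypothetical API sketch, not real spikeinterface code

# spikeinterface-style saving (folder with metadata), routed by format
recording.save_as_spikeinterface_format(folder="rec_folder", format="binary")   # or "zarr"

# standalone writes moved onto the recording object, keeping them clearly separate
recording.write_binary_recording(file_paths=["traces.raw"], dtype="int16")
recording.write_to_h5_dataset_format("traces.h5")
```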

I'm not 100% sure on the above; the number one thing to help make all this clear will be this IO tutorial, it will be super useful!

h-mayorquin commented 3 days ago

Related to this: #3111