SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Add IO tutorial #3098

Open h-mayorquin opened 1 week ago

h-mayorquin commented 1 week ago

OK, I have been very busy but have been meaning to do this since #3053 and #2958.

We should add an IO tutorial where we explain the intended way to save spikeinterface objects.

This is a post to discuss the details.

My opinion:

Some of this information is probably already distributed across the modules' documentation. I will need to fish out what is already there and just add structure to it.
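
As a very rough sketch of the round trip I have in mind (the exact kwargs, and generate_recording as a stand-in recording, are from memory and may be slightly off):

```python
import spikeinterface as si

# toy recording, just for illustration
recording = si.generate_recording(durations=[10.0], num_channels=4)

# save in the "spikeinterface format": a folder with binary traces + JSON metadata
recording.save(folder="my_recording_folder", format="binary")

# ...and load it back later
recording_loaded = si.load_extractor("my_recording_folder")
```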

alejoe91 commented 1 week ago

Also the structure of the analyzer folders/paths!

JoeZiminski commented 1 week ago

This sounds great!! I am not so familiar with save_to_binary and save_to_zarr; where does recording.save() fit in? Are there any other saving functions?

h-mayorquin commented 1 week ago

save() is a convenience router that ends up in one or the other through a rather complicated path that I aim to document at some point : )
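
Roughly, as far as I remember (so take the kwargs below with a grain of salt), the routing looks like this:

```python
# `recording` is any recording object, e.g. from the sketch above

# format="binary" ends up writing a binary folder (the save_to_binary path)
recording.save(folder="rec_binary", format="binary")

# format="zarr" routes to the zarr writer (the save_to_zarr path)
recording.save(folder="rec_zarr", format="zarr")

# format="memory" keeps everything in RAM (the save_to_memory path)
recording_in_memory = recording.save(format="memory")
```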

zm711 commented 1 week ago

I do think (though I forget where this was stated, I think it was @JoeZiminski) that our docstring formatting injection really fails for save. Sometimes I try to remember what arguments I need for saving a sorting vs saving a recording and the docstring isn't perfect. So I really support an IO tutorial so that we at least lay it out! Thanks for writing this up @h-mayorquin!

JoeZiminski commented 4 days ago

Great so just to review, ATM there is:

1) si.write_binary_recording: writes the recording to a single .raw file with no spikeinterface metadata.
2) si.write_to_h5_dataset_format: similar to write_binary_recording but writes to an h5 file.
3) recording.save_to_memory(): I'm not so sure what this does but it looks very cool.
4) recording.save_to_binary(): saves to a folder with the data stored in binary plus some spikeinterface metadata.
5) recording.save_to_zarr(): same as above but with zarr.
6) the recording.save() frontend: a convenience function around the recording methods.
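
To make the distinction concrete, a quick sketch of items 1 and 4 (kwargs from memory, they may be slightly off):

```python
import spikeinterface as si

# `recording` is any recording object, e.g. from the earlier sketch

# 1) standalone binary file: just the raw traces, no spikeinterface metadata on the side;
#    reloading it means re-supplying dtype / num_channels yourself (e.g. via read_binary)
si.write_binary_recording(recording, file_paths=["traces.raw"], dtype="int16")

# 4) spikeinterface-format folder: binary traces + provenance/metadata files,
#    so the recording can be reloaded directly with load_extractor
recording.save(folder="si_recording_folder", format="binary")
```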

It's awesome that so many file writing methods are supported. I wonder if there is some room for API optimisation, although it is certainly not simple. It is complicated by the fact that there are different (all useful) ways to save the data, either as a standalone file (binary, h5) or in the "spikeinterface format", and that these functions all require different kwarg sets. Initially I thought it would be nice to route everything through recording.save() and make everything else private, but the differing kwarg sets make this impossible.

Some ways to streamline might be: make a distinction between spikeinterface-style saving (e.g. save_as_spikeinterface_format(format="binary"), with a better name) and the standalone write_binary_recording, since it is easy to get confused between these. It might also be worth moving write_binary_recording and write_to_h5_dataset_format onto the recording object so everything is in one place, and somehow incorporating these into the save() function (these could be the front-end interface for these functions discussed in #2958).
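
Just to illustrate, a purely hypothetical sketch (none of these method names exist today; save_as_spikeinterface_format is just the placeholder from above):

```python
# hypothetical API sketch, not real spikeinterface code

# spikeinterface-style saving (folder with metadata), routed by format
recording.save_as_spikeinterface_format(folder="rec_folder", format="binary")   # or "zarr"

# standalone writes moved onto the recording object, keeping them clearly separate
recording.write_binary_recording(file_paths=["traces.raw"], dtype="int16")
recording.write_to_h5_dataset_format("traces.h5")
```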

I'm not 100% sure on the above; the number one thing to help make all this clear will be this IO tutorial, it will be super useful!

h-mayorquin commented 3 days ago

Related to this: #3111