WIP : get additional data from HerdingspikesSortingExtractor

SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.

https://spikeinterface.readthedocs.io

MIT License


WIP : get additional data from HerdingspikesSortingExtractor #3525

Open: b-grimaud opened this pull request 2 weeks ago

b-grimaud commented 2 weeks ago

It seems there was already an attempt, a while ago, to extract the additional info with load_unit_info.

For now, this PR can extract the location of a unit with get_unit_location, just as you would get its spike train with get_unit_spike_train. Unit locations are loaded by default, as I don't think the extra memory and computation costs should be significant.
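A minimal sketch of the access pattern described above, assuming per-spike sample indices and unit labels are already loaded. ToyHSSorting is a hypothetical stand-in, not the HerdingspikesSortingExtractor from this PR; it only illustrates exposing get_unit_location per unit, in the same style as get_unit_spike_train, with locations loaded eagerly at construction.

```python
import numpy as np


class ToyHSSorting:
    """Hypothetical stand-in for a sorting extractor (not the real
    HerdingspikesSortingExtractor). Per-spike arrays are assumed to be
    already loaded from the HerdingSpikes output."""

    def __init__(self, spike_samples, spike_units, unit_locations):
        self._samples = np.asarray(spike_samples)
        self._units = np.asarray(spike_units)
        # One (x, y) pair per unit is small, so locations are loaded eagerly.
        self._unit_locations = {
            int(unit): np.asarray(xy, dtype=float)
            for unit, xy in unit_locations.items()
        }

    def get_unit_spike_train(self, unit_id):
        # Sample indices of all spikes assigned to this unit.
        return self._samples[self._units == unit_id]

    def get_unit_location(self, unit_id):
        # Same per-unit call pattern as get_unit_spike_train.
        return self._unit_locations[unit_id]


sorting = ToyHSSorting(
    spike_samples=[10, 25, 40, 90],
    spike_units=[0, 1, 0, 1],
    unit_locations={0: (12.0, 30.0), 1: (-5.0, 110.0)},
)
train = sorting.get_unit_spike_train(0)   # values: [10, 40]
location = sorting.get_unit_location(1)   # values: [-5.0, 110.0]
```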

The rest of the data that HerdingSpikes provides is per spike:

Amplitude
Waveforms (possibly redundant?)
Channel index
Unit index
x and y coordinates
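One way to hold the per-spike fields listed above is a single NumPy structured array with one row per spike, which keeps every field aligned with the spike train. The field names and dtypes below are assumptions for illustration, not the layout of the HerdingSpikes output file, and waveforms are left out; per_spike_field is a hypothetical helper.

```python
import numpy as np

# Hypothetical per-spike table: one row per detected spike.
# Field names are illustrative, not the HerdingSpikes HDF5 layout.
spike_dtype = np.dtype([
    ("sample_index", np.int64),
    ("unit_index", np.int32),
    ("channel_index", np.int32),  # peak channel
    ("amplitude", np.float32),
    ("x", np.float32),
    ("y", np.float32),
])

spikes = np.zeros(4, dtype=spike_dtype)
spikes["sample_index"] = [10, 25, 40, 90]
spikes["unit_index"] = [0, 1, 0, 1]
spikes["channel_index"] = [3, 7, 3, 8]
spikes["amplitude"] = [55.0, 80.0, 60.0, 75.0]
spikes["x"] = [12.0, -5.0, 13.0, -4.0]
spikes["y"] = [30.0, 110.0, 31.0, 112.0]


def per_spike_field(spikes, unit_index, field):
    """Return one per-spike field for a single unit, in the same order
    and with the same length as that unit's spike train."""
    return spikes[field][spikes["unit_index"] == unit_index]
```

With these dtypes each row takes spike_dtype.itemsize == 28 bytes, since NumPy packs structured fields without padding by default.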

This should be quite a bit more memory-intensive to retrieve, and I'm not entirely sure what the use case would be. Nevertheless, I can see that it was considered as a possibility in the existing code.
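A back-of-the-envelope estimate of that memory cost. The numbers are assumptions chosen for illustration (1,000,000 spikes, 4-byte values, 40-sample peak-channel waveforms, 500 units); they are not measurements from HerdingSpikes.

```python
def per_spike_bytes(n_spikes, n_scalar_fields=5, bytes_per_value=4,
                    waveform_samples=0):
    """Rough in-memory size (bytes) of per-spike data.

    n_scalar_fields=5 covers amplitude, channel index, unit index, x and y.
    waveform_samples > 0 adds one peak-channel waveform per spike.
    """
    scalars = n_spikes * n_scalar_fields * bytes_per_value
    waveforms = n_spikes * waveform_samples * bytes_per_value
    return scalars + waveforms


def unit_location_bytes(n_units, bytes_per_value=8):
    """Size of one (x, y) pair per unit, e.g. stored as float64."""
    return n_units * 2 * bytes_per_value


scalars_only = per_spike_bytes(1_000_000)                          # 20,000,000 bytes
with_waveforms = per_spike_bytes(1_000_000, waveform_samples=40)   # 180,000,000 bytes
locations = unit_location_bytes(500)                               # 8,000 bytes
```

Under these assumptions the per-spike scalars cost about 20 MB and waveforms push that to about 180 MB, while unit locations take a few kilobytes, which is consistent with loading unit locations by default and keeping per-spike data optional.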

Any feedback would be appreciated!

mhhennig commented 2 weeks ago

Thank you! I'll take a look asap.

samuelgarcia commented 2 weeks ago

Thanks for this. I think we should discuss the API further. If the additional data is at the unit level, we could simply add it as properties on the sorting object, which would make more sense. If the additional data is at the spike level, we would need an additional function at the API level rather than doing this extractor by extractor.
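A minimal sketch of the unit-level option: unit data stored as named properties with one value per unit. SpikeInterface extractors already provide set_property and get_property for this purpose; ToySorting below is a hypothetical, simplified imitation so the example runs without the library, and its signatures are not the library's.

```python
import numpy as np


class ToySorting:
    """Hypothetical minimal sorting object. Unit-level data lives in a
    dict of named properties, one entry per unit in unit_ids order."""

    def __init__(self, unit_ids):
        self.unit_ids = list(unit_ids)
        self._properties = {}

    def set_property(self, key, values):
        values = np.asarray(values)
        # A unit property must provide exactly one value per unit.
        if len(values) != len(self.unit_ids):
            raise ValueError("need exactly one value per unit")
        self._properties[key] = values

    def get_property(self, key):
        return self._properties[key]


sorting = ToySorting(unit_ids=[0, 1, 2])
# Locations and peak channels are naturally unit-level: one row per unit.
sorting.set_property("location", [[12.0, 30.0], [-5.0, 110.0], [40.0, 65.0]])
sorting.set_property("peak_channel", [3, 7, 12])
```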

b-grimaud commented 2 weeks ago

"If the additional data is at the spike level, we would need an additional function at the API level rather than doing this extractor by extractor."

This is what I was wondering about. The easiest solution for now would be to return matching arrays per unit, but for the sake of reusability I guess the BaseSorting class would need to be able to handle arbitrary per-spike properties.

In the case of HerdingSpikes, a lot of useful data is already computed or extracted: waveforms, locations and amplitudes are already included in the output, and PCA is also computed but not included. All of this is then recomputed by the SortingAnalyzer. It would then be up to individual sorting extractors to match the expected data structure.
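A minimal sketch of the generic per-spike approach discussed here, under the assumption that per-spike properties are stored once in a base class, aligned with the spike order, and that each extractor only registers its own arrays. ToyBaseSorting, ToyHSExtractor, set_spike_property and get_unit_spike_property are hypothetical names and are not part of SpikeInterface's API.

```python
import numpy as np


class ToyBaseSorting:
    """Hypothetical base class: per-spike properties are stored once,
    aligned with a flat spike vector, so extractors only register arrays."""

    def __init__(self, spike_samples, spike_unit_ids):
        self._samples = np.asarray(spike_samples)
        self._unit_ids = np.asarray(spike_unit_ids)
        self._spike_properties = {}

    def set_spike_property(self, key, values):
        values = np.asarray(values)
        if values.shape[0] != self._samples.shape[0]:
            raise ValueError("need exactly one value per spike")
        self._spike_properties[key] = values

    def get_unit_spike_train(self, unit_id):
        return self._samples[self._unit_ids == unit_id]

    def get_unit_spike_property(self, key, unit_id):
        # Same boolean mask as the spike train, so the result is aligned
        # element by element with get_unit_spike_train(unit_id).
        return self._spike_properties[key][self._unit_ids == unit_id]


class ToyHSExtractor(ToyBaseSorting):
    """Hypothetical extractor: it only maps its own output onto the
    generic per-spike storage provided by the base class."""

    def __init__(self, samples, units, channels, xy):
        super().__init__(samples, units)
        self.set_spike_property("peak_channel", channels)
        self.set_spike_property("location", xy)


sorting = ToyHSExtractor(
    samples=[10, 25, 40, 90],
    units=[0, 1, 0, 1],
    channels=[3, 7, 3, 8],
    xy=[[12.0, 30.0], [-5.0, 110.0], [13.0, 31.0], [-4.0, 112.0]],
)
```

Because every property is indexed with the same mask as the spike train, a subclass cannot return per-spike values that are misaligned with get_unit_spike_train, which is the reusability concern raised in this comment.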

mhhennig commented 1 week ago

On the data HS extracts:

Amplitude: This is not the real amplitude but a rescaled version. I don't think it is useful.
Waveforms: These are peak-channel waveforms, cached internally as memory-mapped arrays for quick PCA computation; they can be written out to the HDF5 file the extractor reads.
Channel index: Useful, as these are peak channels.
Unit index: Internal use.
x and y coordinates: Center-of-mass (COM) based estimates, very useful as they are very quick to compute.
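A sketch of a center-of-mass location estimate and of peak-channel selection for a single spike. The COM estimate is the weighted mean of channel positions, x̂ = Σᵢ wᵢ xᵢ / Σᵢ wᵢ (and likewise for y). Using absolute amplitudes as the weights wᵢ, and the three-channel geometry in the example, are assumptions for illustration; the exact weighting and channel neighbourhood HerdingSpikes uses may differ.

```python
import numpy as np


def com_location(channel_xy, amplitudes):
    """Center-of-mass (x, y) location of one spike.

    channel_xy: (n_channels, 2) probe positions of the channels used.
    amplitudes: (n_channels,) amplitude of the spike on each channel.
    Assumption: weights are absolute amplitudes.
    """
    channel_xy = np.asarray(channel_xy, dtype=float)
    weights = np.abs(np.asarray(amplitudes, dtype=float))
    return (weights[:, None] * channel_xy).sum(axis=0) / weights.sum()


def peak_channel(amplitudes):
    """Index of the channel with the largest absolute amplitude."""
    return int(np.argmax(np.abs(amplitudes)))


# Three channels in a vertical column, 20 um apart (illustrative geometry).
xy = [[0.0, 0.0], [0.0, 20.0], [0.0, 40.0]]
amps = [-10.0, -30.0, -10.0]
loc = com_location(xy, amps)   # values: [0.0, 20.0]
peak = peak_channel(amps)      # 1
```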

I feel the peak channel and x/y locations could be put into a SortingAnalyzer object, since that's where they would normally be found. Would this be possible?
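Recent SpikeInterface versions compute spike_locations and unit_locations as SortingAnalyzer extensions; whether precomputed HerdingSpikes values could be supplied to them instead is the open question here. The sketch below only shows one hypothetical way to collapse HS per-spike estimates into per-unit values (per-axis median location, most frequent peak channel); unit_summary is not a SpikeInterface function.

```python
import numpy as np


def unit_summary(spike_units, spike_xy, spike_peak_channels):
    """Hypothetical helper: collapse per-spike outputs to one value per unit.

    Location = per-axis median of the per-spike COM estimates.
    Peak channel = most frequent per-spike peak channel; ties resolve to
    the lowest channel index because np.unique returns sorted values.
    Returns {unit_id: (location_xy, peak_channel)}.
    """
    spike_units = np.asarray(spike_units)
    spike_xy = np.asarray(spike_xy, dtype=float)
    spike_peak_channels = np.asarray(spike_peak_channels)
    summary = {}
    for unit in np.unique(spike_units):
        mask = spike_units == unit
        location = np.median(spike_xy[mask], axis=0)
        channels, counts = np.unique(spike_peak_channels[mask], return_counts=True)
        summary[int(unit)] = (location, int(channels[np.argmax(counts)]))
    return summary


summary = unit_summary(
    spike_units=[0, 1, 0, 1, 0],
    spike_xy=[[12.0, 30.0], [-5.0, 110.0], [13.0, 31.0], [-4.0, 112.0], [11.0, 29.0]],
    spike_peak_channels=[3, 7, 3, 8, 4],
)
```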

h-mayorquin commented 1 week ago

I agree with Sam that unit-level data should be stored as properties of the sorting object. I'm curious about how to handle spike properties, if you have them, so I'm tagging along here.