Import temp_wh.dat as recording object

rat-h commented 7 months ago

I need to read the preprocessed recordings written by Kilosort 2/2.5/3, i.e., temp_wh.dat, to compute a specific metric (mimicking MATLAB scripts developed in-house a while ago). Is it possible to create a SpikeInterface recording object from temp_wh.dat?

I found a few issues, such as #1916 and #1896, where @alejoe91 and @JoeZiminski discussed this. They mentioned 'KilosortTempWhRecording' a few times, but I couldn't find it in the main or dev branches of the repository.

Do we have a simple solution for that?

Note that I don't need the phy export. I need to access the data and read snippets from different channels for each spike. Because the original MATLAB script works on the preprocessed temp_wh.dat, I need to use this same file to reproduce its results in Python, and SI is the simplest way to manipulate the data.

zm711 commented 7 months ago

Howdy @rat-h.

Work on this began in #1954, but I think we were trying to decide whether it was worth pursuing, since the whitened data can mess with waveform structure. It is just a binary file, so you could use the binary extractor right now (you would need to give the sampling rate, number of channels, etc.). See here for full requirements to use the binary extractor.
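To make the "it is just a binary file" point concrete, here is a minimal sketch that memory-maps such a file with NumPy and cuts snippets around spike times. It assumes the file is flat int16 laid out as (samples, channels), and that `num_channels` is the number of channels Kilosort actually kept; check both against your own output. The helper names `open_whitened`, `extract_snippets` and `open_with_spikeinterface` are hypothetical, and the keyword names passed to `read_binary` are those of recent SpikeInterface versions (older releases used different names, so treat them as an assumption).

```python
import os
import tempfile

import numpy as np


def open_whitened(path, num_channels, dtype="int16"):
    """Memory-map a flat binary file as an array of shape (n_samples, num_channels)."""
    data = np.memmap(path, dtype=dtype, mode="r")
    if data.size % num_channels != 0:
        raise ValueError("file size is not a multiple of num_channels")
    return data.reshape(-1, num_channels)


def extract_snippets(traces, spike_samples, channels, n_before=20, n_after=40):
    """Return snippets of shape (n_spikes, n_before + n_after, len(channels)).

    Spikes too close to either end of the recording are skipped.
    """
    n_samples = traces.shape[0]
    snippets = []
    for s in spike_samples:
        start, stop = s - n_before, s + n_after
        if start < 0 or stop > n_samples:
            continue
        snippets.append(np.asarray(traces[start:stop][:, channels]))
    if not snippets:
        return np.empty((0, n_before + n_after, len(channels)), dtype=traces.dtype)
    return np.stack(snippets)


def open_with_spikeinterface(path, sampling_frequency, num_channels, dtype="int16"):
    """Same file as a SpikeInterface recording (keyword names assumed, see above)."""
    import spikeinterface.core as sc  # optional dependency, imported lazily

    return sc.read_binary(
        path,
        sampling_frequency=sampling_frequency,
        dtype=dtype,
        num_channels=num_channels,
    )


# Demo on a synthetic file standing in for temp_wh.dat (not real Kilosort output).
num_channels = 4
fake = np.arange(1000 * num_channels, dtype=np.int16).reshape(1000, num_channels)
path = os.path.join(tempfile.mkdtemp(), "temp_wh.dat")
fake.tofile(path)

traces = open_whitened(path, num_channels)
snips = extract_snippets(traces, spike_samples=[5, 100, 500, 990], channels=[0, 2])
print(traces.shape, snips.shape)
```

In the demo the first and last spike times fall too close to the edges and are dropped, so only two snippets come back.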

zm711 commented 7 months ago

And as @alejoe91 said in #1916, you would need to know whether any channels were dropped by Kilosort, which would complicate using si.read_binary.

JoeZiminski commented 7 months ago

Hey @rat-h, yes, @zm711 is right. Unfortunately, it is not easy to do this neatly in a general-purpose way that supports all Kilosort versions, as the versions handle whitening and dropped channels differently. However, this may be easier to do for a known dataset and KS version.

Have a look at #1954, which is a very rough, untested first attempt at an implementation, meant to explore what kinds of issues come up. From the discussion and linked issues you will also see that most of the trouble is around understanding how KS writes the whitening matrix and scaling across versions. For versions after KS2 I don't believe the full whitening matrix is saved, only a diagonal matrix used for scaling the output. However, my understanding is that the un-whitened data is available in Phy for these KS versions, so it must be possible to un-whiten the data somehow.

In general this is something I'd really like to see possible as it would be great for the community to be able to easily load and quality-check the pre-processed data that KS is using as an input for sorting, with and without whitening. Unfortunately this has stalled in the face of other priorities and the concern that even if possible, it may be very brittle across KS versions / patches. Happy to help as best possible if you pursue this.

zm711 commented 7 months ago

@JoeZiminski, Kilosort saves an unwhitening matrix (whitening_mat_inv.npy). Phy loads this to unwhiten, but e.g. our export_to_phy doesn't write this, since the raw data out of SI is not necessarily whitened. But your point about inconsistency between versions is spot on. I think that will make this very hard without having one implementation per version. (Without an unwhitening matrix, I think phy just uses np.eye to make a "mock" unwhitening.)
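As a rough illustration of what "unwhitening" with such a matrix means, here is a sketch on synthetic data. It assumes traces are shaped (samples, channels) and that whitening was a right-multiplication by a square matrix with no extra scale factor. Both are assumptions that may not hold for a given KS version, which is exactly the brittleness discussed above. The function name `unwhiten` is hypothetical, and the identity fallback mirrors the np.eye "mock" case.

```python
import numpy as np


def unwhiten(whitened, whitening_mat_inv=None):
    """Apply an inverse whitening matrix to (n_samples, n_channels) traces.

    Falls back to the identity when no matrix is given.
    Assumes whitened = raw @ W, so raw = whitened @ inv(W).
    """
    whitened = np.asarray(whitened, dtype=np.float64)
    if whitening_mat_inv is None:
        whitening_mat_inv = np.eye(whitened.shape[1])
    return whitened @ whitening_mat_inv


# Synthetic example: a made-up, well-conditioned "whitening" matrix.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 4))
W = 2.0 * np.eye(4) + 0.1 * np.ones((4, 4))
whitened = raw @ W

# With real data the inverse would be loaded from file instead, e.g.
# whitening_mat_inv = np.load("whitening_mat_inv.npy")
recovered = unwhiten(whitened, np.linalg.inv(W))
print(np.allclose(recovered, raw))
```

If the matrix orientation or scaling differs in your Kilosort version, the multiplication above needs to be adapted accordingly.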

rat-h commented 7 months ago

Thank you all for the very detailed explanations. I see what problems can arise and that there are no easy ways to work around them.

SpikeInterface / spikeinterface

Import temp_wh.dat as recording object #2650