SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
517 stars 184 forks source link

Loading filtered data? #3470

Open jazlynntan opened 1 week ago

jazlynntan commented 1 week ago

Hi, I used kilosort4 outside of spikeinterface for preprocessing and spike sorting. I did some brief checks and curation in Phy. I now want to load the data into spikeinterface to do other visualizations and calculate additional quality metrics. However, when I loaded the recording using read_openephys, the sorting using read_kilosort and read_phy, the waveforms extracted looked wrong despite them looking fine in templates.npy.

The units visualized with the templates.npy

Screenshot 2024-10-09 at 5 01 18 PM

The units visualized using the waveformextractor on the sortinganalyzer created from the sorting from read_phy and recording from read_openephys.

Screenshot 2024-10-09 at 5 02 59 PM

In addition, when I ran compute_quality_metrics(), the median amplitudes and snr were odd - all units had the same median amplitude and the noise level estimated for snr calculation were mostly negative.

I suspect that it has to do with the recording in the sortinganalyzer being the unprocessed raw data. Is this the case? If so, how can I load the preprocessed recording from kilosort so that I can perform visualizations and quality metric computation in spikeinterface?

Thank you.

zm711 commented 1 week ago

Hey @jazlynntan, you would regenerate the same filtering and referencing inside of spikeinterface. We can't really take the exact preprocessing of kilosort, but instead you could do something like:

import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
recording = se.read_openephys(xx)
recording_f= spre.bandpass_filter(recording)
recording_cmr = spre.common_reference(recording_f)

You could also do other preprcessing/look up what kilosort is doing to replicate the same preprocessing in spikeinterface.

jazlynntan commented 1 week ago

Thanks for the prompt response.

Kilosort4 has an option to save preprocessed data to a temp_wh.dat file. Does spikeinterface have extractor modules that can read the dat file? edit: I tried the read_binary function as per below using the information from rec0903_seg3 which is the raw recording loaded using read_openephys. The processed_path refers to the temp_wh.dat file generated by kilosort4. May I know whether I am reading in the processed file appropriately?

processed_0903 = si.extractors.read_binary(processed_path, sampling_frequency=rec0903_seg3.sampling_frequency, num_channels=rec0903_seg3.get_num_channels(), dtype='int16', channel_ids=rec0903_seg3.channel_ids, gain_to_uV=rec0903_seg3.get_channel_gains(), offset_to_uV=rec0903_seg3.get_channel_offsets(), is_filtered=True)

Also, just so that I understand correctly, spikeinterface allows the use of multiple spike sorters (eg. kilosort, spykingcircus, etc.) but this is only for the sorting stage itself? If I wanted to use the preprocessing or other features of these spike sorters I would have to directly use the software outside of spikeinterface?

Thanks!

JoeZiminski commented 1 week ago

Hi @jazlynntan, unfortunately there is not an extractor to load the temp_wh.dat, it was attempted here, but it's not possible. Note that this file has some zero padding around the edges, so reading it directly as a binary will not work. While this is relatively easy to handle, a deeper problem is that this data is whitened, but the unwhitening matrix is not available for some KS versions. As whitening has a massive impact on the waveform shape, it is not possible to easily assess this preprocessed data. I think there are also some additional issues around data scaling, please see the PR for full details.

In general you are touching on an important point in the preprocessing chain, in particular with kilosort. To compute waveforms, postprocessing etc. in SpikeInterface you need access to the preprocessed recording. However, it is not possible to access the preprocessed kilosort recording (for the reasons above). Therefore there are two options. One is to do all preprocessing in spikeinterface (including drift correction and whitening) and pass kilosort the preprocessed data, turning off preprocessing in kilosort with the 'skip_kilosort_preprocessing' argument. However, it will be worth checking the outputs carefully as the kilosort_like motion correction is close-to but not identical to the kilosort matlab implementation, and there may be subtle differences in the whitening approach, but I am not sure as I have not looked at the code in detail. A middle-ground is to perform the basic preprocessing steps in spikeinterface (filtering, CAR) and turn these off in kilosort by setting the min frequency on the filter to very low and turning off CAR in kilosort. Then you can perform postprocessing on this data in spikeinterface. However, it is important to note the postprocessing will be computed on data that is not drift-corrected, while the kilosort results will be on the drift-corrected data. If you don't have much drift, it should make little difference. On that PR there is a comparison for a test dataset.

For (I think all the sorters) the sorters arguments are exposed, and I guess by default (depending on the sorter, for sure with kilosort) will run their own preprocessing steps. So, if you want to run preprocessing steps in spikeinterface it's good to use the sorter arguments to turn these off at the sorter level, so you don't duplicate the preprocessing. Let me know if anything is not clear!