run_sorter_by_property aggregate export for Phy

YesterdaySam commented 2 months ago

Reposting this from my question on the neuropixels slack channel: I have been using SpikeInterface 0.101 to sort npx 2.0 data (KS4) with the run_sorter_by_property function. I get noise waveforms when visualizing the output in Phy, but all other parameters and metrics look ok (see below, only sorted first 30secs for speed).
This is because I'm opening the individual shank params files which each have a 96 channel subset saved but the overall .bin file of the recording to which params.py points has 384+sync channel = 385 leading to phy extracting bad waveforms. I can also get good results on the same dataset just running ks4 out of the box directly (second screenshot). image(1)

Is there a way to get SpikeInterface to save the aggregate 'sorting' object output the way ks4 does natively rather than as individual subfolders per shank? In theory I could just use run_sorter and ignore groupings, but I have heard this could lead to unexpected results. I also know the exporters module can do so but it takes a very long time to run even on a 30sec test data set so I am trying to avoid this for later batching.

Here is the script I am currently running with ############

import spikeinterface as si  # import core only
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss

import numpy as np
from pathlib import Path

global_job_kwargs = dict(n_jobs=8, chunk_duration="1s")
si.set_global_job_kwargs(**global_job_kwargs)

base_folder = Path(r'D:\User\probe_data\KW007')
spikeglx_folder = base_folder / 'KW007_09032024_rec_D5_Lat1_g0'

# Load raw data, don't load sync channel here or subsequent operations won't work!
raw_rec = se.read_spikeglx(spikeglx_folder, stream_name='imec0.ap', load_sync_channel = False)

# High pass filter
rec1 = spre.highpass_filter(raw_rec, freq_min=400.)

# Align analog-digital converter sweeps (catGT equivalent)
rec2 = spre.phase_shift(rec1)

# Global median reference
rec3 = spre.common_reference(rec2, operator='median', reference='global')
rec = rec3

# Run KS4 within SpikeInterface
ss.get_default_sorter_params('kilosort4')

# for npx2.0 probes
sorting = ss.run_sorter_by_property('kilosort4', rec, grouping_property='group', folder=base_folder / spikeglx_folder / 'kilosort4', verbose=True)

The output is a nested set of subfolders D:\user\probe_data\subject\session_folder\kilosort4... \0\sorter_output \1\sorter_output \2\sorter_output \3\sorter_output

Where each of the sorter_output subfolders has its own set of .npy, .tsv and the phy params.py file

Many thanks for any insights.

jakeswann1 commented 2 months ago

For what it's worth, I've been sorting NP2.0 multi-shank recordings in KS4 without separating by shank, and haven't had any issues with units appearing across shanks or anything weird like that. I think if your probe map is correct and the shanks are sufficently far apart in space, as you would expect them to be in the brain, then it's very unlikely that you'll find units detected across two shanks - maybe worth a try?

zm711 commented 2 months ago

I think the cross shank issue was worse in KS2-3 (That's happened to me in the past). I personally haven't tested 4, but on their repo at least they explain they've worked hard to protect against multishank. In general per-shank will be better, but if KS4 defends enough then it might be easier to just do what Jake recommends. Don't let perfect be the enemy of good enough :)

YesterdaySam commented 2 months ago

That's fair, I can definitely get started with using KS4 on all the shanks together. Glad to hear it's generally acceptable to the community!

zm711 commented 2 months ago

So I would say give KS4 a try. If it does misbehave then the official recommendation would be to use the export_to_phy functionality (or try out spikeinterface-gui/sortingview) instead :)

Feel free to open a new issue if something else comes up!

SpikeInterface / spikeinterface

run_sorter_by_property aggregate export for Phy #3420