SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Question about Kilosort 2 #3209

Open gabrielstine opened 3 months ago

gabrielstine commented 3 months ago

Hi Guys,

I am using KS2 in SpikeInterface. Every time I run the sorter, SI saves the recording in the sorter output folder before running Kilosort. This save seems slow; I assume it does not use parallelization. KS2 then does whitening and saves the temp_wh.dat file, which is also slow. I'm wondering if it's possible to do the preprocessing/whitening in SI, save the preprocessed data to binary using parallelization, and then point KS2 to this file for spike sorting. My sense is this would speed up my pipeline considerably and be more storage-efficient.

Gabe

zm711 commented 3 months ago

Hey @gabrielstine,

So writing the recording does use parallelization if you have set n_jobs>1 in your job_kwargs. I tend to prefer using

import spikeinterface.full as si
si.set_global_job_kwargs(n_jobs=-1)  # -1 = all available cores; + other job kwargs, e.g. chunk_duration="1s"
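To make the n_jobs point concrete, here is a minimal sketch of chunked, parallel writing using only NumPy and the standard library. The helper write_binary_chunked is hypothetical and is not SpikeInterface's implementation. SpikeInterface's own chunk executor also evaluates any lazy preprocessing (filtering, referencing) per chunk, which is where extra workers are most likely to help; the raw copy to disk is still limited by the disk itself.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def write_binary_chunked(traces, path, n_jobs=4, chunk_size=30_000):
    """Write (num_samples, num_channels) traces to a flat binary file.

    Hypothetical helper: the time axis is split into chunks and a thread pool
    copies each chunk into a preallocated memory-mapped file.
    """
    num_samples, num_channels = traces.shape
    # Create the output file once, at its final size, in samples x channels order.
    out = np.memmap(path, dtype=traces.dtype, mode="w+", shape=(num_samples, num_channels))

    def write_chunk(start):
        stop = min(start + chunk_size, num_samples)
        # Chunks cover disjoint sample ranges, so workers never write the same bytes.
        out[start:stop] = traces[start:stop]

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # list() forces evaluation so any exception raised in a worker surfaces here.
        list(pool.map(write_chunk, range(0, num_samples, chunk_size)))

    out.flush()


# Usage: chunks of 30,000 samples (1 s at an assumed 30 kHz) shared among 4 threads.
rng = np.random.default_rng(0)
traces = rng.integers(-1000, 1000, size=(100_000, 8), dtype=np.int16)
path = os.path.join(tempfile.mkdtemp(), "recording.dat")
write_binary_chunked(traces, path, n_jobs=4)

readback = np.fromfile(path, dtype=np.int16).reshape(-1, 8)
print(np.array_equal(readback, traces))  # True
```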

So if you aren't getting parallelization, we need to troubleshoot a bit! As for your other question:

I'm wondering if it's possible to do the preprocessing/whitening in SI, save the preprocessed data to binary using parallelization, and then point KS2 to this file for spike sorting

Yes, it is. The params are here, but the one you want is skip_kilosort_preprocessing=True. We have options for referencing your data as well as whitening.
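On the SpikeInterface side, referencing and whitening live in spikeinterface.preprocessing (for example common_reference and whiten). As a rough idea of what those two steps compute, here is a NumPy sketch of a global common median reference followed by ZCA whitening. It is a generic textbook version under simplifying assumptions (covariance estimated from all samples, no local whitening over nearby channels), not the exact algorithm used by SpikeInterface or by KS2 internally.

```python
import numpy as np


def common_median_reference(traces):
    """Global common median reference: subtract the per-sample median across channels."""
    return traces - np.median(traces, axis=1, keepdims=True)


def zca_whiten(traces, eps=1e-8):
    """ZCA-whiten (num_samples, num_channels) traces.

    Returns the whitened traces and the whitening matrix W, where
    W = U diag(1 / sqrt(lambda + eps)) U^T and U, lambda are the eigenvectors
    and eigenvalues of the channel covariance.
    """
    centered = traces - traces.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ w, w


# Usage on synthetic data: 8 channels sharing a common-mode component.
rng = np.random.default_rng(0)
mixing = np.eye(8) + 0.3 * np.ones((8, 8))
raw = rng.normal(size=(50_000, 8)) @ mixing
referenced = common_median_reference(raw)
white, w = zca_whiten(referenced)
# After whitening, the channel covariance is (numerically) the identity.
print(np.allclose(np.cov(white, rowvar=False, bias=True), np.eye(8), atol=1e-4))  # True
```

With the preprocessing done in SI, the idea is to save the result and run KS2 with skip_kilosort_preprocessing=True, so that KS2 presumably does not repeat its own filtering and whitening.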

Want to give it a go? If you run into more problems, let us know.

Zach

JoeZiminski commented 3 months ago

I am not certain about this, but I think that for file writing there is not much speed-up from parallelising across CPUs (discussed here and here), even with an SSD. For reading, there are some speed-ups with an SSD, but not always with an HDD (discussed here). So in this case I think the only way to speed things up is to avoid writing altogether. I don't think it is possible to avoid writing the temp_wh.dat file, as KS2 writes it internally (I think after preprocessing and before sorting).
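A back-of-the-envelope estimate shows why more CPUs may not help once writing is the bottleneck: writing a file takes at least its size divided by the disk's sustained write speed, however many workers produce the data. All numbers below are assumptions chosen for illustration (a 384-channel, 30 kHz, int16 recording and a 500 MB/s disk), and temp_wh.dat is assumed to be roughly the same size as recording.dat.

```python
# Back-of-the-envelope lower bound on disk-bound write time.
# Every number here is an illustrative assumption, not a measurement.
num_channels = 384               # assumed probe channel count
sampling_frequency = 30_000      # Hz, assumed
bytes_per_sample = 2             # int16
duration_s = 3600                # one hour of data
disk_write_bytes_per_s = 500e6   # assumed sustained write speed (~500 MB/s)

file_bytes = num_channels * sampling_frequency * bytes_per_sample * duration_s
seconds_per_copy = file_bytes / disk_write_bytes_per_s
# recording.dat (written by SpikeInterface) + temp_wh.dat (written by KS2),
# assuming both files are about the same size.
seconds_both_copies = 2 * seconds_per_copy

print(f"file size: {file_bytes / 1e9:.1f} GB")  # file size: 82.9 GB
print(f"one copy: {seconds_per_copy:.0f} s, both copies: {seconds_both_copies:.0f} s")
# one copy: 166 s, both copies: 332 s
```

Under these assumptions, skipping one of the two writes saves about as much time as the write itself takes, which is why avoiding the copy matters more than adding workers.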

It might be possible to stop SpikeInterface from writing the recording to the sorter output folder. I'm sure I have seen the recording.dat write skipped, but I can't remember the circumstances. I think it was when the preprocessed data had already been saved in SpikeInterface format (so it doesn't actually get around the problem). What is your preprocessing pipeline?
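Reusing an existing file instead of copying it can be thought of as a compatibility check. The sketch below is purely hypothetical: the class SavedRecordingInfo and the function can_reuse_for_kilosort are made up for illustration, and the criteria assume KS2 needs a single int16 file with channels interleaved sample by sample. SpikeInterface's real check may use different criteria. One consequence of these assumptions is that whitened data saved as float32 would not qualify and would have to be written again.

```python
from dataclasses import dataclass


@dataclass
class SavedRecordingInfo:
    """Hypothetical summary of a recording that already lives on disk."""
    file_paths: list  # one path per segment
    dtype: str        # sample dtype as stored on disk
    time_axis: int    # 0 means samples x channels (channels interleaved)


def can_reuse_for_kilosort(info: SavedRecordingInfo) -> bool:
    """Return True if, under the assumed criteria, the file could be passed to KS2 as-is."""
    # Assumed criteria: a single flat int16 file with channels interleaved
    # sample by sample; anything else would have to be rewritten.
    return len(info.file_paths) == 1 and info.dtype == "int16" and info.time_axis == 0


# Usage with hypothetical paths: int16 saved data vs. float32 whitened data.
saved_int16 = SavedRecordingInfo(["preprocessed/seg0.raw"], "int16", 0)
saved_float = SavedRecordingInfo(["whitened/seg0.raw"], "float32", 0)
print(can_reuse_for_kilosort(saved_int16))  # True
print(can_reuse_for_kilosort(saved_float))  # False
```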