Open gabrielstine opened 3 months ago
Hey @gabrielstine,
So writing the recording does use parallelization if you have set n_jobs>1
in your job_kwargs. I tend to prefer using
```python
si.set_global_job_kwargs(n_jobs=x)  # + other kwargs
```
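For example, concretely (just a sketch; the import convention and the values are placeholders to adapt to your machine):

```python
import spikeinterface.full as si

# Set parallelization once for all chunked read/write steps
# (8 workers and 1 s chunks are example values, not recommendations)
si.set_global_job_kwargs(n_jobs=8, chunk_duration="1s", progress_bar=True)
```

Once that's set, chunked writes like recording.save() should pick up those kwargs without you passing job_kwargs to every call.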
So if you aren't getting parallelization then we need to troubleshoot a bit! As far as your other question:
> I'm wondering if it's possible to do the preprocessing/whitening in SI, save the preprocessed data to binary using parallelization, and then point KS2 to this file for spike sorting
Yes it is. The params are here, but what you want is skip_kilosort_preprocessing=True. We have options for referencing your data as well as whitening.
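Roughly, the flow could look like this (a sketch only: I'm assuming SpikeGLX data and made-up paths for illustration, and depending on your spikeinterface version the run_sorter folder argument may be named output_folder or folder):

```python
import spikeinterface.full as si

# Made-up path / SpikeGLX reader, just for illustration
recording = si.read_spikeglx("/path/to/run_folder", stream_id="imec0.ap")

# Do the preprocessing in SI instead of inside Kilosort
rec = si.bandpass_filter(recording, freq_min=300, freq_max=6000)
rec = si.common_reference(rec, reference="global", operator="median")
rec = si.whiten(rec, dtype="float32")

# Run KS2 on the already-preprocessed data and skip its internal
# filtering/whitening
sorting = si.run_sorter(
    "kilosort2",
    rec,
    output_folder="kilosort2_output",
    skip_kilosort_preprocessing=True,
)
```

The referencing/whitening steps above are just examples of the options mentioned; swap in whatever your pipeline actually needs.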
Want to give it a go? If you run into more problems, let us know.
Zach
I am not certain on this, but I think for file-writing there is not much of a speed-up from parallelising across CPUs (discussed here and here), even with an SSD. For reading there are some speed-ups with an SSD, but not always with an HDD (discussed here). I think in this case the only way to speed things up is to avoid writing altogether. I don't think it is possible to avoid writing the temp_wh.dat file, as KS2 writes this internally (I think after preprocessing, before sorting).
It might be possible to avoid spikeinterface writing the recording to the sorter output folder. I'm sure I have seen the recording.dat write skipped, but I can't remember the circumstances; I think it is when the preprocessed data has already been saved to spikeinterface format (so, actually, that doesn't get around the problem). What is your preprocessing pipeline?
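For reference, saving the preprocessed recording to spikeinterface's binary format in parallel looks roughly like this (a sketch with a toy recording; the folder name and job kwargs are placeholders):

```python
import spikeinterface.full as si

# Toy recording standing in for your preprocessed recording
rec = si.generate_recording(num_channels=4, durations=[10.0])

# Write it once to binary, in parallel
rec_saved = rec.save(folder="preprocessed", format="binary",
                     n_jobs=8, chunk_duration="1s")

# Later runs can reload it without repeating the preprocessing
rec_loaded = si.load_extractor("preprocessed")
```

But as I said, I'm not certain this avoids the recording.dat copy in the sorter folder, so it is worth testing.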
Hi Guys,
I am using KS2 in spikeinterface. Every time I run the sorter, SI saves the recording in the sorter output folder before running Kilosort. This save seems slow; I assume it does not use parallelization. KS2 then does its whitening and saves the temp_wh.dat file, which is also slow. I'm wondering if it's possible to do the preprocessing/whitening in SI, save the preprocessed data to binary using parallelization, and then point KS2 to this file for spike sorting. My sense is this would speed up my pipeline considerably and be more efficient storage-wise.
Gabe