SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Cannot set chunk_size during spykingcircus2 peak finding #3134

Open bharvey-neuro opened 2 months ago

bharvey-neuro commented 2 months ago

Hello,

I am new to spikeinterface and am trying to sort spikes with spykingcircus2 on multi-stereotrode data. I'm currently on version 0.100.8.

I have imported, band-passed, and common-referenced my Intan RHS file, and correctly saved it out prior to running spykingcircus.
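For context, here is roughly what my preprocessing looks like (a minimal sketch; the file name, stream id, and filter band below are placeholders, not my exact values):

import spikeinterface.full as si

# read the Intan RHS file (path and stream_id are placeholders)
recording = si.read_intan("session.rhs", stream_id="0")
# band-pass filter and common-reference (filter band is illustrative)
recording_f = si.bandpass_filter(recording, freq_min=300, freq_max=6000)
recording_preprocessed = si.common_reference(recording_f, reference="global", operator="median")
# save the preprocessed recording before sorting
recording_preprocessed = recording_preprocessed.save(folder="preprocessed", overwrite=True)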

To correct for using stereotrodes instead of a typical array- or shank-based probe, I've run the following block (region/coords list shortened for brevity):

import numpy as np

# regions and coords_dict are shortened for brevity; the full coords_dict
# has an entry for every region listed in `regions`
regions = np.array(('ParScrew', 'PP', 'PP', 'vDG', 'vDG', 'Ent', 'Ent'))
coords_dict = {'ParScrew': (2800, 1500, 0),
               'PP': (-2000, -2530, -1400),
               'vDG': (-2500, -3520, -2500)}

locs = []
for region in regions:
    locs.append(coords_dict[region])

# store the region label as a channel property and set per-channel locations
recording_preprocessed.set_property("brain region", regions)
recording_preprocessed.set_channel_locations(locs)

This should, I believe, allow me to run the sorter with 'brain region' as the grouping property, so that each stereotrode pair is only sorted against itself.
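As a quick sanity check of that grouping, something like this seems to work (a sketch; as far as I understand, split_by is the same mechanism run_sorter_by_property uses internally):

# preview how the recording is split by the 'brain region' property
grouped = recording_preprocessed.split_by("brain region")
for region, sub_rec in grouped.items():
    print(region, sub_rec.get_num_channels())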

The commands I'm currently using to run spykingcircus2 are:

sorter = "spykingcircus2"  # hypothetical assignment; `sorter` is set earlier in my script

job_kwargs = dict(n_jobs=15, total_memory='60G', progress_bar=True)
si.set_global_job_kwargs(**job_kwargs)
sorting = ss.run_sorter_by_property(sorter_name=sorter, recording=recording_preprocessed, remove_existing_folder=True,
                                    grouping_property='brain region', working_folder='sort_by_group', job_kwargs=job_kwargs,
                                    selection={"n_peaks_per_channel": 10000,
                                               "min_n_peaks": 20000}, verbose=True,
                                    detection={'detect_threshold': 5}, apply_preprocessing=False)

print(sorting)
# engine="joblib", engine_kwargs={"n_jobs": 12},
sorting_SPC = sorting.save(folder=f'./{sorter}_sorting_output', overwrite=True)

So, all that said, my current issue is that when spykingcircus2 extracts waveforms, the chunk_size from my job_kwargs is applied correctly. However, when the "find spikes" step runs, the chunk_size is limited to 3000.


extract waveforms shared_memory multi buffer with n_jobs = 15 and chunk_size = 500000000
extract waveforms shared_memory multi buffer: 100% 1/1 [00:00<00:00, 5.78it/s]
find spikes (circus-omp-svd) with n_jobs = 15 and chunk_size = 3000

Can anyone advise on how to change this chunk size to speed up this step? I'm analyzing chronic recordings, so ease and speed of processing are important to me!
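For reference, the kind of chunk control I expected to apply to the find-spikes step as well is the standard job_kwargs chunking (the chunk_duration value below is just an example):

# what I expected to govern chunking everywhere (values are illustrative)
job_kwargs = dict(n_jobs=15, chunk_duration="1s", progress_bar=True)
si.set_global_job_kwargs(**job_kwargs)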

Thanks!

alejoe91 commented 2 months ago

Hi,

@yger can confirm, but I think that the template matching step has hardcoded chunk sizes.

yger commented 2 months ago

Yes, indeed, in spyking circus 2 there used to be a hardcoded limit for the find_peaks procedure (when not using wobble). Are you using the latest version from main? I'm not sure this limit is still in place. The reason was that, counterintuitive as it may seem, smaller chunks are faster for the template-matching algorithm, and this should not harm the results.
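If you want to double-check, printing your installed version and the default spykingcircus2 parameters will show what you are running and which parameters the sorter exposes (a sketch):

import spikeinterface.full as si
import spikeinterface.sorters as ss

print(si.__version__)                                  # confirm which release you are on
print(ss.get_default_sorter_params("spykingcircus2"))  # inspect the exposed parameters, including 'matching'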