CUDA error while running run_sorter_by_property("kilosort4", ...

paolahydra commented 1 day ago

Hi everyone,

Occasionally, and in a not entirely reproducible way, I get this error while running kilosort (full error report in the file attached):

File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\site-packages\kilosort\clustering_qr.py", line 178, in kmeans_plusplus
    ix = dexp[:, imax] > 0
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

ks error.txt

With most datasets, I have no problems. With some datasets, calling the function again will work. With this one dataset, I am stuck at the second of 4 shanks and cannot move past that.
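
The error message itself suggests setting CUDA_LAUNCH_BLOCKING=1 so that the traceback points at the kernel that actually failed. A minimal sketch of doing this from Python, assuming the variable is set at the very top of the script before torch, kilosort or spikeinterface is imported (the helper name `cuda_launch_blocking_enabled` is only illustrative):

```python
import os

# Set this before torch initialises CUDA, i.e. before importing torch,
# kilosort or spikeinterface in the same process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Return True if synchronous CUDA kernel launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


if __name__ == "__main__":
    print("CUDA_LAUNCH_BLOCKING enabled:", cuda_launch_blocking_enabled())
```

Synchronous launches are slower, so this is probably only worth enabling while chasing the error.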

I have two questions:

1. Do you have any insight on what might be the underlying problem and how to fix it?
2. Is there a way to call run_sorter_by_property on a single specified group which is not necessarily the first (0)? Since I managed to sort shank 0, and I am getting the error in shank 1, I would like to skip 0 and possibly 1 too...

Thanks,
Paola

running:
spikeinterface 0.101.0
kilosort 4.0.6

samuelgarcia commented 13 hours ago

Hi Paola. Are you using run_sorter_by_property with an engine other than engine='loop'? Maybe you are using engine='joblib', in which case several jobs access the CUDA driver in parallel.
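
For reference, a sketch of what the sequential call could look like. It assumes the groups are stored in the "group" property and that the installed version accepts the `folder` keyword; please check the signature of run_sorter_by_property in your release. The helper names `loop_engine_kwargs` and `sort_groups_sequentially` are hypothetical.

```python
from pathlib import Path


def loop_engine_kwargs(recording, folder, sorter_name="kilosort4"):
    """Build keyword arguments for a sequential (one group at a time) run."""
    return dict(
        sorter_name=sorter_name,
        recording=recording,
        grouping_property="group",  # assumption: shanks are stored as "group"
        folder=Path(folder),
        engine="loop",  # sort groups one after another, no parallel GPU access
        verbose=True,
    )


def sort_groups_sequentially(recording, folder, **sorter_params):
    """Run the sorter group by group so only one job touches the GPU."""
    # Imported lazily so the helper above can be used without spikeinterface.
    from spikeinterface.sorters import run_sorter_by_property

    kwargs = loop_engine_kwargs(recording, folder)
    return run_sorter_by_property(**kwargs, **sorter_params)
```

With engine="loop" the groups are sorted one after another in the same process, so no two jobs should be using the GPU at the same time.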

paolahydra commented 12 hours ago

Hi Samuel, I am using engine='joblib' indeed. I will try using 'loop' instead.

There is no easy way to skip the group I have already processed, right?

Thanks!
Paola
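
One possible workaround for skipping groups, sketched under assumptions rather than as an official option: split the recording with split_by and call run_sorter only on the groups that are still missing. The helper names, the `skip` argument and the `group_<id>` folder naming are hypothetical, and the `folder` keyword of run_sorter should be checked against the installed version.

```python
from pathlib import Path


def groups_still_to_sort(all_groups, already_sorted):
    """Return the groups that have not been sorted yet, in original order."""
    done = set(already_sorted)
    return [g for g in all_groups if g not in done]


def sort_selected_groups(recording, base_folder, skip=(),
                         sorter_name="kilosort4", **sorter_params):
    """Sort each group except those listed in `skip`, one at a time."""
    # Imported lazily so the helper above can be used without spikeinterface.
    from spikeinterface.sorters import run_sorter

    # split_by returns a dict mapping each group value to a sub-recording.
    recordings_by_group = recording.split_by("group")
    sortings = {}
    for group in groups_still_to_sort(recordings_by_group.keys(), skip):
        sortings[group] = run_sorter(
            sorter_name,
            recordings_by_group[group],
            folder=Path(base_folder) / f"group_{group}",  # hypothetical layout
            **sorter_params,
        )
    return sortings
```

For example, `sort_selected_groups(recording, "ks4_output", skip=[0])` would start at shank 1. Unlike run_sorter_by_property, this returns a dict with one sorting per group instead of a single aggregated sorting.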

SpikeInterface / spikeinterface

CUDA error while running run_sorter_by_property("kilosort4", ... #3550