MouseLand / Kilosort

Fast spike sorting with drift correction for up to a thousand channels
https://kilosort.readthedocs.io/en/latest/
GNU General Public License v3.0
469 stars, 245 forks

CUDA timeout in preprocessing, KS3 #323

Closed frikyng closed 8 months ago

frikyng commented 3 years ago

Hi,

When I preprocess Neuropixels phase 3A or NP 1.0 data in Kilosort 3, I get a CUDA error in the KS GUI. The error appears after the screen has been lagging for a few minutes, during which I cannot do any other work on the machine.

CUDA_ERROR_LAUNCH_TIMEOUT

I looked up the error online, and it seems to occur because my graphics card (Nvidia Quadro M4000) has to drive my display and run KS at the same time. When a KS instruction to the GPU takes longer than 2 seconds to complete, a watchdog is triggered that resets the graphics driver and thereby cancels KS. It would be possible to remove the 2-second threshold with the Windows Registry Editor (regedit), but this would only alleviate the symptom and not solve the problem, and it would additionally make the screen response slow.
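The 2-second limit described above corresponds to the Windows Timeout Detection and Recovery (TDR) settings. As a minimal sketch, the current values can be inspected (read-only) from Python before deciding whether to change anything. The function names `effective_tdr_delay` and `read_tdr_values` are hypothetical helpers introduced for illustration; the sketch assumes the settings live under the `GraphicsDrivers` registry key as `TdrDelay` and `TdrLevel`, and that a missing `TdrDelay` means the 2-second default applies.

```python
import sys

# Assumption: Windows falls back to a 2-second timeout when TdrDelay is not set.
DEFAULT_TDR_DELAY_S = 2
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def effective_tdr_delay(values):
    """Return the timeout in seconds implied by a dict of registry values."""
    return int(values.get("TdrDelay", DEFAULT_TDR_DELAY_S))


def read_tdr_values():
    """Read TdrDelay and TdrLevel if present; returns {} on non-Windows systems."""
    if sys.platform != "win32":
        return {}
    import winreg  # only available on Windows

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
        for name in ("TdrDelay", "TdrLevel"):
            try:
                found[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value not set, so the Windows default applies
    return found


if __name__ == "__main__":
    values = read_tdr_values()
    print("TDR registry values:", values or "none set")
    print("Effective timeout (s):", effective_tdr_delay(values))
```

The script only reads the registry; changing the values still requires regedit with administrator rights and a reboot.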

What I have tried so far:

I have seen that while KS 3 is preprocessing data, the graphics card is busy only periodically. Although KS 3 has a new spike detection algorithm, it appears to process the data in chunks like KS 2. A colleague of mine can run KS 3 fine on a Quadro P4000 without any frame rate drops, and that card is only slightly better than my M4000.

marius10p commented 3 years ago

Can you please provide the command line output? Something seems to be going wrong.

In the past we've had to disable that watchdog, but I think current Nvidia drivers automatically disable it or circumvent it somehow. Maybe it's different for Quadro cards. There shouldn't be a disadvantage to disabling the watchdog, and your screen response should not become slow.

frikyng commented 3 years ago

First it gives me this warning, repeated until it fills nearly the whole command window:

CUDA_ERROR_LAUNCH_TIMEOUT 
> In standalone_detector (line 11)
  In datashift2 (line 40)
  In ksGUI/runPreproc (line 726)
  In ksGUI/runAll (line 627)
  In ksGUI>@(~,~)obj.runAll() (line 319) 

Then it throws this error:

Error using gpuArray
An unexpected error occurred during CUDA execution. The CUDA error was:
the launch timed out and was terminated

Error in ksFilter (line 15)
    dataRAW = gpuArray(buff);

Error in ksGUI/updateDataView (line 867)
                    datAllF = ksFilter(datAll, obj.ops);

Error in ksGUI/dataClickCB (line 1403)
            obj.updateDataView;

Error in ksGUI>@(f,k)obj.dataClickCB(f,k) (line 385)
            set(obj.H.dataAx, 'ButtonDownFcn', @(f,k)obj.dataClickCB(f, k));

Error using ksGUI/log (line 1588)

Error while evaluating Axes ButtonDownFcn.

My CUDA version is 10.2. My colleague who can run KS 3 without any issues has a Quadro P4000. Maybe mine is just a bit underpowered to handle the task?

marius10p commented 3 years ago

8 GB of GPU RAM is more than enough.

I forgot to mention, but you also need to install the specific version of CUDA that your Matlab version requires: https://www.mathworks.com/help/parallel-computing/gpu-support-by-release.html

frikyng commented 3 years ago

Yeah, like I mentioned before, colleagues of mine get by easily with a P4000, which has nearly the same specs. I noticed that there were 5 (!) versions of CUDA installed on the PC, so I removed all of them except CUDA 10.0. I am working with Matlab version 2019b, so it should be the right one. Unfortunately, this hasn't alleviated the problem, and KS crashed with the same error when I tried it again.
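One way to double-check which CUDA toolkits are still registered on a Windows machine is to look at the environment variables. The following is a minimal sketch, assuming the CUDA installer defines one `CUDA_PATH_V<major>_<minor>` variable per installed toolkit plus a `CUDA_PATH` variable for the default one; `installed_cuda_toolkits` is a hypothetical helper name, not part of CUDA or Kilosort.

```python
import os
import re

# Assumption: each installed toolkit is registered as CUDA_PATH_V<major>_<minor>.
_TOOLKIT_VAR = re.compile(r"^CUDA_PATH_V(\d+)_(\d+)$")


def installed_cuda_toolkits(environ=None):
    """Map toolkit version strings (e.g. '10.1') to their install paths."""
    environ = os.environ if environ is None else environ
    found = {}
    for name, path in environ.items():
        match = _TOOLKIT_VAR.match(name)
        if match:
            found[f"{match.group(1)}.{match.group(2)}"] = path
    # Sort numerically so that 9.0 comes before 10.1.
    by_version = sorted(found.items(),
                        key=lambda kv: tuple(int(p) for p in kv[0].split(".")))
    return dict(by_version)


if __name__ == "__main__":
    toolkits = installed_cuda_toolkits()
    print("Default CUDA_PATH:", os.environ.get("CUDA_PATH", "not set"))
    for version, path in toolkits.items():
        print(f"CUDA {version}: {path}")
```

If more than one version is listed, leftover installations may still be present. The version that should remain is the one the MathWorks table lists for the Matlab release in use.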

I am attaching a screenshot from the task manager where KS 3 is preprocessing NP data. You can see how it ramps up to 100% on every chunk that is processed.

[Screenshot: KS3_newCuda]

marius10p commented 3 years ago

The memory usage is stable; that's just the utilization ramping up. Your GPU really is up to the task, but there must be something wrong with its configuration. Have you updated the Nvidia drivers? This is separate from CUDA. In cases like this I would just start over from scratch by uninstalling and reinstalling Visual Studio, CUDA and Matlab, in that order.
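Because the driver version is separate from the CUDA toolkit version, it can help to record it explicitly. A minimal sketch, assuming the `nvidia-smi` tool that ships with the Nvidia driver is on the PATH; `parse_gpu_csv` and `query_gpus` are hypothetical helper names.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "name, driver_version" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_csv(text):
    """Turn 'name, driver' lines into a list of (name, driver_version) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the driver version is always the final field.
        name, _, driver = line.rpartition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Return GPU names and driver versions, or [] if nvidia-smi is not found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(f"{name}: driver {driver}")
```

Comparing this output before and after a driver update confirms that the update actually took effect.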

frikyng commented 3 years ago

I reinstalled CUDA and Visual Studio but still have the same issue. The throttling pattern of the CPU did change, though, which shows that something changed under the hood (screenshot attached). Here is my current configuration:

[Screenshot: KS3_newCuda(10 1)+newVisuaStudio]

Does that look alright?