cortex-lab / KiloSort

GPU code for spike sorting
GNU General Public License v2.0

Kilosort GPU crashing on linux before final run (linux 18.04, Nvidia 410.78) #176

Closed catubc closed 5 years ago

catubc commented 5 years ago

Hi

I'm having a bit of trouble running Kilosort to completion on a ~25 GB, 512-channel dataset. I am on Linux (Ubuntu 16.04, Nvidia 410.78 drivers, Matlab R2018b, Titan Xp 12 GB). Kilosort used to run fine on R2017b and Ubuntu 16.04, and it still runs fine on smaller datasets.

As far as I can tell, GPU memory is not being freed before the final batch run (there appears to be ~4 GB of GPU RAM still allocated after the crash; not sure if that's a clue). I tried lowering the GPU memory parameter (ops.ForceMaxRAMforDat = 10e7) but I seem to get the same errors. The CPU-only version works fine.

The crash log is below, any advice is appreciated.

Thanks! Cat


>> master_eMouse
Time   0s. Loading raw data... 
Time  85s. Channel-whitening filters computed. 
Time  85s. Loading raw data and applying filters... 
Time 470.02. Whitened data written to disk... 
Time 470.02. Preprocessing complete!
Time 1465s. Optimizing templates ...
Time 2155.35, batch 1101/1104, mu 14.72, neg-err 13129084.050951, NTOT 3661849, n100 7941, n200 4642, n300 3380, n400 2612
Time 2209s. Running the final template matching pass...
Time 2211.22, batch 1/184,  NTOT 42398
Error using gpuArray.zeros
Out of memory on device. To view more detail about available memory on the GPU, use
'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in fullMPMU (line 139)
        data    = gpuArray.zeros(NT, Nfilt, Nrank, 'single');

Error in master_eMouse (line 26)
rez = fullMPMU(rez, DATA); % extract final spike times (overlapping extraction)

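For scale, the failing line allocates a dense single-precision array of size NT × Nfilt × Nrank on the GPU. A back-of-the-envelope estimate (the values below are illustrative; the real NT and Nfilt come from ops and the probe/config):

```matlab
% Illustrative memory estimate for the fullMPMU allocation.
% NT, Nfilt and Nrank below are assumed example values, not read from ops.
NT    = 128*1024;                 % samples per batch
Nfilt = 1024;                     % number of templates (filters)
Nrank = 3;                        % SVD rank per template
bytes = NT * Nfilt * Nrank * 4;   % 'single' = 4 bytes per element
fprintf('fullMPMU data array: %.2f GB\n', bytes / 1e9);   % ~1.61 GB
```

Halving NT halves this allocation, which is one reason lowering the batch size can help.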
marius10p commented 5 years ago

Hi, how many filters do you run it with? Have you tried clearing the GPU just before the final step? You can do so with gpuDevice(1). You can also lower the batch size to half its default.

The RAM parameter is for system RAM, not GPU RAM.
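A minimal sketch of both suggestions in a master script (option names follow the eMouse example config; the exact default batch size varies between Kilosort versions):

```matlab
% Sketch only: where these two tweaks could go in a master script.
% ops.NT and ops.ntbuff are the batch-size/buffer options from the
% example configs; the default NT value depends on the Kilosort version.
ops.NT = 32*1024 + ops.ntbuff;   % half the usual batch size

% ... preprocessing, whitening, template optimization ...

gpuDevice(1);                    % reset the GPU to free device memory
                                 % (warning: this destroys any live gpuArray)
rez = fullMPMU(rez, DATA);       % final template matching pass
```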

catubc commented 5 years ago

I changed the batch size and added gpuDevice(1) before the final pass, and the run completed fine. Thanks!

m-beau commented 4 years ago

Hi Marius,

I am encountering a similar issue. I am trying to run Kilosort2 on a 384-channel, 2.5-hour-long recording (~200 GB of data), and my GPU struggles to handle it.

The GPU memory error occurs during the splitting step:

...
Found 919 splits, checked 501/1479 clusters, nccg 272 
Error using gpuArray/filter
Out of memory on device. To view more detail about available memory on the GPU, use
'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in my_conv2 (line 47)
        S1 = filter(gaus, 1, cat(1, S1, zeros([tmax, dsnew2(2)])));

Error in splitAllClusters (line 56)
    clp = clp - my_conv2(clp, 250, 1);

Error in master_kilosort (line 43)
rez = splitAllClusters(rez, 1);

Error in metamaster_MB (line 39)
master_kilosort(datasets{1}{1}, datasets{1}{2}, datasets{1}{3});

I initially used the tip above to reset the GPU before every stage of the master script (i.e. before clusterSingleBatches, learnAndSolve8b, find_merges, splitAllClusters(1) and splitAllClusters(0)).

Predictably, this led to a different error, this time at the merging step:

...
merged 136 into 137 
Error using gpuArray/subsref
The data no longer exists on the device.

Error in splitAllClusters (line 20)
[~, iW] = max(abs(rez.dWU(ops.nt0min, :, :)), [], 2);

Error in master_kilosort (line 51)
rez = splitAllClusters(rez, 1);

Error in metamaster_MB (line 45)
    master_kilosort(datasets{i}{1}, datasets{i}{2}, datasets{i}{3});

So my question is: at which stages of the master script can we safely reset the GPU with gpuDevice(1), clearing memory without throwing away data needed by the subsequent step?

Thanks a lot for all the hard work!
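For context on why an indiscriminate reset fails: gpuDevice(1) invalidates every existing gpuArray, including fields of rez such as rez.dWU (hence "The data no longer exists on the device"). A hedged sketch of how such fields could be pulled back to host memory before resetting (which fields actually live on the GPU depends on the Kilosort version):

```matlab
% Sketch: gather any gpuArray fields of rez to host RAM before the reset,
% so the next stage can re-upload them. rez.dWU is taken from the error
% trace above; other fields may also need gathering.
if isa(rez.dWU, 'gpuArray')
    rez.dWU = gather(rez.dWU);   % copy template waveforms back to the CPU
end
gpuDevice(1);                    % safe now: no live gpuArray is still needed
rez = splitAllClusters(rez, 1);  % the next step re-creates GPU data as needed
```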

PS: I also reduced the batch size to the minimum possible, i.e. half the default setting, 32*1024 + ops.ntbuff. If I go lower I end up not having enough spikes per batch, which is incompatible with the drift correction; see this issue and the error below:

Error using gpuArray/eig
Input matrix contains NaN or Inf.

Error in svdecon (line 23)
[U,D] = eig(C);

Error in sortBatches2 (line 7)
[u, s, v] = svdecon(ccb0);

Error in clusterSingleBatches (line 150)
[ccb1, iorig] = sortBatches2(ccb0);

Error in MasterKiloSortLJB (line 59)
rez = clusterSingleBatches(rez);

PPS: I also needed to install extra RAM on my machine to get this far; Kilosort cannot process recordings this long on 32 GB of system RAM. I am not sure where they live, but the hardware recommendations could mention this.

m-beau commented 4 years ago

FYI, here is the output of gpuDevice() right after the crash during the splitting step (it looks like there is plenty of AvailableMemory left, but I am not sure I am interpreting it correctly):

gpuDevice()

ans = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 1080 Ti'
                     Index: 1
         ComputeCapability: '6.1'
            SupportsDouble: 1
             DriverVersion: 10.1000
            ToolkitVersion: 10.1000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.1811e+10
           AvailableMemory: 9.2967e+09
       MultiprocessorCount: 28
              ClockRateKHz: 1582000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1
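Note that a healthy-looking AvailableMemory after a crash is not necessarily informative: when the error unwinds, MATLAB frees the function's temporary gpuArrays, so memory that was exhausted at the failing allocation can appear free again. A small sketch for logging memory just before a suspect allocation instead:

```matlab
% Sketch: query free GPU memory immediately before a large allocation,
% rather than after the crash.
g = gpuDevice();
fprintf('GPU free: %.2f of %.2f GB\n', ...
    g.AvailableMemory / 1e9, g.TotalMemory / 1e9);
```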

xiaoxiaotao2 commented 2 years ago

I have a question about the final template matching pass. Why do I only get batch 1/87, NTOT 114, and why does Kilosort not detect any spikes when I use my own data?