flatironinstitute / ironclust

Spike sorting software being developed at Flatiron Institute, based on JRCLUST (Janelia Rocket Cluster)
Apache License 2.0

irc2: sorting GPU errors #47

Closed: WeissShahaf closed this issue 4 years ago

WeissShahaf commented 4 years ago

Hey James, when I run irc2 with fParfor=1, I get the following error (MATLAB 2019a on Ubuntu; it also fails on Windows 10):

Detecting 763/766: 436 spikes found (76.4 spikes/s, 21.9 MB/s, took 5.7 s)
Detection took 634.2s and used 0.193 GiB (fParfor=1, fGpu=1)
Saving a struct to irc2/raw_geom_irc.mat... took 244.9s.
Clustering
Calculating drift similarity... took 0.6s
sortpage: calculating Rho...
Page 1/3 took 34.0s
Page 2/3
Warning: An empty array (null pointer) was passed to the CUDA kernel. Attempting to access the elements of this array will cause an error on the GPU.
In irc2>search_knndrift (line 3346)
In irc2>rho_pagedsite (line 3210)
In parallel_function>make_general_channel/channel_general (line 837)
In remoteParallelFunction (line 46)
Warning: An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_ILLEGAL_ADDRESS
In irc2>search_knndrift (line 3296)
In irc2>rho_pagedsite (line 3210)
In parallel_function>make_general_channel/channel_general (line 837)
In remoteParallelFunction (line 46)
...
CxCxCxCxCxCxCxCx...CxC
Index in position 1 is invalid. Array indices must be positive integers or logical values.
Error in irc2>knncpu (line 3800)
vrKnn(vi1_) = mrKnn1(end,:);
Error in irc2>rho_pagedsite (line 3230)
[vrRho_in(viiin1), miKnn] = knncpu(mrFet_out(:,vii_out1), mrFet_in(:,vii_in1), knn);
Error in irc2>rhopage (line 3118)
cvrRho_in1{iSite} = rho_pagedsite(S_page1, S_site1, iSite);
Error in irc2>sortpage (line 3043)
vrRho(viSpk_in1) = rhopage(Spage1);
Error in irc2>sort (line 2829)
[vrRho, vrDelta, viNneigh, memory_sort, nFeatures] = sortpage(S0, P, S_drift);
Error in irc2 (line 213)
S0.Sclu = sort(S0, P);

So I tried setting fParfor = 0 and got:
irc2 sort
irc2 (5.7.8) opening /media/weisss/4TB SSD/runfolder/20190828/irc2/raw_geom.prm
irc2: cleared sort
Running irc2.m (5.7.8)
Loading irc2/raw_geom.prm... took 23.9s
Clustering
Calculating drift similarity... took 0.3s
sortpage: calculating Rho...
Page 1/3 C.C.C.C.C.C.C.C. ... C.C. took 192.9s
Page 2/3 C
Index in position 1 is invalid. Array indices must be positive integers or logical values.
Error in irc2>knncpu (line 3800)
vrKnn(vi1_) = mrKnn1(end,:);
Error in irc2>rho_pagedsite (line 3230)
[vrRho_in(viiin1), miKnn] = knncpu(mrFet_out(:,vii_out1), mrFet_in(:,vii_in1), knn);
Error in irc2>rhopage (line 3118)
cvrRho_in1{iSite} = rho_pagedsite(S_page1, S_site1, iSite);
Error in irc2>sortpage (line 3043)
vrRho(viSpk_in1) = rhopage(Spage1);
Error in irc2>sort (line 2829)
[vrRho, vrDelta, viNneigh, memory_sort, nFeatures] = sortpage(S0, P, S_drift);
Error in irc2 (line 213)
S0.Sclu = sort(S0, P);
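Both runs stop at the same statement in knncpu, vrKnn(vi1_) = mrKnn1(end,:). One way MATLAB raises this message on such a line is when mrKnn1 has zero rows: end then evaluates to 0, and 0 is not a valid subscript. An invalid value inside vi1_ could trigger a similar error, and the thread does not say which case occurred here. The sketch below only illustrates the empty-matrix case and a possible guard; the stand-in matrix and the choice of Inf for "no neighbours" are hypothetical assumptions, not irc2's code or the maintainer's fix.

```matlab
% Hypothetical illustration (not irc2 code): indexing with "end" on an empty matrix.
mrKnn1 = zeros(0, 5);              % stand-in: no neighbour distances in this batch

try
    vrLast = mrKnn1(end, :);       % end evaluates to 0 for a 0-row matrix
catch ME
    fprintf('Caught: %s\n', ME.message);
end

% Hypothetical guard: return Inf for every column when there are no rows.
if size(mrKnn1, 1) == 0
    vrLast = inf(1, size(mrKnn1, 2));
else
    vrLast = mrKnn1(end, :);
end
```

Treating "no neighbours" as infinitely far is one design choice among several; skipping the site or raising a clearer error would be equally valid, depending on how the result is used downstream.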

jamesjun commented 4 years ago

Thanks for sharing the file. I am running it now to try to reproduce the error. Did you run "irc2 compile" to compile the CUDA code for your system? I also recommend closing your web browser, since it can use some GPU RAM. What are your GPU specs?

WeissShahaf commented 4 years ago

- I use "irc update" followed by "irc2 compile".
- I don't think it's a VRAM or hardware-related issue. This run was on a 6-core machine with 256 GB RAM and an RTX 8000 (48 GB VRAM), but sorting/curating takes so long that I'm also running on other PCs:
  - an 8-core machine with 192 GB RAM and a P5000 (16 GB VRAM)
  - a 16-core machine with 128 GB RAM and an RTX 2070 (8 GB VRAM)
  - a 6-core machine with 96 GB RAM and a P4000 (8 GB VRAM)

On each machine I initially installed ironclust with git clone, and before each run I do irc2 update and compile in the matlab folder. So far I have not been able to complete a full irc2 run with manual curation on any of them.

jamesjun commented 4 years ago

I tested your dataset and the new version works. I had to fix a bug affecting long recordings exceeding ~1.4 hours. The error was not on the CPU side; your GPU should work fine, and the CPU takes over when the GPU fails.
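The "CPU takes over when GPU fails" behaviour can be illustrated with a minimal MATLAB sketch of the general try-GPU-then-CPU pattern. It assumes a simple squared-Euclidean distance kernel and random stand-in data; the variable names echo the trace above, but the kernel, the data and the fallback logic are hypothetical and not taken from irc2.m.

```matlab
% Hypothetical sketch of "GPU first, CPU on failure"; not irc2's implementation.
% Squared Euclidean distances between feature columns (R2016b+ implicit expansion).
sqdist = @(A, B) sum(A.^2, 1)' + sum(B.^2, 1) - 2 * (A' * B);

mrFet_out = rand(4, 50);   % stand-in reference features (4 features x 50 spikes)
mrFet_in  = rand(4, 20);   % stand-in query features (4 features x 20 spikes)
knn = 5;

try
    % Any failure here (no GPU, no Parallel Computing Toolbox, CUDA error) is caught.
    mrD = gather(sqdist(gpuArray(mrFet_out), gpuArray(mrFet_in)));
    strDevice = 'GPU';
catch ME
    warning('GPU path failed (%s); retrying on CPU.', ME.message);
    mrD = sqdist(mrFet_out, mrFet_in);
    strDevice = 'CPU';
end

% Distance from each query spike to its knn-th nearest reference spike.
mrD = sort(max(mrD, 0), 1);                   % clamp round-off negatives, sort each column
vrKnn = sqrt(mrD(min(knn, size(mrD, 1)), :)); % guard against knn > number of references
fprintf('computed on %s: %d distances\n', strDevice, numel(vrKnn));
```

Because the same anonymous function runs on both gpuArray and ordinary arrays, the fallback path needs no separate CPU implementation; a production version might also need to reset the GPU after a fatal CUDA error before reusing it.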