ptxas error : Entry function uses too much shared data

BruceCanovas / supersurfel_fusion

Dense RGB-D SLAM system articulated around a supersurfel-based 3D representation for fast, lightweight and compact mapping in indoor environments.


ptxas error : Entry function uses too much shared data #1

ZonePG closed this issue 3 years ago

ZonePG commented 3 years ago

ptxas error : Entry function uses too much shared data

This error appears when I run catkin_make -DOpenCV_DIR=<---------my opencv dir---->

with CUDA 11.0 and cuDNN 8.0.

ZonePG commented 3 years ago

here is the log:

ptxas error : Entry function '_ZN3cub30DeviceRadixSortDownsweepKernelINS_21DeviceRadixSortPolicyIiN6thrust5tupleI6float3S4_4int25Mat334Cov36float2fNS2_9null_typeES9_S9_EEiE9Policy700ELb0ELb0EiSA_iEEvPKT2_PSD_PKT3_PSH_PT4_SL_iiNS_13GridEvenShareISL_EE' uses too much shared data (0xc300 bytes, 0xc000 max)
CMake Error at sfusion_generated_supersurfel_fusion.cu.o.Release.cmake:279 (message):

hanxiumeng commented 3 years ago

I have the same problem

BruceCanovas commented 3 years ago

What GPU are you using? And are you both on CUDA 11? It's weird, because this should not happen with a recent GPU. I will try to install CUDA 11 and reproduce the error. I will also rework the tracking part of the system, because I think it uses some old CUDA instructions that may now be outdated.

hanxiumeng commented 3 years ago

@BruceCanovas Hi, I am using an RTX 2070 and I tried with both CUDA 11 and CUDA 10.2. Is this problem caused by the GPU architecture?

BruceCanovas commented 3 years ago

I think so, yes. Until now the system has been tested on an Nvidia GTX 950M, a Jetson TX2 and a Jetson Xavier, as well as a Quadro P600. I am using a CMake script to detect the compute capability of the GPU automatically and to set the right NVCC compilation flags. It may be obsolete for RTX GPUs, but I don't see any reason why it would be. Right now I don't have much of an idea about this error; I will investigate and let you know if I find something.

hanxiumeng commented 3 years ago

@BruceCanovas I am not very familiar with CUDA and have only just started working with it. How do I set the NVCC compile flags, and in which file do I set them?

BruceCanovas commented 3 years ago

You can pass flags to the CUDA compiler (NVCC) in the CMakeLists.txt file.
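As a rough illustration of what that could look like, here is a minimal sketch. It assumes the package compiles its .cu files through CMake's FindCUDA module (the generated file name in the log above suggests so, but this is not confirmed in the thread), and that the line is placed before the `cuda_add_library` / `cuda_add_executable` call. The architecture value is an example for a Turing card such as the RTX 2070 (compute capability 7.5), not something taken from the repository.

```cmake
# Assumption: the project uses find_package(CUDA), so nvcc options are
# collected in the CUDA_NVCC_FLAGS list variable.
# compute_75 / sm_75 targets Turing GPUs (e.g. RTX 2070); change both
# numbers to match the compute capability of your own GPU.
list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_75,code=sm_75)
```

For an Ampere card such as the RTX 3060 the corresponding values would be `compute_86` and `sm_86`, which need CUDA 11.1 or newer. If the project's automatic detection script already appends an architecture flag, that flag may have to be removed or overridden for this setting to take effect.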

BruceCanovas commented 3 years ago

@hanxiumeng I have been able to build and run the code on an Nvidia GTX 1660 Ti with CUDA 10.2 as well. Maybe you can try building the code with the correct architecture for your GPU specified in the CMakeLists.txt.

I am not sure, but I think the problem is at line 485 of supersurfel_fusion.cu, where I make a tuple of all the Thrust vectors of the model in order to sort them all at once. The tuple seems to be too big (though not for my GTX 950 GPU, which is well below yours, so that's weird). One workaround may be to sort each vector of the model separately, or to group them into smaller groups. It might slow the code down a bit, though.
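One common way to apply this kind of workaround is to sort only a small index array by the key and then reorder each attribute vector with that permutation, so the sort itself never carries the large tuple. The log above reports 0xc300 (49,920) bytes requested against a 0xc000 (49,152 bytes, i.e. 48 KB) limit for a radix sort whose value type is the full attribute tuple. The sketch below shows the idea on the host with the C++ standard library only; `Float3`, `sorted_order` and `gather` are hypothetical names for illustration, not code from the repository. On the GPU the equivalent steps would presumably be `thrust::sequence`, `thrust::sort_by_key` on (key, index) and one `thrust::gather` per vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one per-supersurfel attribute type.
struct Float3 { float x, y, z; };

// Sort a plain index array by key and return the permutation:
// order[i] is the original position of the i-th smallest key.
// Only ints are moved during the sort, never the large attributes.
std::vector<int> sorted_order(const std::vector<int>& keys) {
    std::vector<int> order(keys.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&keys](int a, int b) { return keys[a] < keys[b]; });
    return order;
}

// Reorder one attribute vector with the permutation: out[i] = values[order[i]].
template <typename T>
std::vector<T> gather(const std::vector<int>& order, const std::vector<T>& values) {
    std::vector<T> out;
    out.reserve(values.size());
    for (int idx : order) {
        out.push_back(values[static_cast<std::size_t>(idx)]);
    }
    return out;
}
```

Each attribute vector (positions, colors, covariances, and so on) would then be passed through `gather` once with the same `order`. This trades one large sort for one small sort plus several gathers, which fits the expectation above that the code may get slightly slower.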

hanxiumeng commented 3 years ago

@BruceCanovas Thanks for your work. I'll give it a try. And I'll give you feedback later

hanxiumeng commented 3 years ago

@BruceCanovas Hi, I managed to compile by setting the NVCC parameter with CUDA 10.2, but I ran into a new problem. I'll report it in a separate issue.

BloodLemonS commented 3 years ago

> @BruceCanovas hi, I managed to compile by setting the NVCC parameter with cuda 10.2, but I ran into a new problem. I'll put it on another page

I came across the same problem and wondered whether you had solved it. I'm using a 3060 graphics card with CUDA 11.2 and OpenCV 4.4.0 installed.