autowarefoundation / autoware_ai

Apache License 2.0
28 stars 11 forks

vision_darknet_detect node died: CUDA Error:invalid device symbol #797

Closed xiewenjing1170 closed 2 years ago

xiewenjing1170 commented 3 years ago

Bug report

Required information:

Description of the bug

When I run runtime_manager.launch and load YOLOv3, the vision_darknet_detect node dies every time. The error is as follows:

`CUDA Error: invalid device symbol: File exists`
`Failed to find match for field 'intensity'.`
`[vision_darknet_detect-1] process has died [pid 6308, exit code 255, cmd /home/wenjing/autoware.ai/install/vision_darknet_detect/lib/vision_darknet_detect/vision_darknet_detect __name:=vision_darknet_detect __log:=/home/wenjing/.ros/log/c2a06674-4ade-11ec-b1b1-a4bb6d69b188/vision_darknet_detect-1.log].`
`log file: /home/wenjing/.ros/log/c2a06674-4ade-11ec-b1b1-a4bb6d69b188/vision_darknet_detect-1*.log`

Memory is sufficient while the nodes are running. I have no idea what causes this CUDA error, and I would really appreciate any help. (I have tried several methods, but none of them worked.)

Steps to reproduce the bug

Expected behavior

Actual behavior

Screenshots

Additional information

bruce-almon commented 2 years ago

Hello, try the Docker build instead. The cause is that you are using CUDA 10.0 with an RTX 3080: 30 series GPUs are NOT compatible with CUDA < 11.0. You should install at least the CUDA 11.0 devel libraries for your 3080. I can run the Docker version on my 3090.
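The compatibility rule described above can be sketched as a small lookup. This is only an illustration: the table lists three well-known entries (compute capability 7.5 for the RTX 2080, 8.0 for the A100, 8.6 for the RTX 3080/3090), and the minimum toolkit versions are taken from NVIDIA's release history rather than from this thread. In particular, native code generation for compute capability 8.6 arrived with CUDA 11.1; CUDA 11.0 can only serve such a GPU through PTX compiled for 8.0. The names `MIN_TOOLKIT`, `parse_version`, and `toolkit_targets_gpu` are hypothetical.

```python
# Minimum CUDA toolkit release that can generate native code for a given
# compute capability. Illustrative subset only, not a complete table.
MIN_TOOLKIT = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3080 / RTX 3090
}


def parse_version(text):
    """Turn a version string such as '11.1' into a comparable tuple (11, 1)."""
    return tuple(int(part) for part in text.split(".")[:2])


def toolkit_targets_gpu(toolkit, compute_capability):
    """Return True if the toolkit can emit native code for the GPU."""
    return toolkit >= MIN_TOOLKIT[compute_capability]


if __name__ == "__main__":
    # The situation reported in this issue: CUDA 10.0 with an RTX 3080.
    print(toolkit_targets_gpu(parse_version("10.0"), (8, 6)))
    # The configuration that eventually worked: CUDA 10.0 with an RTX 2080.
    print(toolkit_targets_gpu(parse_version("10.0"), (7, 5)))
```

The toolkit version on a machine can be read from `nvcc --version`, and the GPU model from `nvidia-smi`.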

wzr6009 commented 2 years ago

Hello, after building with Docker, CUDA is 11.1, but `CUDA Error: invalid device symbol` still appears.
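One possible explanation for an error that persists with a newer toolkit, offered here as an assumption and not confirmed anywhere in this thread, is that the node was compiled without device code for the GPU's architecture. A CUDA binary runs on a device only if it embeds machine code (`code=sm_XY`) for the same major architecture with a minor version no higher than the device's, or PTX (`code=compute_XY`) for a compute capability no higher than the device's. The sketch below applies that rule to a string of nvcc `-gencode` flags. The flag strings in the example are made up for illustration and are not copied from the Autoware build files; the helper names are hypothetical.

```python
import re


def embedded_targets(nvcc_flags):
    """Collect the (major, minor) targets embedded as SASS and as PTX."""
    sass, ptx = set(), set()
    # Matches both code=sm_75 and code=[sm_75,compute_75].
    for code_list in re.findall(r"code=\[?([\w,]+)\]?", nvcc_flags):
        for target in code_list.split(","):
            match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", target)
            if match is None:
                continue
            kind, major, minor = match.groups()
            entry = (int(major), int(minor))
            (sass if kind == "sm" else ptx).add(entry)
    return sass, ptx


def can_run_on(nvcc_flags, device_cc):
    """Apply the CUDA binary/PTX compatibility rule for one device."""
    sass, ptx = embedded_targets(nvcc_flags)
    major, minor = device_cc
    # SASS is compatible within the same major version, for equal or newer minor.
    sass_ok = any(m == major and n <= minor for m, n in sass)
    # PTX can be JIT-compiled for any equal or newer compute capability.
    ptx_ok = any((m, n) <= (major, minor) for m, n in ptx)
    return sass_ok or ptx_ok


if __name__ == "__main__":
    # Hypothetical flags, for illustration only.
    flags = "-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75"
    print(can_run_on(flags, (7, 5)))  # RTX 2080
    print(can_run_on(flags, (8, 6)))  # RTX 3080
```

If a check like this fails for the installed GPU, the flags used to compile the package are worth inspecting before reinstalling CUDA again.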

super-zyw commented 1 year ago

Hi, did the problem get solved? I am getting the same error, and I don't know how to fix it.

wenjing1170 commented 1 year ago

@yinwei-zhang My error was caused by the RTX 3080 GPU: 30 series GPUs are NOT compatible with CUDA < 11.0. In the end, I switched to a machine with a 2080 GPU and ran the program successfully!