CUDA error when cyber_launch starts perception_camera.launch in Apollo 5.0 #10369

Closed LidarADAS closed 2 years ago

LidarADAS commented 4 years ago

System information

Ubuntu 18.04



Steps to reproduce the issue:

[NVBLAS] No Gpu available
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
F1202 16:42:09.057770 11360 cuda_util.cu:50] Check failed: cuda_error == cudaSuccess (30 vs. 0) unknown error

I then tried several other modules that need the GPU and got the same error. Modules that do not need the GPU launch successfully.

Supporting materials (screenshots, command lines, code/script snippets):

Eclipsehelio commented 4 years ago

This may be caused by user privileges. Try running deviceQuery under /usr/local/cuda/samples/1_Utilities/deviceQuery and check the result; otherwise, run the following commands and try again.

$ sudo usermod -aG video $USER
$ newgrp
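
A quick way to check whether the group change has taken effect is sketched below. It assumes a POSIX shell; `in_group` is a hypothetical helper, not part of Apollo or CUDA. Group changes made by usermod only apply to new login sessions, so the check may keep failing until you log out and back in.

```shell
# Hypothetical helper: succeeds if group $1 appears in the space-separated list $2.
in_group() {
  for g in $2; do
    [ "$g" = "$1" ] && return 0
  done
  return 1
}

# Groups active in this shell.
current_groups="$(id -nG)"

if in_group video "$current_groups"; then
  echo "video group: active in this shell"
else
  echo "video group: NOT active in this shell"
fi

# List the NVIDIA device nodes and their permissions, if any are visible.
ls -l /dev/nvidia* 2>/dev/null || echo "no /dev/nvidia* device nodes visible"
```

If the device nodes are missing or not accessible to your user, that would be consistent with a privilege problem, though it does not prove it.
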
LidarADAS commented 4 years ago

I tried to find /usr/local/cuda/samples/1_Utilities/deviceQuery in Docker, but there is no samples folder under cuda.

Then I ran the commands you provided, but nothing changed; the error still exists.

Is there any other suggestion?

Eclipsehelio commented 4 years ago

Could you please post the contents of the folder /usr/local/cuda/? Maybe the 'samples' folder has been deleted, but that's not a problem; I will post it in the following reply.

$ ls /usr/local/cuda/samples/1_Utilities/deviceQuery
Eclipsehelio commented 4 years ago

Put deviceQuery.cpp and Makefile in the same folder, and then execute 'make' to build deviceQuery.

LidarADAS commented 4 years ago

I cloned the CUDA samples from GitHub (https://github.com/NVIDIA/cuda-samples), then compiled deviceQuery. When I run the binary, I get this result:


./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version 10.1 / 10.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 11176 MBytes (11719409664 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
  GPU Max Clock rate: 1645 MHz (1.64 GHz)
  Memory Clock rate: 5505 Mhz
  Memory Bus Width: 352-bit
  L2 Cache Size: 2883584 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device supports Compute Preemption: Yes
  Supports Cooperative Kernel Launch: Yes
  Supports MultiDevice Co-op Kernel Launch: Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

LidarADAS commented 4 years ago

I debugged Apollo's lane perception code and found that the error occurs at the cudaGetDevice call. But when I make the same call in deviceQuery, the result is correct.

Avps1 commented 4 years ago

I debugged Apollo's lane perception code and found that the error occurs at the cudaGetDevice call. But when I make the same call in deviceQuery, the result is correct.

Can you please share the steps you followed to debug the lane perception code? Did you use VSCode?

HUI11126 commented 3 years ago

I debugged Apollo's lane perception code and found that the error occurs at the cudaGetDevice call. But when I make the same call in deviceQuery, the result is correct.

Hello. Did you solve the problem? My error is the same as yours.

HUI11126 commented 3 years ago

I solved the problem by following https://blog.csdn.net/qq_41481731/article/details/86658523


cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
sudo ./deviceQuery
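
When comparing a normal run against the sudo run, it can help to check only the summary line instead of reading the whole output. This is a minimal sketch: `device_query_passed` is a hypothetical helper, and it relies only on the "Result = PASS" text shown in the deviceQuery output posted earlier in this thread.

```shell
# Hypothetical helper: succeeds if deviceQuery output (read from stdin) reports a pass.
device_query_passed() {
  grep -q 'Result = PASS'
}

# Example using the summary text quoted earlier in this thread.
sample='deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.0, NumDevs = 1 Result = PASS'

if printf '%s\n' "$sample" | device_query_passed; then
  echo "deviceQuery: PASS"
else
  echo "deviceQuery: FAIL"
fi
```

With a built binary, the same helper can be fed real output, for example `./deviceQuery | device_query_passed && echo ok`, once without sudo and once with it.
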

hzzzzjzyq commented 3 years ago

I have the same problem: cyber_launch fails with GPUassert, and I can't find samples under /usr/local/cuda. The Apollo guide doesn't mention installing CUDA. I followed Apollo's "Installing NVIDIA Container Toolkit" instructions and finally installed nvidia-docker2. Why am I missing CUDA? Do you mean CUDA is missing on the host machine or in Docker?
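
One way to answer the host-versus-Docker question is to run the same check once on the host and once inside the container, then compare. This is a sketch under assumptions: `report_cuda_dir` is a hypothetical helper, and /usr/local/cuda is the path used earlier in this thread, which may differ on other setups.

```shell
# Hypothetical helper: report whether a directory exists at path $1.
report_cuda_dir() {
  if [ -d "$1" ]; then
    echo "$1: present"
  else
    echo "$1: missing"
  fi
}

report_cuda_dir /usr/local/cuda
report_cuda_dir /usr/local/cuda/samples

# nvidia-smi ships with the driver; nvcc ships with the CUDA toolkit.
for tool in nvidia-smi nvcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not on PATH"
  fi
done
```

A missing samples folder alone does not mean CUDA is missing; as noted above in this thread, the samples can be built separately from https://github.com/NVIDIA/cuda-samples.
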

daohu527 commented 2 years ago

