Hello there! I am trying to use the cuPCL repository (https://github.com/NVIDIA-AI-IOT/cuPCL) to preprocess a point cloud with the voxel downsampling filter before passing it to the provided clusterer. The program runs smoothly without the voxel downsampling, but it fails as soon as I merely create an instance of the filter, as shown below:
Using either one on its own works, but using both produces: Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700
After some debugging with CUDA-GDB and CUDA-MEMCHECK I arrived at the following results, but I am not sure whether the problem can be fixed on my side, since the classes are shipped as precompiled .so files:
Both classes invoke cudaFillVoxelGirdKernel, and the error occurs at the kernel launch in the first function call that launches it:
Thread 1 "collision_avoid" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 0, grid 6, block (3,0,0), thread (160,0,0), device 0, sm 6, warp 4, lane 0]
0x0000555555d50eb0 in cudaFillVoxelGirdKernel(float4*, int4*, int4*, float4*, unsigned int, float, float, float) ()
The thread is trying to write 4 bytes to a global memory address (from CUDA-MEMCHECK):
Invalid __global__ write of size 4
And from debugging:
Illegal access to address (@global)0x8007b0800c60 detected
(cuda-gdb) print *0x8007b0800c60
Error: Failed to read local memory at address 0x8007b0800c60 on device 0 sm 0 warp 9 lane 0, error=CUDBG_ERROR_INVALID_MEMORY_ACCESS(0x8).
Moreover, the following CUDA API error is returned:
warning: Cuda API error detected: cuGetProcAddress returned (0x1f4)
Error 0x1f4 (500) is CUDA_ERROR_NOT_FOUND, which indicates that a named symbol was not found. Examples of symbols are global/constant variable names, driver function names, texture names, and surface names.
What I do not understand is that, from the thread's scope, the address is treated as a local address, although it appears to be a global one. I am also wondering whether the CUDA API error could be a lead of some sort.
Note that cudaMallocManaged (unified memory, UVM) was used for the memory transfers, and switching to explicit memory transfers did not solve the issue either.
Another attempt to solve the issue was to limit all CUDA computations to the device limits, as follows:
Same here. I can't use both in the same program; it fails with Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700.
Problem Explanation
Limiting the CUDA computations to the device limits did not change anything.
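For readers who want to try the same thing, here is a minimal host-side sketch of what clamping a 1D kernel launch to the device limits could look like. It is not the code from this post and not part of cuPCL: DeviceLimits, LaunchConfig, and clampLaunch are hypothetical names, and the limit values are assumed to be copied from the maxThreadsPerBlock and maxGridSize[0] fields that cudaGetDeviceProperties fills in.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical holder for two limits reported in cudaDeviceProp
// (maxThreadsPerBlock and maxGridSize[0]); filled in by the caller.
struct DeviceLimits {
    unsigned int maxThreadsPerBlock;
    unsigned int maxGridSizeX;
};

// Hypothetical 1D launch configuration: <<<blocks, threadsPerBlock>>>.
struct LaunchConfig {
    unsigned int blocks;
    unsigned int threadsPerBlock;
};

// Choose a 1D launch configuration for `count` elements that stays within
// the given limits. If the block count gets clamped, the kernel must use a
// grid-stride loop, otherwise some elements are never processed.
LaunchConfig clampLaunch(std::uint64_t count, unsigned int wantedThreads,
                         const DeviceLimits& limits) {
    // Never request more threads per block than the device allows (and at least 1).
    unsigned int threads =
        std::max(1u, std::min(wantedThreads, limits.maxThreadsPerBlock));

    // Blocks needed to cover every element (ceiling division).
    std::uint64_t needed = (count + threads - 1) / threads;

    // Clamp to the maximum grid size in x, and launch at least one block.
    std::uint64_t blocks = std::min<std::uint64_t>(needed, limits.maxGridSizeX);
    return {static_cast<unsigned int>(std::max<std::uint64_t>(blocks, 1)), threads};
}
```

Note that staying within these limits only rules out an invalid launch configuration. It cannot prevent a kernel from writing outside a buffer, which would be consistent with the limits making no difference here.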
Device Info
Using ROS Noetic and Ubuntu 20.04