NVIDIA-AI-IOT / cuPCL

A project demonstrating how to use the libs of cuPCL.
MIT License
573 stars 93 forks source link

Voxeldownsampling and Cluster Filter - “Warp out of Range Address” #39

Open kameel2311 opened 1 year ago

kameel2311 commented 1 year ago

Problem Explaination

Hello there ! I am trying to use the cuPCL repository: https://github.com/NVIDIA-AI-IOT/cuPCL such to preprocess the PointCloud by a Voxel Downsampling Filter prior to using a the defined Clusterer. The program runs smoothly without the Voxel Downsampling , but the problem comes when only making an Instance of the filter as shown below:

cudaExtractCluster cudaec(stream);
cudaFilter filterTest(stream);

So just using one works, but both produces a: Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700

After some trials with Debugging with CUDA-GDB and CUDA MEMCHECK I came to the following results but do not quite sure if they can be solved as the classes are implemented in a precompiled .so files:

What I do not understand is that from the Thread's scope the address is treated as a local address , but actually it seems to be a global one. And whether if the CUDA API Error can be a lead of some sort.

Note that for memory transfer cudaMemMallocManaged has been used (UVM), and even using explicit memory transfers did not solve the issue.

Other efforts to solve the issue was to limit all CUDA computations to match the Device limits as follows:

  size_t limit = 0;
  cudaDeviceGetLimit(&limit, cudaLimitStackSize);
  std::cout << "Stack limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitStackSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitPrintfFifoSize);
  std::cout << "cudaLimitPrintfFifoSize limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitPrintfFifoSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitMallocHeapSize);
  std::cout << "cudaLimitMallocHeapSize limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitDevRuntimeSyncDepth);
  std::cout << "cudaLimitDevRuntimeSyncDepth limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, limit);

  cudaDeviceGetLimit(&limit, cudaLimitDevRuntimePendingLaunchCount);
  std::cout << "cudaLimitDevRuntimePendingLaunchCount limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, limit);

  cudaDeviceGetLimit(&limit, cudaLimitMaxL2FetchGranularity);
  std::cout << "cudaLimitMaxL2FetchGranularity limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitMaxL2FetchGranularity, limit);

But not changes have been yielded.

Device Info

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   56C    P8    18W /  N/A |    123MiB /  7982MiB |     32%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
   Dev PCI Bus/Dev ID  Name Description                                   SM Type  
*   0  01:00.0        NVIDIA GeForce RTX 2080 Super with Max-Q Design     TU104-A   
SMs    Warps/SM Lanes/Warp Max Regs/Lane    Active SMs Mask 
sm_75  48       32         32           256 0x00000000000000000000ffffffffffff

Using Ros Noetic and Ubuntu 20.04

stelzo commented 1 year ago

Same here. I can't use both in the same program with Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700.