NVIDIA-AI-IOT / cuPCL

A project demonstrating how to use the libs of cuPCL.
MIT License

Cuda filter demo, cuda-pcl is worse than pcl when I use the VoxelGrid #41

Open NJUSTzwh opened 1 year ago

NJUSTzwh commented 1 year ago

cuda-pcl is faster than PCL for PassThrough, but slower for VoxelGrid.

MagicalBrain commented 1 year ago

Your output confuses me: your NX is even slower than my Jetson Nano (4GB), which should not be the case. The output on my Jetson Nano is as follows:

./demo 

GPU has cuda devices: 1
----device id: 0 info----
  GPU : NVIDIA Tegra X1 
  Capbility: 5.3
  Global memory: 3956MB
  Const memory: 64KB
  SM in a block: 48KB
  warp size: 32
  threads in a block: 1024
  block dim: (1024,1024,64)
  grid dim: (2147483647,65535,65535)

------------checking CUDA ---------------- 
CUDA Loaded 119978 data points from PCD file with the following fields: x y z

------------checking CUDA PassThrough ---------------- 
CUDA PassThrough by Time: 1.9844 ms.
CUDA PassThrough before filtering: 119978
CUDA PassThrough after filtering: 5110

------------checking CUDA VoxelGrid---------------- 
CUDA VoxelGrid by Time: 35.325 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 3440

------------checking PCL ---------------- 
PCL(CPU) Loaded 119978 data points from PCD file with the following fields: x y z

------------checking PCL(CPU) PassThrough ---------------- 
PCL(CPU) PassThrough by Time: 9.47348 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 5110 data points (x y z).

------------checking PCL VoxelGrid---------------- 
PCL VoxelGrid by Time: 24.2884 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 3440 data points (x y z).

After I run jetson_clocks, it gets faster. The output is as follows:

./demo 

GPU has cuda devices: 1
----device id: 0 info----
  GPU : NVIDIA Tegra X1 
  Capbility: 5.3
  Global memory: 3956MB
  Const memory: 64KB
  SM in a block: 48KB
  warp size: 32
  threads in a block: 1024
  block dim: (1024,1024,64)
  grid dim: (2147483647,65535,65535)

------------checking CUDA ---------------- 
CUDA Loaded 119978 data points from PCD file with the following fields: x y z

------------checking CUDA PassThrough ---------------- 
CUDA PassThrough by Time: 1.39955 ms.
CUDA PassThrough before filtering: 119978
CUDA PassThrough after filtering: 5110

------------checking CUDA VoxelGrid---------------- 
CUDA VoxelGrid by Time: 11.9661 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 3440

------------checking PCL ---------------- 
PCL(CPU) Loaded 119978 data points from PCD file with the following fields: x y z

------------checking PCL(CPU) PassThrough ---------------- 
PCL(CPU) PassThrough by Time: 3.32619 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 5110 data points (x y z).

------------checking PCL VoxelGrid---------------- 
PCL VoxelGrid by Time: 16.5497 ms.
PointCloud before filtering: 119978 data points (x y z).
PointCloud after filtering: 3440 data points (x y z).

Finally, I don't know why cuda-pcl beats PCL in PassThrough but not in VoxelGrid. I suspect that may be why PCL removed CUDA support for VoxelGrid in pcl-1.13.1.
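One caveat when comparing the numbers above: each figure is a single timed run, so clock ramp-up and one-time initialization can dominate. A minimal sketch of a fairer measurement, assuming the filter call can be wrapped in a callable, is to run a few untimed warm-up iterations and then average several timed ones. The helper name `timeAverageMs` and its parameters are hypothetical and not part of the cuPCL demo; for GPU work the callable itself would need to wait for the device to finish (for example with `cudaDeviceSynchronize()`) before returning, otherwise only the launch is timed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of cuPCL): returns the average wall-clock
// time in milliseconds of `work` over `iters` timed runs, after `warmup`
// untimed runs that absorb one-time costs such as clock ramp-up.
double timeAverageMs(const std::function<void()>& work, int warmup, int iters) {
    assert(iters > 0);

    // Warm-up runs: executed but not measured.
    for (int i = 0; i < warmup; ++i) {
        work();
    }

    // Timed runs: measure the whole loop, then divide by the run count.
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / iters;
}
```

Calling `timeAverageMs(runFilter, 3, 20)` with a callable `runFilter` (also hypothetical) that wraps one filter invocation would report a steadier per-call time than a single measurement.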

qilinhu commented 7 months ago

@MagicalBrain Hello, I'd like to ask for advice. My machine environment:

[Image: screenshot of the machine environment]

When I run the official cuFilter demo, the CUDA computation time is basically the same as the official figure:

------------checking CUDA VoxelGrid----------------
CUDA VoxelGrid by Time: 3.20768 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 3440

But when I set setP.voxelX, setP.voxelY, and setP.voxelZ to 0.09, the CUDA computation becomes far slower than expected:

------------checking CUDA VoxelGrid----------------
CUDA VoxelGrid by Time: 3109.65 ms.
CUDA VoxelGrid before filtering: 119978
CUDA VoxelGrid after filtering: 62844

Why does this happen, and is there a way to fix it? In most cases, setP.voxelX, setP.voxelY, and setP.voxelZ cannot simply be left at 1. I hope someone can help.
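For context on the point counts: a voxel-grid filter keeps one point per occupied voxel, so a smaller voxel size means more occupied voxels and a larger output, which is consistent with the jump from 3440 to 62844 points. The following is a minimal CPU-side sketch of that binning step, assuming nothing about how cuPCL implements it internally; the `Point` struct and `countOccupiedVoxels` are hypothetical names used only for illustration. It does not by itself explain the 3109.65 ms runtime.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical point type for illustration; not the type used by cuPCL.
struct Point {
    float x, y, z;
};

// Counts the distinct cubic voxels of edge length `leaf` that contain at
// least one point. A voxel-grid filter emits one point per occupied voxel,
// so this count equals the size of the filtered cloud.
std::size_t countOccupiedVoxels(const std::vector<Point>& cloud, float leaf) {
    assert(leaf > 0.0f);

    using VoxelIndex = std::tuple<std::int64_t, std::int64_t, std::int64_t>;
    std::set<VoxelIndex> occupied;

    for (const Point& p : cloud) {
        // Integer voxel coordinates: floor(coordinate / leaf) on each axis.
        occupied.emplace(static_cast<std::int64_t>(std::floor(p.x / leaf)),
                         static_cast<std::int64_t>(std::floor(p.y / leaf)),
                         static_cast<std::int64_t>(std::floor(p.z / leaf)));
    }
    return occupied.size();
}
```

For eight points spaced 0.25 apart along the x axis, this returns 2 occupied voxels with a leaf size of 1.0, 4 with 0.5, and 8 with 0.25, so halving the leaf size doubles the output along that single axis.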