NVIDIA-AI-IOT / Lidar_AI_Solution

A project demonstrating Lidar-related AI solutions, including three GPU-accelerated Lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and the related libs (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).

libspconv.so build #146

Open skprot opened 1 year ago

skprot commented 1 year ago

Thanks for your amazing work!

I'm wondering how to obtain libspconv.so for different platforms, especially sm_75 and similar. It seems the libspconv.so in your repo was pre-built with some modifications, such as spconv::load_engine_from_onnx, so I cannot just build libspconv.so from the original spconv repo and replace it here. How can I build your modified libspconv.so?

hopef commented 1 year ago

Sorry, libspconv.so currently does not work on GPUs with a compute capability below sm_80.
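A minimal sketch of how one might check a target architecture against the sm_80 floor stated above before trying the prebuilt library. The helper names `parse_sm` and `prebuilt_libspconv_supported` are hypothetical and not part of the repo; the architecture string is assumed to come from elsewhere (for example, recent drivers can report it via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`).

```python
MIN_SM = 80  # taken from the maintainer's comment: below sm_80 is unsupported


def parse_sm(arch: str) -> int:
    """Turn 'sm_75', '7.5' or '75' into the integer 75 (hypothetical helper)."""
    text = arch.strip().lower()
    if text.startswith("sm_"):
        text = text[3:]
    text = text.replace(".", "")
    if not text.isdigit():
        raise ValueError(f"unrecognised architecture string: {arch!r}")
    return int(text)


def prebuilt_libspconv_supported(arch: str, min_sm: int = MIN_SM) -> bool:
    """Return True when the architecture meets the stated sm_80 floor."""
    return parse_sm(arch) >= min_sm


if __name__ == "__main__":
    for arch in ("sm_75", "8.0", "sm_86"):
        print(arch, prebuilt_libspconv_supported(arch))
```

This only encodes the constraint as stated in the thread; it does not say anything about whether a given sm_80+ device is actually covered by the shipped binary.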

sandeepnmenon commented 1 year ago

@hopef When I export-scn from a "non ptq model" and try to load it using load_engine_from_onnx in https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/blob/87fb0cc6fcf38d0cf998bf0cdcbd039e6732d928/CUDA-BEVFusion/src/bevfusion/lidar-scn.cpp#L38C1-L39C1

I get the error

[libprotobuf FATAL /usr/include/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_): 

I'm sharing my ONNX model. What could be the issue?

Versions

libprotoc 3.6.1

hopef commented 1 year ago

Hi sandeepnmenon,

I've committed libspconv-1.1.0, which open-sources the libprotobuf part of the parsing code. For your error, you can use it for debugging.

hopef commented 1 year ago

Hi sandeepnmenon,

I can't see the bias of the SparseConvolution layer in your onnx. This may be the root cause.

sandeepnmenon commented 1 year ago

Thank you. I was using the lidar SCN module from a checkpoint that had not gone through the quantization code. When I passed my lidar model through the quantization code, the bias term appeared and loading works. I think the quantization code, which replaces the spconv modules with the custom classes, is important for the libspconv module. Is that correct?

Also, regarding this GitHub issue: how do I build the libspconv.so library? This repo only has the headers.

sandeepnmenon commented 1 year ago

Hi @hopef

The error occurs when I load the model's state dict and export it with the exptool directly. But if I first run the model through the quantization module (ptq.py), which replaces the spconv modules with the custom modules, it works.

Is the libspconv library tied to the SparseConvolution classes in the quantisation code?

hopef commented 1 year ago

First, libspconv supports SparseConvolution without bias. Second, the bias error is introduced by the ONNX parser, so you can handle it in code.

As for "Is the libspconv library tied to the SparseConvolution classes in the quantisation code?": no, there is no correlation between them.
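To see whether an exported model is missing the bias discussed above, one option is to list the convolution nodes that have fewer inputs than expected. The sketch below is only an illustration under assumptions: it assumes the exported nodes use the op type `SparseConvolution` and that bias is passed as a third input after the features and the weight, neither of which is confirmed in this thread. It relies only on the `graph.node`, `op_type`, `name` and `input` fields that an ONNX `ModelProto` exposes, so the same function should also accept a model returned by `onnx.load`; the demo uses stand-in objects so it runs without the onnx package.

```python
from types import SimpleNamespace


def conv_nodes_missing_inputs(model, op_type="SparseConvolution", expected_inputs=3):
    """Return (node name, input count) for matching nodes with too few inputs.

    op_type and expected_inputs are assumptions about the export layout.
    """
    missing = []
    for node in model.graph.node:
        if node.op_type == op_type and len(node.input) < expected_inputs:
            missing.append((node.name, len(node.input)))
    return missing


# Stand-in objects that mimic the fields read above (not a real ONNX model).
demo = SimpleNamespace(graph=SimpleNamespace(node=[
    SimpleNamespace(name="conv0", op_type="SparseConvolution",
                    input=["x", "conv0.weight"]),
    SimpleNamespace(name="conv1", op_type="SparseConvolution",
                    input=["y", "conv1.weight", "conv1.bias"]),
    SimpleNamespace(name="relu0", op_type="Relu", input=["z"]),
]))

print(conv_nodes_missing_inputs(demo))  # [('conv0', 2)]
```

A parser that reads the third input unconditionally would index past the end of the input list for `conv0`, which is the kind of out-of-range access that the libprotobuf `(index) < (current_size_)` check reports.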

san9569 commented 1 year ago

@hopef Hi, how can I handle the bias error in code?

Thank you in advance

hopef commented 1 year ago

The latest version (v1.1.0) can handle the bias error.

san9569 commented 1 year ago

@hopef

Hi, I have replaced libspconv with libspconv-1.1.0 in this line and this line.

After that, when I run bash tool/run.sh, the following error is raised:

/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp:67:15: error: ‘DTensor’ in namespace ‘spconv’ does not name a type; did you mean ‘ITensor’?
   67 |       spconv::DTensor *native_scn_output_ = nullptr; // TODO: DTensor
      |               ^~~~~~~
      |               ITensor
/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp: In member function ‘bool bevfusion::lidar::SCNImplement::init(const bevfusion::lidar::SCNParameter&)’:
/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp:43:31: error: ‘load_engine_from_onnx’ is not a member of ‘spconv’
   43 |         native_scn_ = spconv::load_engine_from_onnx(param_.model, static_cast<spconv::Precision>(param_.precision));
      |                               ^~~~~~~~~~~~~~~~~~~~~
/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp: In member function ‘virtual const nvtype::half* bevfusion::lidar::SCNImplement::forward(const nvtype::half*, unsigned int, void*)’:
/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp:51:9: error: ‘native_scn_output_’ was not declared in this scope
   51 |         native_scn_output_ = native_scn_->forward(
      |         ^~~~~~~~~~~~~~~~~~
/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp: In member function ‘virtual std::vector<long int> bevfusion::lidar::SCNImplement::shape()’:
/home/sensor_fusion/Lidar_AI_Solution/CUDA-BEVFusion_mmdet3d/src/bevfusion/lidar-scn.cpp:60:16: error: ‘native_scn_output_’ was not declared in this scope
   60 |         return native_scn_output_ == nullptr ? std::vector<int64_t>() : native_scn_output_->features_shape();
      |                ^~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/bevfusion_core.dir/build.make:3418: CMakeFiles/bevfusion_core.dir/src/bevfusion/lidar-scn.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:87: CMakeFiles/bevfusion_core.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

How can I use the latest version of libspconv?

In addition, when I use the original libspconv.so in this repo, an assertion error related to weight_scales occurs (this issue).

Do you know what might cause it?

hopef commented 1 year ago

@sangjinpark97 Due to the interface update in version 1.1, a few code changes are required to adapt to the new version. You can take a look at the test code here.

guhuajun commented 1 year ago

Greetings,

I cannot find 1.1.1 in the branch page. Is it a special version?

Trying to make SM_75 work, but stopped at /usr/bin/ld: libbevfusion_core.so: undefined reference to `spconv::load_engine_from_onnx`

[ 40%] Building NVCC (Device) object CMakeFiles/bevfusion_core.dir/src/bevfusion/bevfusion_core_generated_head-transbbox.cu.o
[ 45%] Building NVCC (Device) object CMakeFiles/bevfusion_core.dir/src/bevfusion/bevfusion_core_generated_transfusion.cu.o
[ 54%] Building CXX object CMakeFiles/bevfusion_core.dir/src/common/tensorrt.cpp.o
[ 59%] Building CXX object CMakeFiles/bevfusion_core.dir/src/bevfusion/lidar-scn.cpp.o
[ 59%] Building CXX object CMakeFiles/bevfusion_core.dir/src/bevfusion/bevfusion.cpp.o
[ 63%] Linking CXX shared library libbevfusion_core.so
[ 63%] Built target bevfusion_core
[ 68%] Building NVCC (Device) object CMakeFiles/bevfusion.dir/src/common/bevfusion_generated_visualize.cu.o
[ 72%] Building NVCC (Device) object CMakeFiles/bevfusion.dir/__/libraries/cuOSD/src/bevfusion_generated_cuosd_kernel.cu.o
[ 77%] Building CXX object CMakeFiles/bevfusion.dir/src/main.cpp.o
[ 86%] Building CXX object CMakeFiles/bevfusion.dir/workspace/libraries/cuOSD/src/textbackend/backend.cpp.o
[ 86%] Building CXX object CMakeFiles/bevfusion.dir/workspace/libraries/cuOSD/src/textbackend/pango-cairo.cpp.o
[ 90%] Building CXX object CMakeFiles/bevfusion.dir/workspace/libraries/cuOSD/src/textbackend/stb.cpp.o
[ 95%] Building CXX object CMakeFiles/bevfusion.dir/workspace/libraries/cuOSD/src/cuosd.cpp.o
[100%] Linking CXX executable bevfusion
/usr/bin/ld: libbevfusion_core.so: undefined reference to `spconv::load_engine_from_onnx(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, spconv::Precision)'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/bevfusion.dir/build.make:186: bevfusion] Error 1
make[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/bevfusion.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
root@4790ca7df0b6:/workspace/CUDA-BEVFusion#

hopef commented 1 year ago

@guhuajun Sorry, I pushed it here, not to a branch.

wyfsean commented 11 months ago

@hopef Hi, I want to compile libspconv on Windows 10, but I can't find the libspconv source code. It doesn't support Windows 10, right?

slayerlpj commented 3 months ago

I had the same problem as @sandeepnmenon (the libprotobuf "CHECK failed: (index) < (current_size_)" error when loading with load_engine_from_onnx). I cloned the latest repo, but I cannot find libspconv-1.1.1 in the 3DSparseConvolution folder.

riteshkhrn commented 1 month ago

Hello guys, I have created an open-source version of 3DSparseConvolution using spconv as its base. In theory it supports SM < 80, but I have not tested that. https://github.com/riteshkhrn/Lidar_AI_Solution/tree/main/libraries/New3DSparseConvolution

Check it out! :)