PRBonn / rangenet_lib

Inference module for RangeNet++ (milioto2019iros, chen2019iros)
MIT License

ERROR: could not create engine from ONNX #26

Closed: robotichustle closed this issue 3 years ago

robotichustle commented 4 years ago

Thanks for your work. I am really excited to demo it but I'm unable to run the demo.

I am using Ubuntu 18.04 + CUDA 10.1 + cuDNN 7.5.1 + TensorRT 5.1.5 (following your README and NVIDIA's version compatibility information). The GPU is a GeForce GTX 1060.

I get the following runtime error as it can't generate the TRT file from the SemanticKitti darknet53 model:

(rangelib_venv) robot@robot-01:~/catkin_ws$ ./devel/lib/rangenet_lib/infer -p ~/datasets/semantic_kitti/models/darknet53 -s ./src/rangenet_lib/example/000000.bin --verbose
================================================================================
scan: ./src/rangenet_lib/example/000000.bin
path: /home/robot/datasets/semantic_kitti/models/darknet53/
verbose: 1
================================================================================
Setting verbosity to: false
Trying to open model
Trying to deserialize previously stored: /home/robot/datasets/semantic_kitti/models/darknet53//model.trt
Could not deserialize TensorRT engine.
Generating from sratch... This may take a while...
Trying to generate trt engine from : /home/robot/datasets/semantic_kitti/models/darknet53//model.onnx
Platform DOESN'T HAVE fp16 support.
No DLA selected.
Could not open file /home/robot/datasets/semantic_kitti/models/darknet53//model.onnx
Could not open file /home/robot/datasets/semantic_kitti/models/darknet53//model.onnx
Failed to parse ONNX model from file/home/robot/datasets/semantic_kitti/models/darknet53//model.onnx
Success picking up ONNX model
Failure creating engine from ONNX model
Current trial size is 8589934592
Failure creating engine from ONNX model
Current trial size is 4294967296
Failure creating engine from ONNX model
Current trial size is 2147483648
Failure creating engine from ONNX model
Current trial size is 1073741824
Failure creating engine from ONNX model
Current trial size is 536870912
Failure creating engine from ONNX model
Current trial size is 268435456
Failure creating engine from ONNX model
Current trial size is 134217728
Failure creating engine from ONNX model
Current trial size is 67108864
Failure creating engine from ONNX model
Current trial size is 33554432
Failure creating engine from ONNX model
Current trial size is 16777216
Failure creating engine from ONNX model
Current trial size is 8388608
Failure creating engine from ONNX model
Current trial size is 4194304
Failure creating engine from ONNX model
Current trial size is 2097152
Failure creating engine from ONNX model
Current trial size is 1048576
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR: could not create engine from ONNX.
Aborted (core dumped)
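
One detail in the log above may be worth ruling out before looking at TensorRT itself: the parser prints `Could not open file .../model.onnx` twice before engine creation is attempted. Below is a minimal pre-flight sketch in Python, assuming the model directory is expected to contain a file named `model.onnx` (as the path in the log suggests). The helper name `check_model_dir` is hypothetical and not part of rangenet_lib:

```python
import os


def check_model_dir(model_dir):
    """Return a list of problems found with model.onnx inside model_dir.

    Hypothetical helper for illustration; an empty list means the file
    exists, is readable, and is non-empty.
    """
    onnx_path = os.path.join(model_dir, "model.onnx")
    problems = []
    if not os.path.isfile(onnx_path):
        problems.append(f"missing: {onnx_path}")
    elif not os.access(onnx_path, os.R_OK):
        problems.append(f"not readable: {onnx_path}")
    elif os.path.getsize(onnx_path) == 0:
        problems.append(f"empty file: {onnx_path}")
    return problems


if __name__ == "__main__":
    # Path taken from the log above; adjust to your own model directory.
    model_dir = os.path.expanduser("~/datasets/semantic_kitti/models/darknet53")
    for problem in check_model_dir(model_dir):
        print(problem)
```

An empty result only rules out a missing or unreadable file. It does not guarantee that TensorRT can parse the model.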

I am not sure if it matters, but although the catkin build succeeded, it produced some warnings. Here is the output:

(rangelib_venv) robot@robot-01:~/catkin_ws$ catkin build rangenet_lib
----------------------------------------------------------
Profile:                     default
Extending:                   None
Workspace:                   /home/robot/catkin_ws
----------------------------------------------------------
Build Space:        [exists] /home/robot/catkin_ws/build
Devel Space:        [exists] /home/robot/catkin_ws/devel
Install Space:      [unused] /home/robot/catkin_ws/install
Log Space:          [exists] /home/robot/catkin_ws/logs
Source Space:       [exists] /home/robot/catkin_ws/src
DESTDIR:            [unused] None
----------------------------------------------------------
Devel Space Layout:          linked
Install Space Layout:        None
----------------------------------------------------------
Additional CMake Args:       None
Additional Make Args:        None
Additional catkin Make Args: None
Internal Make Job Server:    True
Cache Job Environments:      False
----------------------------------------------------------
Whitelisted Packages:        None
Blacklisted Packages:        None
----------------------------------------------------------

----------------------------------------------------------
WARNING: Your workspace is not extending any other result
space, but it is set to use a `linked` devel space
layout. This requires the `catkin` CMake package in your
source space in order to be built.
----------------------------------------------------------

[build] Found '2' packages in 0.0 seconds.
[build] Package table is up to date.
Starting  >>> catkin
Finished  <<< catkin                      [ 0.1 seconds ]
Starting  >>> rangenet_lib
Finished  <<< rangenet_lib                [ 0.1 seconds ]
[build] Summary: All 2 packages succeeded!
[build]   Ignored:   None.
[build]   Warnings:  None.
[build]   Abandoned: None.
[build]   Failed:    None.
[build] Runtime: 0.2 seconds total.
Exception ignored in: <bound method BaseEventLoop.__del__ of <_UnixSelectorEventLoop running=False closed=True debug=False>>
Traceback (most recent call last):
  File "/home/robot/.virtualenvs/rangelib_venv/lib/python3.6/site-packages/trollius/base_events.py", line 395, in __del__
  File "/home/robot/.virtualenvs/rangelib_venv/lib/python3.6/site-packages/trollius/unix_events.py", line 65, in close
  File "/home/robot/.virtualenvs/rangelib_venv/lib/python3.6/site-packages/trollius/unix_events.py", line 166, in remove_signal_handler
  File "/usr/lib/python3.6/signal.py", line 47, in signal
TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object
(rangelib_venv) robot@robot-01:~/catkin_ws$ rm -rf build/rangenet_lib/
(rangelib_venv) robot@robot-01:~/catkin_ws$ catkin build rangenet_lib > build.log 2>&1
(rangelib_venv) robot@robot-01:~/catkin_ws$ clear
(rangelib_venv) robot@robot-01:~/catkin_ws$ rm -rf build/rangenet_lib/
(rangelib_venv) robot@robot-01:~/catkin_ws$ catkin build rangenet_lib
----------------------------------------------------------
Profile:                     default
Extending:                   None
Workspace:                   /home/robot/catkin_ws
----------------------------------------------------------
Build Space:        [exists] /home/robot/catkin_ws/build
Devel Space:        [exists] /home/robot/catkin_ws/devel
Install Space:      [unused] /home/robot/catkin_ws/install
Log Space:          [exists] /home/robot/catkin_ws/logs
Source Space:       [exists] /home/robot/catkin_ws/src
DESTDIR:            [unused] None
----------------------------------------------------------
Devel Space Layout:          linked
Install Space Layout:        None
----------------------------------------------------------
Additional CMake Args:       None
Additional Make Args:        None
Additional catkin Make Args: None
Internal Make Job Server:    True
Cache Job Environments:      False
----------------------------------------------------------
Whitelisted Packages:        None
Blacklisted Packages:        None
----------------------------------------------------------

----------------------------------------------------------
WARNING: Your workspace is not extending any other result
space, but it is set to use a `linked` devel space
layout. This requires the `catkin` CMake package in your
source space in order to be built.
----------------------------------------------------------

[build] Found '2' packages in 0.0 seconds.
[build] Package table is up to date.
Starting  >>> catkin
Finished  <<< catkin                      [ 0.1 seconds ]
Starting  >>> rangenet_lib
____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
Warnings   << rangenet_lib:cmake /home/robot/catkin_ws/logs/rangenet_lib/build.cmake.017.log
Build type: Release
YAML Libs: yaml-cpp
YAML Headers: /usr/lib/x86_64-linux-gnu/cmake/yaml-cpp/../../../../include
Boost Libs: /usr/lib/x86_64-linux-gnu/libboost_program_options.so;/usr/lib/x86_64-linux-gnu/libboost_filesystem.so;/usr/lib/x86_64-linux-gnu/libboost_system.so
Boost Headers: /usr/include

TensorRT available!
CUDA Libs: /usr/local/cuda-10.1/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so
CUDA Headers: /usr/local/cuda-10.1/include
NVINFER: /home/robot/local/TensorRT-5.1.5.0/lib/libnvinfer.so
NVINFERPLUGIN: /home/robot/local/TensorRT-5.1.5.0/lib/libnvinfer_plugin.so
NVPARSERS: /home/robot/local/TensorRT-5.1.5.0/lib/libnvparsers.so
NVONNXPARSER: /home/robot/local/TensorRT-5.1.5.0/lib/libnvonnxparser.so
NVONNXPARSERRUNTIME: /home/robot/local/TensorRT-5.1.5.0/lib/libnvonnxparser_runtime.so
All togheter now (libs): /usr/local/cuda-10.1/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so;/home/robot/local/TensorRT-5.1.5.0/lib/libnvinfer.so;/home/robot/local/TensorRT-5.1.5.0/lib/libnvinfer_plugin.so;/home/robot/local/TensorRT-5.1.5.0/lib/libnvparsers.so;/home/robot/local/TensorRT-5.1.5.0/lib/libnvonnxparser.so;/home/robot/local/TensorRT-5.1.5.0/lib/libnvonnxparser_runtime.so
CUDA include dirs (inc): /usr/local/cuda-10.1/include
TensorRT SUCCESS!

Building TensorRT
Building example...
OpenCV Libs: opencv_core;opencv_viz
OpenCV Headers: /usr/include;/usr/include/opencv

cd /home/robot/catkin_ws/build/rangenet_lib; catkin build --get-env rangenet_lib | catkin env -si  /usr/local/bin/cmake /home/robot/catkin_ws/src/rangenet_lib --no-warn-unused-cli -DCATKIN_DEVEL_PREFIX=/home/robot/catkin_ws/devel/.private/rangenet_lib -DCMAKE_INSTALL_PREFIX=/home/robot/catkin_ws/install; cd -
............................................................................................................................................................................................................................................
Finished  <<< rangenet_lib                [ 8.6 seconds ]
[build] Summary: All 2 packages succeeded!
[build]   Ignored:   None.
[build]   Warnings:  1 packages succeeded with warnings.
[build]   Abandoned: None.
[build]   Failed:    None.
[build] Runtime: 8.7 seconds total.
Exception ignored in: <bound method BaseEventLoop.__del__ of <_UnixSelectorEventLoop running=False closed=True debug=False>>
Traceback (most recent call last):
  File "/home/robot/.virtualenvs/rangelib_venv/lib/python3.6/site-packages/trollius/base_events.py", line 395, in __del__
  File "/home/robot/.virtualenvs/rangelib_venv/lib/python3.6/site-packages/trollius/unix_events.py", line 65, in close
  File "/home/robot/.virtualenvs/rangelib_venv/lib/python3.6/site-packages/trollius/unix_events.py", line 166, in remove_signal_handler
  File "/usr/lib/python3.6/signal.py", line 47, in signal
TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object

Also, not sure if this is relevant but running the sample_onnx_mnist seems to work fine.

cd ~/local/TensorRT-5.1.5.0/samples
make -j8
cd ../bin
./sample_onnx_mnist

Success:

&&&& PASSED TensorRT.sample_onnx_mnist # ./sample_onnx_mnist

I'd appreciate your help. Thank you in advance.

Chen-Xieyuanli commented 4 years ago

Hey @robotichustle,

Thank you for using our code.

The catkin build TypeError is a Python-related issue; see more here. I'm not sure whether it is related to the TRT model problem.

The TRT model problem looks like a TensorRT-related issue. We have never encountered this error ourselves, so it is hard for us to reproduce and debug it.

However, others have reported similar issues. You may want to have a look at #15 and #22.

Chen-Xieyuanli commented 4 years ago

Hey @robotichustle, is there any update on this issue?

ZhenminHuang commented 4 years ago

@robotichustle Previously I was using CUDA 10.1 + cuDNN 7.6.5 + TensorRT 5.1.5 with an NVIDIA P1000, and I ran into exactly the same problem. I downgraded CUDA 10.1 to CUDA 10.0 and it works well now. I also tested this workaround on another device with a GTX 1060, and it works well there too. Perhaps TensorRT 5.1.5 is somehow incompatible with CUDA versions higher than 10.0?

Chen-Xieyuanli commented 4 years ago

@ZhenminHuang Thank you so much for the feedback!

Since there is no further update about this issue, I'm going to close it. Please feel free to ask me to reopen it if needed.

kuzen commented 4 years ago

Check that your cuBLAS version matches your CUDA version.
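
A rough way to act on this suggestion is to compare the major.minor parts of the two version strings. The sketch below is in Python and assumes you have already obtained both version strings yourself (how to do that depends on how CUDA was installed). The function names `major_minor` and `versions_match`, and the example strings, are hypothetical:

```python
def major_minor(version):
    """Return the (major, minor) pair of a dotted version string."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def versions_match(cuda_version, cublas_version):
    """True if both version strings share the same major.minor prefix."""
    return major_minor(cuda_version) == major_minor(cublas_version)


if __name__ == "__main__":
    # Illustrative strings only; substitute the versions found on your system.
    cuda_version = "10.1.243"
    cublas_version = "10.2.1.243"
    if not versions_match(cuda_version, cublas_version):
        print(f"Mismatch: CUDA {cuda_version} vs cuBLAS {cublas_version}")
```

Comparing only major.minor is a design choice for this sketch; patch-level differences are ignored on the assumption that they matter less.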

RichExplor commented 3 years ago

I also tested with GTX1660 + cuda10.0 + cudnn7.5.0 + TensorRT5.1.5.0, and it works well.
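
Collecting the setups reported so far in this thread gives a small table. The sketch below encodes it in Python purely as a summary of what commenters wrote; none of it has been independently verified. Fields the commenters did not state are `None`, and the TensorRT version for ZhenminHuang's two working setups is an assumption (taken to be unchanged from the failing one). The names `REPORTS` and `outcomes_for_cuda` are hypothetical:

```python
# (reporter, GPU, CUDA, cuDNN, TensorRT, engine built?)
# Values are copied from comments in this thread, not verified.
REPORTS = [
    ("robotichustle", "GTX 1060", "10.1", "7.5.1", "5.1.5", False),
    ("ZhenminHuang", "P1000", "10.1", "7.6.5", "5.1.5", False),
    # After the downgrade: cuDNN not stated, TensorRT assumed unchanged.
    ("ZhenminHuang", "P1000", "10.0", None, "5.1.5", True),
    # Second device: cuDNN not stated, TensorRT assumed to be the same.
    ("ZhenminHuang", "GTX 1060", "10.0", None, "5.1.5", True),
    ("RichExplor", "GTX 1660", "10.0", "7.5.0", "5.1.5.0", True),
]


def outcomes_for_cuda(cuda_version):
    """Return the set of reported outcomes (True = worked) for a CUDA version."""
    return {ok for _, _, cuda, _, _, ok in REPORTS if cuda == cuda_version}


if __name__ == "__main__":
    for cuda in ("10.0", "10.1"):
        print(cuda, outcomes_for_cuda(cuda))
```

In these reports every CUDA 10.1 setup failed and every CUDA 10.0 setup worked, which is consistent with the downgrade workaround. Five data points do not prove the cause.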

xdtzzz commented 2 years ago

Hi @GuoFeng-X, how much memory does your graphics card have?

zhenzhongcao commented 2 years ago

I have the same problem. Can anybody propose a solution?
