Closed: jgocm closed this issue 2 years ago.
@jgocm
Thank you for reporting the problem.
I confirmed that the build error occurs. I will modify the code; please wait.
Fixed in commit 16b4342895038b6bf2a6c1aa6adcf37722614136. Thank you again for reporting the problem.
Thank you for the support and the fast responses!
I was able to build TensorRT after the changes, but I think it is still building the wrong TRT version (8.2.0).
After building the "make", these are the generated files at my "TensorRT/build/out" directory:
libnvcaffeparser.so
libnvcaffeparser.so.8
libnvcaffeparser.so.8.2.0
libnvcaffeparser_static.a
libnvinfer_plugin.so
libnvinfer_plugin.so.8
libnvinfer_plugin.so.8.2.0
libnvinfer_plugin_static.a
libnvonnxparser.so
libnvonnxparser.so.8
libnvonnxparser.so.8.2.0
output.txt
sample_algorithm_selector
sample_char_rnn
sample_dynamic_reshape
sample_fasterRCNN
sample_googlenet
sample_int8
sample_int8_api
sample_mnist
sample_mnist_api
sample_nmt
sample_onnx_mnist
sample_onnx_mnist_coord_conv_ac
sample_reformat_free_io
sample_ssd
sample_uff_fasterRCNN
sample_uff_maskRCNN
sample_uff_mnist
sample_uff_plugin_v2_ext
sample_uff_ssd
trtexec
Still, I tried changing only the file names to match the ones suggested in the repo:
sudo cp out/libnvinfer_plugin.so.7.2.3 /usr/lib/aarch64-linux-gnu/
sudo rm /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
sudo ln -s /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.2.3 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
So now I have:
libnvcaffeparser.so
libnvcaffeparser.so.7
libnvcaffeparser.so.7.2.3
libnvcaffeparser_static.a
libnvinfer_plugin.so
libnvinfer_plugin.so.7
libnvinfer_plugin.so.7.2.3
libnvinfer_plugin_static.a
libnvonnxparser.so
libnvonnxparser.so.7
libnvonnxparser.so.7.2.3
output.txt
sample_algorithm_selector
sample_char_rnn
sample_dynamic_reshape
sample_fasterRCNN
sample_googlenet
sample_int8
sample_int8_api
sample_mnist
sample_mnist_api
sample_nmt
sample_onnx_mnist
sample_onnx_mnist_coord_conv_ac
sample_reformat_free_io
sample_ssd
sample_uff_fasterRCNN
sample_uff_maskRCNN
sample_uff_mnist
sample_uff_plugin_v2_ext
sample_uff_ssd
trtexec
Then I copied my 'ssdlite_mobilenet_v2_300x300_gs.onnx' model to the 'tensorrt-examples/models' directory and tried to check the model:
/usr/src/tensorrt/bin/trtexec --onnx=/home/joao/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx
The output was:
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/joao/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx
[03/08/2022-12:12:52] [I] === Model Options ===
[03/08/2022-12:12:52] [I] Format: ONNX
[03/08/2022-12:12:52] [I] Model: /home/joao/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx
[03/08/2022-12:12:52] [I] Output:
[03/08/2022-12:12:52] [I] === Build Options ===
[03/08/2022-12:12:52] [I] Max batch: 1
[03/08/2022-12:12:52] [I] Workspace: 16 MB
[03/08/2022-12:12:52] [I] minTiming: 1
[03/08/2022-12:12:52] [I] avgTiming: 8
[03/08/2022-12:12:52] [I] Precision: FP32
[03/08/2022-12:12:52] [I] Calibration:
[03/08/2022-12:12:52] [I] Safe mode: Disabled
[03/08/2022-12:12:52] [I] Save engine:
[03/08/2022-12:12:52] [I] Load engine:
[03/08/2022-12:12:52] [I] Builder Cache: Enabled
[03/08/2022-12:12:52] [I] NVTX verbosity: 0
[03/08/2022-12:12:52] [I] Inputs format: fp32:CHW
[03/08/2022-12:12:52] [I] Outputs format: fp32:CHW
[03/08/2022-12:12:52] [I] Input build shapes: model
[03/08/2022-12:12:52] [I] Input calibration shapes: model
[03/08/2022-12:12:52] [I] === System Options ===
[03/08/2022-12:12:52] [I] Device: 0
[03/08/2022-12:12:52] [I] DLACore:
[03/08/2022-12:12:52] [I] Plugins:
[03/08/2022-12:12:52] [I] === Inference Options ===
[03/08/2022-12:12:52] [I] Batch: 1
[03/08/2022-12:12:52] [I] Input inference shapes: model
[03/08/2022-12:12:52] [I] Iterations: 10
[03/08/2022-12:12:52] [I] Duration: 3s (+ 200ms warm up)
[03/08/2022-12:12:52] [I] Sleep time: 0ms
[03/08/2022-12:12:52] [I] Streams: 1
[03/08/2022-12:12:52] [I] ExposeDMA: Disabled
[03/08/2022-12:12:52] [I] Spin-wait: Disabled
[03/08/2022-12:12:52] [I] Multithreading: Disabled
[03/08/2022-12:12:52] [I] CUDA Graph: Disabled
[03/08/2022-12:12:52] [I] Skip inference: Disabled
[03/08/2022-12:12:52] [I] Inputs:
[03/08/2022-12:12:52] [I] === Reporting Options ===
[03/08/2022-12:12:52] [I] Verbose: Disabled
[03/08/2022-12:12:52] [I] Averages: 10 inferences
[03/08/2022-12:12:52] [I] Percentile: 99
[03/08/2022-12:12:52] [I] Dump output: Disabled
[03/08/2022-12:12:52] [I] Profile: Disabled
[03/08/2022-12:12:52] [I] Export timing to JSON file:
[03/08/2022-12:12:52] [I] Export output to JSON file:
[03/08/2022-12:12:52] [I] Export profile to JSON file:
[03/08/2022-12:12:52] [I]
----------------------------------------------------------------
Input filename: /home/joao/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx
ONNX IR version: 0.0.8
Opset version: 11
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[03/08/2022-12:12:54] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: TFLiteNMS_TRT. Attempting to import as plugin.
[03/08/2022-12:12:54] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: TFLiteNMS_TRT, plugin_version: 1, plugin_namespace:
[03/08/2022-12:12:54] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: TFLiteNMS_TRT
[03/08/2022-12:12:54] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[F] [TRT] Assertion failed: inputs[1].nbDims == 2 || (inputs[1].nbDims == 3 && inputs[1].d[2] == 1)
/home/joao/tensorrt-examples/TensorRT/plugin/tfliteNMSPlugin/tfliteNMSPlugin.cpp:75
Aborting...
Aborted (core dumped)
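For reference, the assertion that fires comes straight from the plugin source quoted in the log: the second input (the boxes tensor) must be 2-D, or 3-D with a trailing singleton dimension. A minimal Python sketch of the same condition (the function name `boxes_shape_ok` is my own, for illustration only) can be used to sanity-check the boxes tensor shape in the exported ONNX graph before running trtexec:

```python
def boxes_shape_ok(shape):
    """Mirrors the check at tfliteNMSPlugin.cpp:75:
    inputs[1].nbDims == 2 || (inputs[1].nbDims == 3 && inputs[1].d[2] == 1)
    i.e. the boxes input must look like [numBoxes, 4] or [numBoxes, 4, 1]."""
    ndims = len(shape)
    return ndims == 2 or (ndims == 3 and shape[2] == 1)

print(boxes_shape_ok((1917, 4)))     # True: accepted by the plugin
print(boxes_shape_ok((1917, 4, 1)))  # True: trailing singleton is also fine
print(boxes_shape_ok((1, 1917, 4)))  # False: this shape triggers the abort
```

If the exported graph's boxes tensor fails this check, the problem is in the export (or in a mismatched TensorRT version), not in trtexec itself.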
The model was generated from the Add_TFLiteNMS_Plugin notebook on a host PC.
With that, I'm still not able to reproduce your project. Do you have any hints on how to solve this?
Thank you again!
> I was able to build TensorRT after the changes, but I think it is still building the wrong TRT version (8.2.0).
I am sorry. The source currently uploaded is compatible with JetPack 4.6 or later (TensorRT 8). (I updated it to TensorRT 8, and it works fine with JetPack 4.6.)
I think the original JetPack 4.5.1 (TensorRT 7) environment needs to be built from commit b6618342c9881460626e140e603bf3ca12803082. Since my environment is JetPack 4.6, I will downgrade to JetPack 4.5.1 and check it.
cd tensorrt-examples/TensorRT
git checkout b6618342c9881460626e140e603bf3ca12803082
...
For JetPack 4.5.1, please build according to the following procedure, checking out the TensorRT submodule of the tensorrt-examples repository at the specified revision.
git clone https://github.com/NobuoTsukamoto/tensorrt-examples
cd tensorrt-examples/
git submodule update --init --recursive
export TRT_LIBPATH=`pwd`/TensorRT
export PATH=${PATH}:/usr/local/cuda/bin
cd $TRT_LIBPATH
# Note: For JetPack 4.5.1, the ONNX revision also needs to be changed with the submodule update.
git checkout b6618342c9881460626e140e603bf3ca12803082
git submodule update
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=10.2 -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/usr/bin/gcc
make -j3
sudo cp out/libnvinfer_plugin.so.7.2.3 /usr/lib/aarch64-linux-gnu/
sudo rm /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
sudo ln -s /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.2.3 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
The result of trtexec.
/usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/ssdlite_mobilenet_v2_300x300_gs.onnx
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/ssdlite_mobilenet_v2_300x300_gs.onnx
[03/09/2022-18:20:08] [I] === Model Options ===
[03/09/2022-18:20:08] [I] Format: ONNX
[03/09/2022-18:20:08] [I] Model: /home/jetson/ssdlite_mobilenet_v2_300x300_gs.onnx
[03/09/2022-18:20:08] [I] Output:
[03/09/2022-18:20:08] [I] === Build Options ===
[03/09/2022-18:20:08] [I] Max batch: 1
[03/09/2022-18:20:08] [I] Workspace: 16 MB
[03/09/2022-18:20:08] [I] minTiming: 1
[03/09/2022-18:20:08] [I] avgTiming: 8
[03/09/2022-18:20:08] [I] Precision: FP32
[03/09/2022-18:20:08] [I] Calibration:
[03/09/2022-18:20:08] [I] Safe mode: Disabled
[03/09/2022-18:20:08] [I] Save engine:
[03/09/2022-18:20:08] [I] Load engine:
[03/09/2022-18:20:08] [I] Builder Cache: Enabled
[03/09/2022-18:20:08] [I] NVTX verbosity: 0
[03/09/2022-18:20:08] [I] Inputs format: fp32:CHW
[03/09/2022-18:20:08] [I] Outputs format: fp32:CHW
[03/09/2022-18:20:08] [I] Input build shapes: model
[03/09/2022-18:20:08] [I] Input calibration shapes: model
[03/09/2022-18:20:08] [I] === System Options ===
[03/09/2022-18:20:08] [I] Device: 0
[03/09/2022-18:20:08] [I] DLACore:
[03/09/2022-18:20:08] [I] Plugins:
[03/09/2022-18:20:08] [I] === Inference Options ===
[03/09/2022-18:20:08] [I] Batch: 1
[03/09/2022-18:20:08] [I] Input inference shapes: model
[03/09/2022-18:20:08] [I] Iterations: 10
[03/09/2022-18:20:08] [I] Duration: 3s (+ 200ms warm up)
[03/09/2022-18:20:08] [I] Sleep time: 0ms
[03/09/2022-18:20:08] [I] Streams: 1
[03/09/2022-18:20:08] [I] ExposeDMA: Disabled
[03/09/2022-18:20:08] [I] Spin-wait: Disabled
[03/09/2022-18:20:08] [I] Multithreading: Disabled
[03/09/2022-18:20:08] [I] CUDA Graph: Disabled
[03/09/2022-18:20:08] [I] Skip inference: Disabled
[03/09/2022-18:20:08] [I] Inputs:
[03/09/2022-18:20:08] [I] === Reporting Options ===
[03/09/2022-18:20:08] [I] Verbose: Disabled
[03/09/2022-18:20:08] [I] Averages: 10 inferences
[03/09/2022-18:20:08] [I] Percentile: 99
[03/09/2022-18:20:08] [I] Dump output: Disabled
[03/09/2022-18:20:08] [I] Profile: Disabled
[03/09/2022-18:20:08] [I] Export timing to JSON file:
[03/09/2022-18:20:08] [I] Export output to JSON file:
[03/09/2022-18:20:08] [I] Export profile to JSON file:
[03/09/2022-18:20:08] [I]
----------------------------------------------------------------
Input filename: /home/jetson/ssdlite_mobilenet_v2_300x300_gs.onnx
ONNX IR version: 0.0.8
Opset version: 11
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[03/09/2022-18:20:10] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/09/2022-18:20:10] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: TFLiteNMS_TRT. Attempting to import as plugin.
[03/09/2022-18:20:10] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: TFLiteNMS_TRT, plugin_version: 1, plugin_namespace:
[03/09/2022-18:20:10] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: TFLiteNMS_TRT
[03/09/2022-18:21:59] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[03/09/2022-18:23:20] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[03/09/2022-18:23:20] [I] Starting inference threads
[03/09/2022-18:23:23] [I] Warmup completed 6 queries over 200 ms
[03/09/2022-18:23:23] [I] Timing trace has 87 queries over 3.08086 s
[03/09/2022-18:23:23] [I] Trace averages of 10 runs:
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.3973 ms - Host latency: 35.5135 ms (end to end 35.5265 ms, enqueue 5.0635 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.1875 ms - Host latency: 35.3037 ms (end to end 35.3164 ms, enqueue 4.99312 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.3314 ms - Host latency: 35.4483 ms (end to end 35.4614 ms, enqueue 5.04764 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.2505 ms - Host latency: 35.3669 ms (end to end 35.3799 ms, enqueue 5.15684 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.2581 ms - Host latency: 35.3744 ms (end to end 35.3873 ms, enqueue 5.13035 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.283 ms - Host latency: 35.3995 ms (end to end 35.4123 ms, enqueue 5.1054 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.2608 ms - Host latency: 35.3769 ms (end to end 35.3898 ms, enqueue 5.0103 ms)
[03/09/2022-18:23:23] [I] Average on 10 runs - GPU latency: 35.3403 ms - Host latency: 35.4581 ms (end to end 35.4708 ms, enqueue 5.02485 ms)
[03/09/2022-18:23:23] [I] Host Latency
[03/09/2022-18:23:23] [I] min: 35.1873 ms (end to end 35.1951 ms)
[03/09/2022-18:23:23] [I] max: 37.6387 ms (end to end 37.6518 ms)
[03/09/2022-18:23:23] [I] mean: 35.3987 ms (end to end 35.4116 ms)
[03/09/2022-18:23:23] [I] median: 35.3347 ms (end to end 35.3474 ms)
[03/09/2022-18:23:23] [I] percentile: 37.6387 ms at 99% (end to end 37.6518 ms at 99%)
[03/09/2022-18:23:23] [I] throughput: 28.2389 qps
[03/09/2022-18:23:23] [I] walltime: 3.08086 s
[03/09/2022-18:23:23] [I] Enqueue Time
[03/09/2022-18:23:23] [I] min: 4.72998 ms
[03/09/2022-18:23:23] [I] max: 6.03662 ms
[03/09/2022-18:23:23] [I] median: 4.99207 ms
[03/09/2022-18:23:23] [I] GPU Compute
[03/09/2022-18:23:23] [I] min: 35.0728 ms
[03/09/2022-18:23:23] [I] max: 37.5224 ms
[03/09/2022-18:23:23] [I] mean: 35.2822 ms
[03/09/2022-18:23:23] [I] median: 35.2192 ms
[03/09/2022-18:23:23] [I] percentile: 37.5224 ms at 99%
[03/09/2022-18:23:23] [I] total compute time: 3.06956 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/ssdlite_mobilenet_v2_300x300_gs.onnx
Nice! I followed your latest instructions for JetPack 4.5.1 and trtexec worked, giving the same console output as yours!
Then the pycuda installation and model conversion also worked with no issues, using the commands:
sudo apt install python3-dev
pip3 install --global-option=build_ext --global-option="-I/usr/local/cuda/include" --global-option="-L/usr/local/cuda/lib64" pycuda
cd ~/tensorrt-examples/python/detection/
python3 convert_onnxgs2trt.py \
--model /home/jetson/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx \
--output /home/jetson/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_fp16.trt \
--fp16
Finally, I tried running inference with ssdlite_mobilenet_v2_300x300_fp16.trt model:
python3 /home/joao/tensorrt-examples/python/detection/trt_detection.py \
--model /home/joao/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_fp16.trt \
--label /home/joao/tensorrt-examples/models/coco_labels.txt \
--width 300 \
--height 300
But it returned me the following error:
Traceback (most recent call last):
File "/home/joao/tensorrt-examples/python/detection/trt_detection.py", line 207, in <module>
main()
File "/home/joao/tensorrt-examples/python/detection/trt_detection.py", line 159, in main
boxs = trt_outputs[1].reshape([int(trt_outputs[0]), 4])
ValueError: cannot reshape array of size 40 into shape (7,4)
This is caused by the line:
boxs = trt_outputs[1].reshape([int(trt_outputs[0]), 4])
For some reason, int(trt_outputs[0]) does not yield the right value. Based on the error, I tried hard-coding it to 10 so that the reshape gets the right shape:
boxs = trt_outputs[1].reshape([10, 4])
With that replacement the code works fine! But I'm wondering about the cause of this error; would you have any clue on how to solve it?
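The error is consistent with the plugin writing a fixed-size output buffer: trt_outputs[1] always holds 10 × 4 = 40 floats (the default maximum of 10 boxes), no matter how many detections are actually valid. A small NumPy reproduction (with dummy buffer contents standing in for the real TensorRT outputs) shows why reshaping by the detection count fails while reshaping to [-1, 4] works:

```python
import numpy as np

num_detections = 7           # the value reported in trt_outputs[0]
boxes_buffer = np.zeros(40)  # fixed-size output: 10 boxes * 4 coordinates

# Reshaping by the detection count fails whenever fewer than 10 boxes are valid:
try:
    boxes_buffer.reshape([num_detections, 4])
except ValueError as e:
    print(e)  # cannot reshape array of size 40 into shape (7,4)

# Reshaping to [-1, 4] and then slicing off the valid rows works:
boxes = boxes_buffer.reshape([-1, 4])[:num_detections]
print(boxes.shape)  # (7, 4)
```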
Again, thanks so much for the assistance!
I also kept the code running and printed the trt_outputs[0] values for a while; this was the result:
10
10
10
10
9
10
10
10
10
10
10
10
10
10
10
9
9
8
10
9
8
9
6
9
6
8
8
7
6
8
7
9
7
9
7
8
9
8
10
10
9
10
9
10
10
10
9
10
10
9
9
9
8
9
8
9
5
5
5
4
5
5
3
6
5
6
8
6
5
6
7
6
7
8
8
8
9
8
6
6
6
7
8
10
5
7
7
6
5
5
6
7
6
5
6
5
5
5
7
I still don't really understand what this variable means, but it seems to start at the right value and then decay after some time.
Sorry. This is a problem with trt_detection.py.
Would you please change the code as follows and check it?
boxs = trt_outputs[1].reshape([-1, 4])
for index in range(int(trt_outputs[0])):
box = boxs[index]
trt_outputs[0] (num_detections) indicates the number of detections, from 0 to 10 (the maximum is 10 by default). Therefore, the number of detections varies depending on the input image.
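In other words, only the first num_detections rows of the reshaped buffer are meaningful; the remaining slots are padding. A short sketch of the corrected loop, using dummy NumPy arrays in place of the real TensorRT output buffers:

```python
import numpy as np

# Dummy stand-ins for the plugin outputs: 3 valid detections out of 10 slots.
trt_outputs = [np.float32(3.0), np.arange(40, dtype=np.float32)]

boxs = trt_outputs[1].reshape([-1, 4])    # always (10, 4): the maximum box count
for index in range(int(trt_outputs[0])):  # iterate only the valid rows
    box = boxs[index]
    print(box)                            # rows beyond the count are ignored
```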
I checked it here and that solves the problem!
With that, I think the issue is fully resolved.
I am still having some problems when trying to convert ssd_mobilenet_v1_fpn_coco from the TensorFlow 1 Model Zoo using convert_onnxgs2trt.py: it gives some assertion errors during ONNX parsing.
For now I have commented out or changed some of the ASSERT checks in the plugin/tfliteNMSPlugin/tfliteNMSPlugin.cpp code and did the conversion, but the generated model is too slow (~350 ms inference time) and could not detect any objects. This looks more like a new problem, so I will open another issue for it with better details.
Thank you for the support!
Hi,
I'm trying to reproduce the exact same steps from your Object Detection tutorial on Jetson Nano.
I've started from a fresh Jetpack 4.5.1 installation and executed the commands:
Until now, everything works fine. But when I execute the cmake command:
It tells me TensorRT is being built for version 8.2.0:
Just to double-check, I ran dpkg -l and saw that my TensorRT version is actually 7.1.3. Even so, cmake completes with no errors and the build files are written successfully. So I tried running make:
And got the following error:
Would you have any other specific instructions for reproducing your object detection examples? I need to run TensorRT-optimized object detection inference from TensorFlow models, and this has been the best guide I have found so far.
Thank you in advance!