Have you been able to use the NMS Plugin for any networks from the TensorFlow 1 Model Zoo other than SSDLite MobileNetV2? If yes, how did you do it?
ssd_mobilenet_v2_300x300
and ssdlite_mobiledet_gpu_320x320
(I created this plugin for the purpose of running ssdlite_mobiledet_gpu on Jetson.)
If not, do you know what might be causing these input dimension errors?
For the FPN model, the input of TFLite_Detection_PostProcess is different.
ssd_mobilenet_v2_300x300:
ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco:
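To see the difference concretely, one could dump the shapes of the tensors feeding the post-process node in each GraphSurgeon ONNX file. A minimal sketch follows; the file names are placeholders, and matching the node by an op name containing "NMS" or "PostProcess" is an assumption about how the Add_TFLiteNMS_Plugin notebook names the node.

```python
# Sketch: compare the inputs of the NMS/post-process node in two
# GraphSurgeon-modified ONNX files. Adjust the op-name match and file
# names to your actual models; both are assumptions here.
import onnx
import onnx_graphsurgeon as gs

def print_nms_inputs(onnx_path):
    graph = gs.import_onnx(onnx.load(onnx_path))
    for node in graph.nodes:
        if "NMS" in node.op or "PostProcess" in node.op:
            print(onnx_path, node.op)
            for inp in node.inputs:
                print("  ", inp.name, inp.shape)

print_nms_inputs("ssd_mobilenet_v2_300x300_gs.onnx")
print_nms_inputs("ssd_mobilenet_v1_fpn_640x640_gs.onnx")  # FPN model: dimensions differ
```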
Therefore, if you delete the ASSERT in tfliteNMSPlugin.cpp, I think it will work for the time being.
It seems this cannot be fixed just by deleting the ASSERT. I am checking the details.
I fixed a problem with the FPN model and committed the change.
However, I could not convert the FPN model on JetPack 4.5.1, so I could not confirm the fix there.
ssd_mobilenet_v1_fpn_shared_box_predictor_640x640 could not be converted due to lack of memory on the Jetson Nano, and ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320 results in an error at other layers. In TensorRT 8.2 (JetPack 4.6), the problem with ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320 was solved and conversion became possible. For this reason, I plan to proceed with confirmation on JetPack 4.6.1 and will share the results once I can confirm it.
I have confirmed that ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320 works with JetPack 4.6.1. Thank you very much for reporting the problem.
Build the plugin.
# Jetson Nano
git clone https://github.com/NobuoTsukamoto/tensorrt-examples
cd ./tensorrt-examples/python/detection
git submodule update --init --recursive
export TRT_LIBPATH=`pwd`/TensorRT
export PATH=${PATH}:/usr/local/cuda/bin
cd $TRT_LIBPATH
mkdir -p build && cd build   # out-of-source build directory (assumed; implied by the "cmake .." below)
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=10.2 -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/usr/bin/gcc
make -j$(nproc)
sudo cp out/libnvinfer_plugin.so.8.2.0 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.2.0
sudo rm /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8
sudo ln -s /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.2.0 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8
ls -al /usr/lib/aarch64-linux-gnu/libnvinfer_plugin*
lrwxrwxrwx 1 root root 49 Mar 18 20:47 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so -> /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8
lrwxrwxrwx 1 root root 53 Mar 8 18:29 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8 -> /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.2.0
-rw-r--r-- 1 root root 18280576 Jun 26 2021 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.0.1
-rwxr-xr-x 1 root root 41492816 Mar 18 21:12 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.2.0
-rw-r--r-- 1 root root 21018654 Jun 26 2021 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin_static.a
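After swapping the library, a quick way to confirm that the custom TFLite NMS creator is actually visible to TensorRT is to list the registered plugin creators from Python. This is a sketch; the exact creator name comes from the plugin source, so check the printed list rather than hard-coding it.

```python
# Sketch: list plugin creators after replacing libnvinfer_plugin.
# The rebuilt library must be the one on the default linker path.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")           # loads/registers all plugin creators
for creator in trt.get_plugin_registry().plugin_creator_list:
    print(creator.name, creator.plugin_version)   # the custom TFLite NMS creator should appear here
```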
Check with trtexec. (Conversion and trtexec must be run in CUI (console) mode.)
# Jetson Nano
sudo systemctl set-default multi-user.target
sudo reboot
/usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/tensorrt-examples/models/ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_gs.onnx --workspace=2048
...
[03/19/2022-12:44:00] [I] Starting inference
[03/19/2022-12:44:03] [I] Warmup completed 3 queries over 200 ms
[03/19/2022-12:44:03] [I] Timing trace has 39 queries over 3.12049 s
[03/19/2022-12:44:03] [I]
[03/19/2022-12:44:03] [I] === Trace details ===
[03/19/2022-12:44:03] [I] Trace averages of 10 runs:
[03/19/2022-12:44:03] [I] Average on 10 runs - GPU latency: 79.8587 ms - Host latency: 79.9861 ms (end to end 79.9958 ms, enqueue 12.687 ms)
[03/19/2022-12:44:03] [I] Average on 10 runs - GPU latency: 79.8463 ms - Host latency: 79.9742 ms (end to end 79.9842 ms, enqueue 12.0583 ms)
[03/19/2022-12:44:03] [I] Average on 10 runs - GPU latency: 79.847 ms - Host latency: 79.9747 ms (end to end 79.9847 ms, enqueue 12.6927 ms)
[03/19/2022-12:44:03] [I]
[03/19/2022-12:44:03] [I] === Performance summary ===
[03/19/2022-12:44:03] [I] Throughput: 12.498 qps
[03/19/2022-12:44:03] [I] Latency: min = 79.7407 ms, max = 80.3516 ms, mean = 80.0022 ms, median = 79.9524 ms, percentile(99%) = 80.3516 ms
[03/19/2022-12:44:03] [I] End-to-End Host Latency: min = 79.75 ms, max = 80.3616 ms, mean = 80.012 ms, median = 79.9622 ms, percentile(99%) = 80.3616 ms
[03/19/2022-12:44:03] [I] Enqueue Time: min = 9.36206 ms, max = 12.8073 ms, mean = 12.3673 ms, median = 12.7076 ms, percentile(99%) = 12.8073 ms
[03/19/2022-12:44:03] [I] H2D Latency: min = 0.119629 ms, max = 0.123169 ms, mean = 0.12132 ms, median = 0.121338 ms, percentile(99%) = 0.123169 ms
[03/19/2022-12:44:03] [I] GPU Compute Time: min = 79.6128 ms, max = 80.2219 ms, mean = 79.8744 ms, median = 79.8252 ms, percentile(99%) = 80.2219 ms
[03/19/2022-12:44:03] [I] D2H Latency: min = 0.00488281 ms, max = 0.00708008 ms, mean = 0.00644469 ms, median = 0.00640869 ms, percentile(99%) = 0.00708008 ms
[03/19/2022-12:44:03] [I] Total Host Walltime: 3.12049 s
[03/19/2022-12:44:03] [I] Total GPU Compute Time: 3.1151 s
[03/19/2022-12:44:03] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/19/2022-12:44:03] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/tensorrt-examples/models/ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_gs.onnx --workspace=2048
[03/19/2022-12:44:03] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 879, GPU 3078 (MiB)
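For reference, a rough Python equivalent of the trtexec run above, built with the TensorRT 8.x Python API and an FP16 flag; this is a sketch, not the actual convert_onnxgs2trt.py script, and the paths are illustrative.

```python
# Sketch: build an FP16 engine from the GraphSurgeon ONNX model.
# Requires the rebuilt libnvinfer_plugin so the custom NMS plugin is found.
import tensorrt as trt

ONNX_PATH = "ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_gs.onnx"
ENGINE_PATH = "ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_fp16.trt"

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")   # register the custom NMS plugin

builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit(1)

config = builder.create_builder_config()
config.max_workspace_size = 2048 << 20    # matches --workspace=2048
config.set_flag(trt.BuilderFlag.FP16)     # FP16 engine, as used for the demo below

serialized = builder.build_serialized_network(network, config)
with open(ENGINE_PATH, "wb") as f:
    f.write(serialized)
```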
Run the demo. (I created an FP16 model.)
# Jetson Nano
sudo systemctl set-default graphical.target
sudo reboot
python3 trt_detection.py --model ../../models/ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_fp16.trt --label ../../models/coco_labels.txt --width 320 --height 320 --videopath __PATH_TO_VIDEOFILE
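For completeness, here is a minimal sketch of the inference side, assuming pycuda is available. trt_detection.py itself additionally handles preprocessing, drawing, and video I/O; the bindings below are queried from the engine rather than hard-coded, and the random input stands in for a preprocessed 320x320 frame.

```python
# Sketch: load the generated .trt engine and run one inference.
import numpy as np
import pycuda.autoinit        # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")
with open("../../models/ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_fp16.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host/device buffers for every binding reported by the engine.
host, device, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host[name] = np.zeros(trt.volume(shape), dtype=dtype)
    device[name] = cuda.mem_alloc(host[name].nbytes)
    bindings.append(int(device[name]))
    print(i, name, shape, dtype)

# Fill the input binding (placeholder data here) and run inference.
input_name = engine.get_binding_name(0)
host[input_name][:] = np.random.rand(host[input_name].size).astype(host[input_name].dtype)
cuda.memcpy_htod(device[input_name], host[input_name])
context.execute_v2(bindings)

# Copy back the detection outputs (boxes, classes, scores, counts).
for name in host:
    if not engine.binding_is_input(engine.get_binding_index(name)):
        cuda.memcpy_dtoh(host[name], device[name])
        print(name, host[name][:8])
```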
I have built the project on JetPack 4.5.1 following the instructions from https://github.com/NobuoTsukamoto/tensorrt-examples/issues/3#issuecomment-1062732055.
Then I created the ONNX GraphSurgeon model for ssd_mobilenet_v1_fpn_coco using the Add_TFLiteNMS_Plugin notebook, replacing the "ssdlite_mobilenet_v2_coco_2018_05_09" model with "ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03" and changing the input shape from "input_shapes=1,300,300,3" to "input_shapes=1,640,640,3". Here is my TFLiteNMS_Plugin notebook.
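In other words, the substitutions boil down to two values (hypothetical variable names; the notebook's actual cells may be laid out differently):

```python
# Hypothetical parameter cell mirroring the substitutions described above.
MODEL_NAME = "ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03"  # was "ssdlite_mobilenet_v2_coco_2018_05_09"
INPUT_SHAPES = "1,640,640,3"                                                              # was "1,300,300,3"
```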
When trying to convert the model from ONNX to TensorRT on the Jetson Nano with convert_onnxgs2trt.py, I got the following error:
Looking deeper into tfliteNMSPlugin.cpp, this assertion comes from TFLiteNMSPlugin::getOutputDimensions, at line 74. Just to check, I tried printing the number of dimensions of the inputs from the input vector:
The result was:
So my questions are:
Here is my network ONNX GraphSurgeon file.
EDIT: I am also sharing an image of the network generated with Netron.