Open · PROGRAMMINGENGINEER-NIKI opened 3 years ago
YOLOv4 (416x416) runs at 41 FPS on AGX Xavier and 22 FPS on Xavier NX using TensorRT+tkDNN with FP16, batch=1: https://github.com/ceccocats/tkDNN#fps-results
It may be slightly slower using OpenCV.
Hi @AlexeyAB,
Thank you for the information. I am using an AGX Xavier, and I have the following questions:
1) I use Python and the OpenCV DNN module for inference. Does OpenCV DNN support GPU-accelerated YOLOv4? If I switch to YOLOv4, can I run it through the DNN module? And would TensorRT+tkDNN improve performance over the OpenCV DNN version of YOLOv4?
2) I have heard that ONNX also improves speed, so I am a bit confused. Which would give me the highest speed with reasonable accuracy: TensorRT+tkDNN or the ONNX version?
Thanks
What FPS do you get?
You must compile OpenCV with CUDA and cuDNN
In your python code you must use
net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA_FP16)
The fastest options are TensorRT+tkDNN, TensorRT+DeepStream, or OpenCV DNN.
ONNX is faster than Darknet but slower than TensorRT/OpenCV.
I compiled OpenCV with CUDA and cuDNN, and I use the three lines of Python code you mentioned ("DNN_BACKEND_CUDA"). The latency is about 280 ms per image on my AGX Xavier. It is very slow.
My goal is to detect and count objects on the Xavier. So far I have used YOLOv3 (OpenCV DNN version) + a SORT tracker (not Deep SORT). I get 30 FPS on my RTX 2080, but very poor performance on the Xavier, around 280 ms per image. So I am wondering what method would give me a real-time result? Any advice you could give would be much appreciated.
The latency is about 280 ms per image on my AGX Xavier. It is very slow.
It should be ~30 ms. It looks like you are doing something wrong, or some of your recent code is wrong.
I recently heard of TensorRT and tkDNN, but I have not tried any TensorRT+tkDNN optimization yet. I just configured OpenCV so that the DNN module compiles with CUDA and cuDNN.
Just to make sure I understand you: do you mean YOLOv3 implemented with OpenCV DNN can reach ~30 ms on the Xavier AGX, without any TensorRT optimization? So I might be missing some packages? Or should I optimize the network to reach that latency (~30 ms)? If so, what sort of optimization would be the best bet?
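One measurement pitfall worth ruling out here: timing that includes the first frame, which carries one-time CUDA initialization cost. A pure-Python sketch of measuring per-frame latency while skipping warm-up frames (the `infer` callable is a hypothetical stand-in for `net.forward` on a preprocessed frame):

```python
import time

def measure_latency_ms(infer, frames, warmup=5):
    """Average per-frame latency in ms, excluding the first `warmup` frames."""
    timings = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        infer(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # discard warm-up frames from the average
            timings.append(elapsed_ms)
    return sum(timings) / len(timings)
```

If the averaged steady-state number is still ~280 ms, the problem is the build or backend selection rather than warm-up.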
Thanks,
Just to make sure I understand you: do you mean YOLOv3 implemented with OpenCV DNN can reach ~30 ms on the Xavier AGX, without any TensorRT optimization? So I might be missing some packages?
yes
On my Xavier NX, when I run YOLOv4-tiny at 608x448, I'm getting 20.301 ms per image, or 49.259 FPS. Note the first image takes 1170 ms, then it drops to 31 ms, and after 4 images I'm getting 20 ms. I assume this is because it takes a few frames for the GPU to reach top speed.
This is using Darknet's C API via DarkHelp's C++ wrapper. No ONNX, TensorRT, or other sort of optimization.
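As an aside for anyone comparing the numbers in this thread: per-image latency in ms and FPS are reciprocals, which is worth spelling out since the two units keep getting mixed up above. A trivial helper:

```python
def ms_to_fps(ms_per_image):
    """Convert per-image latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_image

def fps_to_ms(fps):
    """Convert frames per second to per-image latency in milliseconds."""
    return 1000.0 / fps

# 20.301 ms/image is about 49.26 FPS,
# while 280 ms/image is only about 3.6 FPS -- far from real time.
```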
In case it helps, on my NX these are the only changes I have to darknet itself:
Makefile:
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
OPENMP=1
LIBSO=1
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
Then of course I also run sudo jetson_clocks and sudo nvpmodel --mode 2.
I am running YOLOv3 on a Jetson Xavier, using Python and the OpenCV DNN module for inference, and it is quite slow; it does not seem effective for my application. So I am looking for another detection model that can run in real time on the Xavier. It needs reasonable accuracy, so I am not considering YOLO-tiny. Any suggestions, please? What model is capable of running in real time on a Xavier device? I would appreciate any comments and suggestions.