AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Xavier does not process in real time? How do I improve it? #7150

Open PROGRAMMINGENGINEER-NIKI opened 3 years ago

PROGRAMMINGENGINEER-NIKI commented 3 years ago

I am running YOLOv3 on a Jetson Xavier, using Python and the OpenCV DNN module for inference, and it is quite slow -- not effective enough for my application. So I am looking for another detection model that can run in real time on the Xavier. It has to have reasonable accuracy, so I am not considering YOLO-tiny. Any suggestions, please? Which model is capable of running in real time on a Xavier device? I would appreciate any comments and suggestions.

AlexeyAB commented 3 years ago

YOLOv4 (416x416) runs at 41 FPS on the AGX Xavier and 22 FPS on the Xavier NX using TensorRT+tkDNN with FP16, batch=1: https://github.com/ceccocats/tkDNN#fps-results

It may be slightly slower using OpenCV.

PROGRAMMINGENGINEER-NIKI commented 3 years ago

Hi @AlexeyAB ,

Thank you for the information. I am using the AGX Xavier. I have the following questions:

1) I use Python and the OpenCV DNN module for inference. Does OpenCV DNN support GPU-based YOLOv4? If I switch to YOLOv4, can I run it with the DNN module? And would TensorRT+tkDNN improve performance over the OpenCV DNN version of YOLOv4?

2) I heard that using ONNX also improves speed, so I am a bit confused. Could you please tell me which would give me the highest speed with reasonable accuracy: TensorRT+tkDNN, or the ONNX version?

Thanks

AlexeyAB commented 3 years ago
  1. What FPS do you get?

  2. You must compile OpenCV with CUDA and cuDNN

  3. In your Python code you must use:

    import cv2 as cv

    net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
    net.setPreferableBackend(cv.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv.dnn.DNN_TARGET_CUDA_FP16)
  4. The fastest options are TensorRT+tkDNN, TensorRT+DeepStream, or OpenCV DNN.

  5. ONNX is faster than Darknet but slower than TensorRT/OpenCV.

PROGRAMMINGENGINEER-NIKI commented 3 years ago

I compiled OpenCV with CUDA and cuDNN. I also use those three lines of Python code you mentioned ("DNN_BACKEND_CUDA"). The inference time is about 280 ms per image on my AGX Xavier. It is very slow.

My goal is to detect and count objects on the Xavier. So far I have used YOLOv3 (OpenCV DNN version) + a SORT tracker (not Deep SORT). I get 30 FPS on my RTX 2080, but very poor performance on the Xavier, around 280 ms per image. So I am wondering which method would give me a real-time result? Any advice you could give would be much appreciated.

AlexeyAB commented 3 years ago

> The inference time is about 280 ms per image on my AGX Xavier. It is very slow.

It should be ~30 ms. It looks like you are doing something wrong, or some of your recent code is wrong.

PROGRAMMINGENGINEER-NIKI commented 3 years ago

> The inference time is about 280 ms per image on my AGX Xavier. It is very slow.

> It should be ~30 ms. It looks like you are doing something wrong, or some of your recent code is wrong.

I recently heard of TensorRT and tkDNN, but I have not tried any TensorRT+tkDNN optimization yet. I have just configured OpenCV so that the DNN module compiles with CUDA and cuDNN.

I am just wondering if I understand you correctly: do you mean YOLOv3 implemented with OpenCV DNN can reach ~30 ms on the Xavier AGX, without any TensorRT optimization? So I might be missing some packages? Or should I optimize the network to reach that speed (~30 ms)? If so, what sort of optimization would be the best bet?

Thanks,

AlexeyAB commented 3 years ago

> I am just wondering if I understand you correctly: do you mean YOLOv3 implemented with OpenCV DNN can reach ~30 ms on the Xavier AGX, without any TensorRT optimization? So I might be missing some packages?

yes

stephanecharette commented 3 years ago

On my Xavier NX, when I run YOLOv4-tiny at 608x448, I'm getting 20.301 milliseconds / image, or 49.259 FPS. Note the first image takes 1170 milliseconds, then it goes down to 31 milliseconds, and after 4 images I'm getting 20 milliseconds. I assume this is because it takes a few frames for the GPU to reach top speed.

This is using Darknet's C API via the DarkHelp C++ wrapper. No ONNX, TensorRT, or other sort of optimization.
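The warm-up effect described above is why timing loops usually discard the first few frames before averaging; a minimal sketch (the `infer` callable and the `warmup` count are placeholders for any detector and hardware):

```python
import time

def benchmark_ms(infer, frames, warmup=5):
    """Average milliseconds per frame over `frames`, discarding the
    first `warmup` calls while the GPU clocks ramp up."""
    timings = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        timings.append((time.perf_counter() - start) * 1000.0)
    steady = timings[warmup:]
    return sum(steady) / len(steady)
```

With the numbers quoted above (1170 ms for the first frame, then 31 ms, then ~20 ms), it is exactly this discarding of the first handful of frames that turns the raw average into the steady-state 49 FPS figure.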

stephanecharette commented 3 years ago

In case it helps, on my NX these are the only changes I have to darknet itself:

Makefile:

Then of course I also run `sudo jetson_clocks` and `sudo nvpmodel --mode 2`.