15 fps on both yolov3 and tiny yolov3. Why?

MWaleedK commented 2 years ago

On both devices, the cfg files run with height and width set to 416x416. Furthermore, I'm simply running the COCO dataset. That should be running fast on the tiny version, if not on the full-blown yolov3 neural net. I understand that a 1060 is nowhere near as good as a Titan X, but the tiny version should at least hit the 25 fps mark, considering that it uses only 7 conv layers compared to the 53 in the full version (the counts that show up when I run both cfgs).

So, I've been trying to get yolov3 to process a video feed at a faster rate than 15 fps. Tiny yolov3 also runs at 15 fps.

Primary setup: my camera can go up to 25 frames per second; it's a standard laptop camera.

OS: Windows 10
CUDA: 11.4
CUDNN: 8.2.4
OpenCV: 4.5.0
CPU: i7-8750H (can turbo up to 4.1GHz)
GPU: Nvidia GTX 1060 6GB

Here is my CMake dump:

option(CMAKE_VERBOSE_MAKEFILE "Create verbose makefile" ON)
option(CUDA_VERBOSE_BUILD "Create verbose CUDA build" ON)
option(BUILD_SHARED_LIBS "Create dark as a shared library" ON)
option(BUILD_AS_CPP "Build Darknet using C++ compiler also for C files" OFF)
option(BUILD_USELIB_TRACK "Build uselib_track" ON)
option(MANUALLY_EXPORT_TRACK_OPTFLOW "Manually export the TRACK_OPTFLOW=1 define" OFF)
option(ENABLE_OPENCV "Enable OpenCV integration" ON)
option(ENABLE_CUDA "Enable CUDA support" ON)
option(ENABLE_CUDNN "Enable CUDNN" ON)
option(ENABLE_CUDNN_HALF "Enable CUDNN Half precision" ON)
option(ENABLE_ZED_CAMERA "Enable ZED Camera support" ON)
option(ENABLE_VCPKG_INTEGRATION "Enable VCPKG integration" ON)
option(ENABLE_CSHARP_WRAPPER "Enable building a csharp wrapper" OFF)
option(VCPKG_BUILD_OPENCV_WITH_CUDA "Build OpenCV with CUDA extension integration" ON)
option(VCPKG_USE_OPENCV2 "Use legacy OpenCV 2" OFF)
option(VCPKG_USE_OPENCV3 "Use legacy OpenCV 3" OFF)
option(VCPKG_USE_OPENCV4 "Use OpenCV 4" ON)

I would like a good tutorial on going down the TensorRT path using ONNX to speed up my inference. I would be very much obliged.

On my secondary setup, I get around 6 fps on the tiny version and 1.4 fps on the normal version after downgrading the input video to 640x480 pixels @ 30 fps, as set in the GStreamer pipeline.

My secondary setup:

OS: linux 18.04.5 LTS
CUDA: 10.2.89
CUDNN: 8.0.0.180
Jetpack: 4.4.1
OpenCV: 4.5.0
Nvidia Jetson Nano 2GB

CMake dump for this one: same as the one above.
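The exact GStreamer pipeline is not shown in this thread. As a rough sketch only, a pipeline for a Jetson CSI camera that delivers 640x480 frames at 30 fps could be built as below. The function name `csi_pipeline` and the default capture size are assumptions for illustration; the element names follow NVIDIA's commonly published CSI camera examples, and the capture size must match a sensor mode your camera actually supports.

```python
def csi_pipeline(capture_width=1280, capture_height=720,
                 output_width=640, output_height=480, fps=30):
    """Build a GStreamer pipeline string for a Jetson CSI camera.

    Hypothetical helper: the defaults are illustrative, not taken from the thread.
    """
    return (
        # Capture from the CSI sensor into NVMM (GPU-accessible) memory.
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width={capture_width}, "
        f"height={capture_height}, format=NV12, framerate={fps}/1 ! "
        # Scale to the output size and copy to regular memory as BGRx.
        "nvvidconv ! "
        f"video/x-raw, width={output_width}, height={output_height}, "
        "format=BGRx ! "
        # Convert to BGR, the channel order OpenCV expects.
        "videoconvert ! video/x-raw, format=BGR ! "
        # Drop stale frames instead of queueing them when the consumer is slow.
        "appsink drop=true max-buffers=1"
    )


if __name__ == "__main__":
    print(csi_pipeline())
```

With an OpenCV build that has GStreamer support, the string would typically be passed as `cv2.VideoCapture(csi_pipeline(), cv2.CAP_GSTREAMER)`.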

stephanecharette commented 2 years ago

Make sure you read through this: https://www.ccoderun.ca/programming/2021-10-16_darknet_fps/

If you get the same FPS on both networks, you either made a mistake or you're running into another limitation. For example, if you are using a USB camera, you may be limited by the bus (USB 2, USB 3, etc.).
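To illustrate the "another limitation" point: when two very different networks report the same frame rate, something other than inference is probably the slowest stage. The sketch below uses made-up stage rates (not measurements from this thread) and hypothetical helper names. Whether stages overlap or run one after another depends on how the application is written, so both cases are shown.

```python
def pipelined_fps(stage_fps):
    """Upper bound on FPS when stages run concurrently: the slowest stage wins."""
    return min(stage_fps.values())


def serial_fps(stage_fps):
    """FPS when stages run one after another: per-frame times add up."""
    return 1.0 / sum(1.0 / fps for fps in stage_fps.values())


def bottleneck(stage_fps):
    """Name of the slowest stage."""
    return min(stage_fps, key=stage_fps.get)


if __name__ == "__main__":
    # Illustrative numbers only: a slow capture stage and a fast network.
    stages = {"capture": 15.0, "inference": 60.0}
    print(bottleneck(stages), pipelined_fps(stages), serial_fps(stages))
```

If swapping the network for a much faster one does not change the overall number, the stage that stayed the same (capture, decoding, display) is the one to investigate.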

MWaleedK commented 2 years ago

Thank you for the link. It's not the camera; the Jetson has its camera attached to the CSI-MIPI port and is supposed to output low frame rates because I haven't converted the model to TensorRT. I doubt these benchmarks were made on Alexey's implementation regarding the Jetson at least. I'm concerned about the Windows version. I'd be very grateful for any pointers on how to troubleshoot my problem. Any links would be appreciated. Could it be that some previous CUDA version, say 7, would help me speed things up?

stephanecharette commented 2 years ago

> I doubt these benchmarks were made on Alexey's implementation

Care to explain your comment? I did the test, and I wrote that blog entry. I'm the author. I can guarantee you I used AlexeyAB's darknet on my Jetson devices.

MWaleedK commented 2 years ago

Apologies for the comment, I did not mean any disrespect. This was the basis for my conclusion: https://forums.developer.nvidia.com/t/yolov3-is-very-slow/74073

I have been looking into this problem for weeks, and moving from darknet to TensorRT seemed to be the only viable solution for better performance on the Jetson. I'm glad to be proven wrong by the numbers you've provided in the blog. I just don't know where to start troubleshooting.

stephanecharette commented 2 years ago

Re-read the comment I wrote above, and re-read the blog entry. The blog entry has everything to replicate the results, no magic was used. I even posted the command line I ran to do the tests.

MWaleedK commented 2 years ago

Yes, I re-read the post. This might be a dumb question, but have you left out NMS in the process?

stephanecharette commented 2 years ago

No, I'm pretty sure darknet always does NMS.
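For readers unfamiliar with the term, NMS (non-maximum suppression) removes overlapping duplicate detections of the same object. The following is a minimal greedy sketch of the general technique, not darknet's actual implementation; the box format `(x1, y1, x2, y2)` and the 0.45 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already-kept box by more than iou_threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first heavily and has a lower score, so it is suppressed, while the distant third box is kept.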

MWaleedK commented 2 years ago

Thank you for your help. I honestly can't thank you enough. I have returned after running some tests. For some reason, my video feeds are causing problems: even though I tested my camera and it can record at ~30 fps, the yolo video feed won't go above 15 fps. I ran a video file at 416x416 on my Windows machine using tiny-yolo and it gave me ~57 fps! I also did a little experiment and went with CUDNN 7.5 instead of the previous version, 8.2.4. I gained ~5 fps on average. My numbers are roughly 22 fps on yolov3 itself. Could you state which versions of CUDNN and CUDA you used on the Jetson (and also the Jetpack version)?
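One way to separate a capture limit from an inference limit, as in the camera-versus-video-file comparison above, is to time the frame source on its own with no detector attached. This is a minimal sketch; `measure_fps` is a hypothetical helper, and it times any zero-argument callable that fetches one frame.

```python
import time


def measure_fps(read_frame, n_frames=100):
    """Call read_frame() n_frames times and return the achieved frames per second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        read_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


if __name__ == "__main__":
    # Stand-in source that takes about 10 ms per frame.
    def fake_read():
        time.sleep(0.01)

    print(f"{measure_fps(fake_read, n_frames=50):.1f} fps")
```

With OpenCV, the callable could be the `read` method of a `cv2.VideoCapture` object opened on the camera or on a video file. If the camera alone tops out near 15 fps in this test, the detector is not the limiting stage.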

stephanecharette commented 2 years ago

AGX Xavier:

Jetpack L4T 32.5.2
cuda 10.2.89
opencv 4.5.3 compiled with cuda
cudnn 8.0.0.180

Xavier NX:

Jetpack 4.6 L4T 32.6.1
cuda 10.2.300
opencv 4.5.3 compiled with cuda
cudnn 8.2.1.32

Nano:

Jetpack 4.6 L4T 32.6.1
cuda 10.2.300
opencv 4.5.3 compiled with cuda
cudnn 8.2.1.32

(All information above obtained using jtop.)

MWaleedK commented 2 years ago

I can't express my gratitude enough. Thank you so much.

AlexeyAB / darknet

15 fps on both yolov3 and tiny yolov3. Why? #8327