Low FPS inference Jetson TX1

AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

http://pjreddie.com/darknet/


Low FPS inference Jetson TX1 #7404

Open b21627193 opened 3 years ago

b21627193 commented 3 years ago

Hi,

AVG_FPS is 1.3 when running inference on video on my Jetson TX1 (even when I use the tiny weights). Is there a way to increase performance? What is the maximum FPS that I can get from the TX1?

Thanks in advance.

AlexeyAB commented 3 years ago

Did you compile Darknet with CUDA and cuDNN?

Please show a screenshot of the startup output, like this:

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
 CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 4.2.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
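The startup lines above are what answers the CUDA/cuDNN question, so a small parser can make the check repeatable. The following is a minimal sketch, not part of Darknet: the function name `parse_darknet_banner` and the returned keys are hypothetical, and it assumes the banner keeps the exact wording shown in this thread and that a build without CUDA does not print the `CUDA-version` line.

```python
import re

def parse_darknet_banner(log: str) -> dict:
    """Pull build and device info out of Darknet's startup output (hypothetical helper)."""
    info = {}
    # Build-level line, e.g.
    # "CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1"
    build = re.search(r"CUDA-version:\s*(\d+).*?cuDNN:\s*([\d.]+)", log)
    info["cuda_version"] = int(build.group(1)) if build else None
    info["cudnn_version"] = build.group(2) if build else None
    # Device-level line, e.g.
    # "0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070"
    dev = re.search(
        r"compute_capability\s*=\s*(\d+),\s*cudnn_half\s*=\s*(\d+),\s*GPU:\s*(.+)", log
    )
    info["compute_capability"] = int(dev.group(1)) if dev else None
    info["cudnn_half_active"] = bool(int(dev.group(2))) if dev else None
    info["gpu_name"] = dev.group(3).strip() if dev else None
    return info

sample = """ CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 4.2.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
"""
print(parse_darknet_banner(sample))
```

If `cuda_version` comes back as `None`, the banner was not found in the captured output, which under the assumption above would suggest a CPU-only build. The per-device `cudnn_half` value is reported separately from the build-time `CUDNN_HALF` flag, so it is worth reading both.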

shubham-shahh commented 3 years ago

> Hi,
>
> AVG_FPS is 1.3 when running inference on video on my Jetson TX1 (even when I use the tiny weights). Is there a way to increase performance? What is the maximum FPS that I can get from the TX1?
>
> Thanks in advance.

If performance is what you are looking for, you should use DeepStream or tkDNN.

b21627193 commented 3 years ago

Here it is. With height x width = 128x128 in the cfg, it averages 9.8 FPS.

./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -ext_output /home/nvidia/Desktop/traffic.mp4 -out_filename out7.avi
 CUDA-version: 10020 (10020), cuDNN: 8.0.0, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 4.1.1
Demo
 0 : compute_capability = 530, cudnn_half = 0, GPU: NVIDIA Tegra X1 
net.optimized_memory = 0 
mini_batch = 1, batch = 8, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    128 x 128 x   3 ->  128 x 128 x  32 0.028 BF

What are the ways to increase performance? With the tiny weights I get 25 FPS (at 256x256). Is that reasonable for a Jetson TX1?

Thank you for your time.
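The two logs in this thread report the first convolutional layer at 0.639 BF for a 608x608 input and 0.028 BF for 128x128. As a rough illustration of why shrinking the network input raises FPS, the sketch below reproduces those two figures. The function name `conv_bflops` is hypothetical, and the formula (two operations per multiply-accumulate, divided by 1e9) is an assumption chosen because it matches the printed values, not a quote from Darknet's source.

```python
def conv_bflops(filters, size, in_channels, out_h, out_w, groups=1):
    """Billions of floating-point operations for one conv layer.

    Counts each multiply-accumulate as two operations (assumed convention).
    """
    return 2.0 * filters * size * size * (in_channels / groups) * out_h * out_w / 1e9

# Layer 0 from the logs: 32 filters, 3x3 kernel, stride 1, 3 input channels.
big = conv_bflops(32, 3, 3, 608, 608)
small = conv_bflops(32, 3, 3, 128, 128)

print(f"{big:.3f} BF at 608x608")    # 0.639 BF at 608x608
print(f"{small:.3f} BF at 128x128")  # 0.028 BF at 128x128
print(f"ratio: {big / small:.1f}x")  # ratio: 22.6x
```

The cost of this layer scales with the output area, so going from 608x608 to 128x128 cuts it by about 22.6 times. Measured FPS will not scale by the same factor, since it also depends on video decoding, memory bandwidth, and the other layers, and a smaller input usually costs detection accuracy.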