AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

How to improve fps for YOLO v4-tiny and YOLO v4 #6366

Open jintengli opened 4 years ago

jintengli commented 4 years ago

Hi! I am running YOLO v4 and YOLO v4-tiny in the following environment: Windows 10, CUDA 10.2, cuDNN 7.6.5, OpenCV 3.4.4, on an RTX 2080 Ti. Running the command darknet.exe detector demo cfg/coco.data cfg/yolov4-tiny-custom.cfg yolov4-tiny.weights data/test.mp4 -dont_show -ext_output, I get 64 fps for YOLO v4-tiny and around 60 fps for YOLO v4 (with height and width changed to 416x416 for YOLO v4). I expected YOLO v4-tiny to run more than 3 times faster than YOLO v4. How should I configure it to make YOLO v4-tiny run faster? Thanks!

Szamtu commented 4 years ago

I get around 150 fps for YOLO v4-tiny on a GTX 1060 (Arch Linux, CUDA 11.0, cuDNN 8.0.0, OpenCV 4.4.0).

Maybe you could share how you built it? I built it with the Makefile, using these flags:

GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0
ZED_CAMERA_v2_8=0

And

# GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4
 ARCH= -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61

This could give us a clue.

jintengli commented 4 years ago

Thanks Szamtu. I am using the same Makefile configuration, except for the compute capability setting (the RTX 2080 Ti has compute capability 7.5). What kind of video do you feed in as input? I am using CCTV traffic video as input; maybe the low resolution is the cause? I will also try to change the environment with high version CUDA and OpenCV. Thanks!
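For reference, the ARCH line for another card follows the same pattern as the GTX 1060 line quoted earlier, with only the compute capability digits changed. Below is a minimal Python sketch that formats such a line; the helper name gencode_flags is hypothetical and not part of darknet, and it only builds text without invoking nvcc.

```python
def gencode_flags(compute_capability):
    """Format nvcc -gencode flags in the same shape as the Makefile ARCH line.

    compute_capability is a string such as "6.1" or "7.5".
    Hypothetical helper for illustration; it does not call nvcc.
    """
    major, minor = compute_capability.split(".")
    cc = f"{int(major)}{int(minor)}"
    return (
        f"-gencode arch=compute_{cc},code=sm_{cc} "
        f"-gencode arch=compute_{cc},code=compute_{cc}"
    )


if __name__ == "__main__":
    # GTX 1060 (compute capability 6.1), matching the Makefile excerpt
    print("ARCH=", gencode_flags("6.1"))
    # RTX 2080 Ti (compute capability 7.5)
    print("ARCH=", gencode_flags("7.5"))
```

If the ARCH line in the Makefile does not match the installed GPU, that is one build difference worth ruling out before looking elsewhere.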

Szamtu commented 4 years ago

I feed 1920x1080 video to my own trained tiny-yolo weights (following the manual). I have also tried evaluating my GPU following "How to evaluate FPS of YOLOv4 on GPU" from the manual.

Results: AVG_FPS:128.0 on yolov4-tiny.cfg and yolov4-tiny.weights on example video with multiple persons, mp4 file, H264 codec, 576x360 resolution

AVG_FPS:123.0 on yolov4-tiny.cfg and yolov4-tiny.weights on example video with multiple persons, mp4 file, H264 High Profile codec, 1920x1080 resolution

Low video resolution should not limit your gpu performance.

@jintengli did you check your GPU and CPU utilization? Maybe for some reason your video decoding bottlenecks the GPU?

My Threadripper 1920X gets quite high utilization. The GPU always runs at 96-98%. (Screenshot from 2020-07-29 08-01-12)
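One way to test the decoding hypothesis is to time frame reads alone, with no inference. The following is a minimal Python sketch under stated assumptions: the function name measure_read_fps is hypothetical, and it accepts any callable that returns (ok, frame) the way cv2.VideoCapture.read() does, so it has no hard dependency on OpenCV.

```python
import time


def measure_read_fps(read_frame, max_frames=500):
    """Call read_frame() repeatedly and return (frames_per_second, frames_read).

    read_frame must return a pair (ok, frame), like cv2.VideoCapture.read().
    Reading stops after max_frames or as soon as ok is False (end of video).
    """
    count = 0
    start = time.perf_counter()
    while count < max_frames:
        ok, _frame = read_frame()
        if not ok:
            break
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return fps, count


# Usage with OpenCV (assumes the opencv-python package is installed):
#   import cv2
#   cap = cv2.VideoCapture("data/test.mp4")  # video path from the original command
#   fps, frames = measure_read_fps(cap.read)
#   cap.release()
#   print(f"decoded {frames} frames at {fps:.1f} fps (no inference)")
```

If the decode-only rate is close to the fps reported by the demo, the GPU is probably waiting on frames rather than on the network.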

YashasSamaga commented 4 years ago

I will also try to change the environment with high version CUDA and OpenCV

Note that OpenCV 4.2 and above has a CUDA backend in OpenCV DNN. YOLOv4 and YOLOv4-Tiny are supported in OpenCV 4.4.0. You can find OpenCV DNN's performance summary on 2080 Ti and 1080 Ti here.

jintengli commented 4 years ago

My GPU utilization is really low when running darknet.exe, it is about 7-12%.

Szamtu commented 4 years ago

My GPU utilization is really low when running darknet.exe, it is about 7-12%.

Basically, your GPU is idle. Try updating your NVIDIA driver, OpenCV, and CUDA. This might help.
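To log utilization while the demo runs instead of reading it off a monitor window, nvidia-smi can be polled from a script. This is a minimal Python sketch, assuming nvidia-smi is on the PATH; the function names are hypothetical, and the query uses the --query-gpu=utilization.gpu and --format=csv,noheader,nounits options of nvidia-smi.

```python
import subprocess


def parse_utilization(output):
    """Parse nvidia-smi output with one integer percentage per line (one line per GPU)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_gpu_utilization():
    """Return the current utilization in percent for each GPU, as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_utilization(result.stdout)


# Usage (run in a second terminal while darknet is processing the video):
#   print(query_gpu_utilization())
```

A value that stays near 7-12% during inference suggests the bottleneck is outside the GPU.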

wuzhenxin1989 commented 4 years ago

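@jintengli I used the official yolov4-tiny.weights and yolov4-tiny.cfg with the model input size set to 416*416. On an RTX 2080 Ti, testing the dog.jpg image takes about 20 ms, which is very slow. I would like to compare notes.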

12343954 commented 3 years ago

@wuzhenxin1989 I have an RTX 2060, and v4 shows an ETA of 400 ms, which is even worse; it is making me question everything. v3 still manages about 30 ms.
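The thread mixes per-frame times in milliseconds with rates in fps, so a quick conversion helps when comparing them. A minimal Python sketch follows (helper names are hypothetical); note that a single-image test and a video demo may not be timed the same way, so this is only a rough comparison.

```python
def ms_to_fps(ms_per_frame):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


def fps_to_ms(fps):
    """Convert frames per second to a per-frame time in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Figures reported in this thread
    for ms in (20, 30, 400):
        print(f"{ms} ms per frame is {ms_to_fps(ms):.1f} fps")
    for fps in (64, 150):
        print(f"{fps} fps is {fps_to_ms(fps):.2f} ms per frame")
```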

https://github.com/AlexeyAB/darknet/issues/6630