AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

472.4% cpu usage of yolov3-tiny on GTX 1050 GPU. #2710

Open jamessmith90 opened 5 years ago

jamessmith90 commented 5 years ago

Is it possible to put the entire load of the yolov3-tiny model in GPU memory? It seems to be taking quite a lot of CPU on my i7 laptop with a GTX 1050.

Statistics: 80 FPS

top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12293 root 20 0 16.568g 1.370g 237520 R 472.4 17.8 2:02.42 darknet

nvidia-smi:
0 9809 C ./darknet 399MiB

GPU Load:
0, 2019/03/23 14:53:59.214, 72 %, [Not Supported], 53
0, 2019/03/23 14:54:00.216, 77 %, [Not Supported], 54
0, 2019/03/23 14:54:01.217, 75 %, [Not Supported], 54
0, 2019/03/23 14:54:02.218, 75 %, [Not Supported], 55
0, 2019/03/23 14:54:03.218, 75 %, [Not Supported], 55
0, 2019/03/23 14:54:04.219, 72 %, [Not Supported], 55
0, 2019/03/23 14:54:05.219, 76 %, [Not Supported], 56
0, 2019/03/23 14:54:06.219, 76 %, [Not Supported], 56
0, 2019/03/23 14:54:07.220, 76 %, [Not Supported], 56
0, 2019/03/23 14:54:08.221, 77 %, [Not Supported], 57
0, 2019/03/23 14:54:09.221, 73 %, [Not Supported], 57
0, 2019/03/23 14:54:10.221, 74 %, [Not Supported], 57

AlexeyAB commented 5 years ago
jamessmith90 commented 5 years ago

I run darknet detector demo. All build options are enabled except CUDNN_HALF. The video resolution is 1920x1080.

AlexeyAB commented 5 years ago

> Statistics: 80 FPS
>
> top:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 12293 root 20 0 16.568g 1.370g 237520 R 472.4 17.8 2:02.42 darknet
>
> nvidia-smi:
> 0 9809 C ./darknet 399MiB
>
> GPU Load:
> 0, 2019/03/23 14:53:59.214, 72 %, [Not Supported], 53

80 FPS with 72% GPU usage and 470% CPU usage is normal for a 1920x1080 Full HD video file.

What FPS, GPU usage and CPU usage do you get with the same cfg, weights and video file (1920x1080) when using the original repository? https://github.com/pjreddie/darknet

jamessmith90 commented 5 years ago

@AlexeyAB I am unable to run the code on the original repo. There is some issue with the names file.

High CPU usage makes it unusable if I want to handle 50 live streams. It's really sad, as it would force me to use TensorFlow.

AlexeyAB commented 5 years ago

@jamessmith90

Do you want to run 50 live streams on a single GTX 1050 GPU? That would give less than 2 FPS for each stream.

Yes, you can use Yolov3 on TensorFlow: https://github.com/AlexeyAB/darknet/issues/2707#issuecomment-475722268

jamessmith90 commented 5 years ago

50 live streams on 4 GPUs can be managed easily, but 100-200 cores is out of line.

AlexeyAB commented 5 years ago

Will you get 80 FPS for each of 50 live streams on 4 x GTX 1050 GPUs?

CPU usage is proportional to GPU usage when the same cfg/weights and video resolution are used, since both CPU and GPU usage are proportional to FPS. So with 4 x GTX 1050 GPUs (75% usage each) you will occupy only about 20 logical CPU cores (very approximately), which fits a 24-logical-core processor such as https://ark.intel.com/content/www/us/en/ark/products/189127/intel-core-i9-9920x-x-series-processor-19-25m-cache-up-to-4-50-ghz.html since such a system will process 50 live streams at ~6 FPS each.
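The estimate in this comment can be reproduced with a few lines of arithmetic. This is only a sketch of the linear-scaling assumption stated above (each GPU delivers the reported 80 FPS and costs the reported 472.4% CPU); the function name `scale_estimate` is hypothetical and is not part of darknet.

```python
def scale_estimate(fps_per_gpu, cpu_percent_per_gpu, n_gpus, n_streams):
    """Linear scaling model: total FPS and CPU usage both grow with GPU count."""
    total_fps = fps_per_gpu * n_gpus                 # frames/s across all GPUs
    fps_per_stream = total_fps / n_streams           # frames/s each stream gets
    cpu_cores = cpu_percent_per_gpu / 100 * n_gpus   # 100% in top == 1 logical core
    return total_fps, fps_per_stream, cpu_cores


# Figures reported in this thread: 80 FPS and 472.4% CPU on one GTX 1050.
single = scale_estimate(80, 472.4, n_gpus=1, n_streams=50)
quad = scale_estimate(80, 472.4, n_gpus=4, n_streams=50)

for label, (total, per_stream, cores) in (("1 GPU", single), ("4 GPUs", quad)):
    print(f"{label}: {total} FPS total, {per_stream:.1f} FPS per stream, "
          f"~{cores:.1f} logical cores")
```

Under these assumptions a single GPU leaves about 1.6 FPS per stream, and four GPUs give about 6.4 FPS per stream for roughly 19 logical cores, which is where the "~6 FPS" and "20 logical cores" figures come from.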


If you want to reduce CPU usage, you should do the video pre-processing (video decoding and resizing) on the GPU, something like this: https://github.com/AlexeyAB/yolo2_light/issues/25#issuecomment-435468000
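One possible way to move decoding and resizing off the CPU is to let ffmpeg do both on the GPU and pipe small raw frames to the detector. This is a sketch only, and not necessarily the approach described in the linked comment. It assumes an ffmpeg build with CUDA/NVDEC support (for the `cuda` hwaccel and the `scale_cuda` filter) and a 416x416 network input; the helper names are hypothetical.

```python
def gpu_preprocess_cmd(source, width=416, height=416):
    """Build an ffmpeg command that decodes and resizes on the GPU.

    Assumes ffmpeg was built with CUDA support; 416x416 is an assumed
    network input size, adjust it to match the cfg in use.
    """
    return [
        "ffmpeg",
        "-hwaccel", "cuda",                # decode with NVDEC
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", source,
        # resize on the GPU, then copy the small frame back to system memory
        "-vf", f"scale_cuda={width}:{height},hwdownload,format=nv12",
        "-f", "rawvideo",
        "-pix_fmt", "bgr24",               # final colour conversion on the small frame
        "pipe:1",
    ]


def frame_bytes(width=416, height=416, channels=3):
    """Number of bytes to read from the pipe for one BGR frame."""
    return width * height * channels


if __name__ == "__main__":
    cmd = gpu_preprocess_cmd("street_high_resolution.mp4")
    print(" ".join(cmd))
    print("bytes per frame:", frame_bytes())
```

The command would typically be started with `subprocess.Popen(cmd, stdout=subprocess.PIPE)`, reading `frame_bytes()` bytes per frame. Note the trade-off raised later in this thread: work moved onto the GPU competes with the neural network for GPU resources.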

jamessmith90 commented 5 years ago

20 CPU cores means a Xeon, which everyone knows is made for power efficiency and is twice as slow as an 8th-generation i7. This doesn't make any sense when darknet is compiled to use the GPU. It should use the GPU and not so much CPU.

The video decoder is not using the bulk of the CPU, as I have already used the OpenCV libraries with TensorFlow as well. This might be a generic problem with this repo or with yolo as a framework.

AlexeyAB commented 5 years ago

> 20 CPU cores means a Xeon, which everyone knows is made for power efficiency and is twice as slow as an 8th-generation i7.

Why do you want to use a slow Xeon with 20 logical cores instead of a Core i9-9920X with 24 logical cores? https://ark.intel.com/content/www/us/en/ark/products/189127/intel-core-i9-9920x-x-series-processor-19-25m-cache-up-to-4-50-ghz.html


> This doesn't make any sense when darknet is compiled to use the GPU. It should use the GPU and not so much CPU.

The Yolo neural network itself runs almost entirely on the GPU (not the CPU). If video decoding and frame resizing also run on the GPU, fewer GPU resources remain for the neural network, so you will get lower FPS. I usually see 30-50% total CPU usage and 85-90% GPU usage, so the bottleneck is on the GPU (CUDA functions + overhead of launching CUDA functions + data transfer).


> This might be a generic problem with this repo or with yolo as a framework.

You simply were not able to run the original repository https://github.com/pjreddie/darknet If you could, you would get 1.5-2x lower FPS on the same cfg/weights and Full HD video file, so CPU usage would also be 1.5-2x lower.
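Put differently, under the proportionality claimed in this thread the CPU cost per frame stays constant and only the frame rate changes. The sketch below works that out from the reported 80 FPS and 472.4% CPU; the 1.5x and 2x slowdowns are the figures quoted above, not measurements, and the function names are hypothetical.

```python
def cpu_ms_per_frame(cpu_percent, fps):
    """CPU time spent per processed frame, in core-milliseconds."""
    return cpu_percent / 100 * 1000 / fps


def cpu_percent_at(fps, ms_per_frame):
    """CPU usage (top-style percent) at a given FPS for a fixed per-frame cost."""
    return ms_per_frame * fps / 1000 * 100


# Reported in this thread: 472.4% CPU at 80 FPS.
per_frame = cpu_ms_per_frame(472.4, 80)
print(f"CPU cost per frame: {per_frame:.2f} core-ms")

for slowdown in (1.5, 2.0):
    fps = 80 / slowdown
    print(f"{slowdown}x slower: {fps:.1f} FPS -> "
          f"{cpu_percent_at(fps, per_frame):.1f}% CPU")
```

So a lower CPU reading on a slower build would not mean the work per frame got cheaper, only that fewer frames are being processed per second.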


> The video decoder is not using the bulk of the CPU, as I have already used the OpenCV libraries with TensorFlow as well.

It looks like you used very slow neural networks running at much less than 80 FPS on high-resolution video, so they did not require much pre-processing work. If you know how to use a profiler, try running it. You can easily see in the profiler that most of the CPU usage, about 75% (31.5 + 21.5 + 22.7), is consumed by the OpenCV library:

darknet detector demo cfg/coco.data yolov3-tiny.cfg yolov3-tiny.weights street_high_resolution.mp4

[Profiler screenshots: ocv1, ocv2, ocv3]
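For a sense of scale, the three profiler shares quoted above can be summed and applied to the CPU usage reported at the top of the thread. This is a rough sketch that assumes the profiled run and the original 472.4% measurement have the same breakdown, which the thread does not confirm; the names below are hypothetical.

```python
# Shares of CPU time attributed to OpenCV in the profiler, as quoted above.
opencv_shares = [31.5, 21.5, 22.7]


def opencv_cpu(total_cpu_percent, shares):
    """Return (combined share in percent, CPU percent attributable to OpenCV)."""
    share = sum(shares)
    return share, total_cpu_percent * share / 100


share, opencv_percent = opencv_cpu(472.4, opencv_shares)
print(f"OpenCV share of CPU time: {share:.1f}%")
print(f"OpenCV CPU usage: {opencv_percent:.1f}% "
      f"(~{opencv_percent / 100:.1f} logical cores)")
print(f"Everything else: {472.4 - opencv_percent:.1f}%")
```

With these numbers roughly three and a half of the nearly five busy logical cores would be spent inside OpenCV, which is why the advice in this thread targets decoding and resizing rather than the network itself.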

jamessmith90 commented 5 years ago

Let me verify your analysis and I will get back to you.

dexception commented 5 years ago

@jamessmith90 Use TensorRT! Not only will you get 1.5x the performance of yolo2_light with int8, but you will also be able to reduce your CPU usage.