AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Why is Darknet's speed on Windows much lower than on Linux? #5929

Open hfq0219 opened 4 years ago

hfq0219 commented 4 years ago

I run Darknet on Ubuntu and Windows on the same hardware; the GPU is an RTX 2080 Ti. On Ubuntu, video detection reaches over 35 FPS, but on Windows it is only about 25 FPS. Why?

imaami commented 4 years ago

Can you try to benchmark without a webcam? I've seen OpenCV tank performance because of badly-optimized webcam input handling in another piece of code a few years ago, it might be worthwhile to verify if that could be a factor in your case.

AlexeyAB commented 4 years ago

Run the command ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -dont_show -ext_output -mjpeg_port 8090

and open the URL http://localhost:8090 in the Chrome or Firefox web browser.

Then show a screenshot of the AVG FPS from the console.

And show a screenshot with information like this:

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
 CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 4.2.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF

hfq0219 commented 4 years ago

> Can you try to benchmark without a webcam? I've seen OpenCV tank performance because of badly-optimized webcam input handling in another piece of code a few years ago, it might be worthwhile to verify if that could be a factor in your case.

I am not running it with a webcam, only with a local video file.

hfq0219 commented 4 years ago

> Run the command ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -dont_show -ext_output -mjpeg_port 8090
>
> and open the URL http://localhost:8090 in the Chrome or Firefox web browser.
>
> Then show a screenshot of the AVG FPS from the console.
>
> And show a screenshot with information like this:
>
> ./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
>  CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
>  CUDNN_HALF=1
>  OpenCV version: 4.2.0
>  0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
> net.optimized_memory = 0
> mini_batch = 1, batch = 8, time_steps = 1, train = 0
>    layer   filters  size/strd(dil)      input                output
>    0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF

Here is the FPS: 46 FPS on Ubuntu and 38.6 FPS on Windows. [screenshots: fps-l, fps-w]

Here is the environment information: [screenshots: net-l, net-w]

AlexeyAB commented 4 years ago

Try to download the latest version of Darknet and show these 4 screenshots.

hfq0219 commented 4 years ago

> Try to download the latest version of Darknet and show these 4 screenshots.

It is the latest version. I only changed yolov4.cfg to fit my custom data and then trained it to get the weights file.

AlexeyAB commented 4 years ago

> It is the latest version

No, the latest version shows the GPU name. Download the latest master branch.

hfq0219 commented 4 years ago

> > It is the latest version
>
> No, the latest version shows the GPU name. Download the latest master branch.

This is the new result: [screenshots: fps-l, fps-w, net-l, net-w]. Thank you!

AlexeyAB commented 4 years ago

> I run Darknet on Ubuntu and Windows on the same hardware; the GPU is an RTX 2080 Ti. On Ubuntu, video detection reaches over 35 FPS, but on Windows it is only about 25 FPS. Why?

But I see 40-47 FPS in your screenshots.

hfq0219 commented 4 years ago

> > I run Darknet on Ubuntu and Windows on the same hardware; the GPU is an RTX 2080 Ti. On Ubuntu, video detection reaches over 35 FPS, but on Windows it is only about 25 FPS. Why?
>
> But I see 40-47 FPS in your screenshots.

That is because I am now using a new CPU: previously it was an Intel i5-6500, and now it is an Intel i5-10500. That is the only difference. But you can also see that the FPS on Windows is lower than on Linux, by maybe 7 to 10 FPS?

AlexeyAB commented 4 years ago

Try to run two commands on both OSes and show 4 screenshots, so I can understand what is wrong in your case:

  1. ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -dont_show
  2. ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -benchmark

hfq0219 commented 4 years ago

> Try to run two commands on both OSes and show 4 screenshots, so I can understand what is wrong in your case:
>
>   1. ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -dont_show
>   2. ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -benchmark

Oh sorry, I forgot that on my Windows system the disk is a mechanical hard disk, while on Linux it is a solid-state disk. Maybe this is the cause? And the screenshots: [six screenshots: 1, 2, 3, 1, 2, 3]

AlexeyAB commented 4 years ago

This is very strange: first you say the FPS is 25-35, then you show a screenshot with 40-47 FPS, and now 32-55 FPS.

What is the GPU usage on both OSes if you run ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -benchmark?

hfq0219 commented 4 years ago

> This is very strange: first you say the FPS is 25-35, then you show a screenshot with 40-47 FPS, and now 32-55 FPS.
>
> What is the GPU usage on both OSes if you run ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -benchmark?

Of course it's very strange, but the GPU usage is low. The time in the screenshots is different because of the time zone settings. [screenshots: 1, 1]

AlexeyAB commented 4 years ago

So the problem is on Windows.

What CPU usage do you see?

What other programs are running in parallel? Perhaps some other program is spending CPU resources.

Which build method do you use?

  • MSVS
  • Cmake+MSVS
  • vcpkg

Try to change these lines: https://github.com/AlexeyAB/darknet/blob/09991d0488ec49002366013e8e5185941f88b493/src/demo.c#L329-L330 to this: this_thread_sleep_for(thread_wait_ms); and recompile.
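
The linked lines are not reproduced in this thread, so the following is only a generic sketch of the trade-off behind a change like this: a wait loop that polls a flag can either yield on every iteration or sleep for a fixed interval between polls. All names here (`ready`, `count_polls`, `setter_main`) are hypothetical and are not taken from Darknet; the sketch assumes a POSIX system with C11 atomics.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int ready;        /* hypothetical flag, not a Darknet name */
static int setter_delay_ms;     /* written before the setter thread starts */

static void sleep_ms(int ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Stand-in for the thread that eventually signals the waiter. */
static void *setter_main(void *arg)
{
    (void)arg;
    sleep_ms(setter_delay_ms);
    atomic_store(&ready, 1);
    return NULL;
}

/* Polls `ready` until it is set and returns the number of polls.
   wait_ms > 0: sleep between polls; wait_ms == 0: only yield. */
long count_polls(int wait_ms, int delay_ms)
{
    pthread_t tid;
    long polls = 0;

    atomic_store(&ready, 0);
    setter_delay_ms = delay_ms;
    pthread_create(&tid, NULL, setter_main, NULL);

    while (!atomic_load(&ready)) {
        ++polls;
        if (wait_ms > 0)
            sleep_ms(wait_ms);   /* gives the CPU up for a fixed interval */
        else
            sched_yield();       /* returns at once if nothing else is runnable */
    }
    pthread_join(tid, NULL);
    return polls;
}
```

Comparing `count_polls(0, 50)` with `count_polls(1, 50)` will typically show far more loop iterations for the yield variant, at the cost of up to one sleep interval of extra wake-up latency for the sleeping variant. How either behaves in practice depends on the OS scheduler and timer resolution.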

hfq0219 commented 4 years ago

> So the problem is on Windows.
>
> What CPU usage do you see?
>
> What other programs are running in parallel? Perhaps some other program is spending CPU resources.
>
> Which build method do you use?
>
>   • MSVS
>   • Cmake+MSVS
>   • vcpkg
>
> Try to change these lines:
>
> https://github.com/AlexeyAB/darknet/blob/09991d0488ec49002366013e8e5185941f88b493/src/demo.c#L329-L330
>
> to this: this_thread_sleep_for(thread_wait_ms); and recompile.

No other programs are running because I rebooted the computer, and I built Darknet with darknet.sln in darknet\build\darknet\darknet.sln. After changing the lines and recompiling, the result is: [screenshot: 3]

imaami commented 4 years ago

> Try to change these lines:
>
> https://github.com/AlexeyAB/darknet/blob/09991d0488ec49002366013e8e5185941f88b493/src/demo.c#L329-L330
>
> to this: this_thread_sleep_for(thread_wait_ms); and recompile.

This is somewhat offtopic, but I noticed you use a spinlock + yield to signal start and completion events between the main thread and fetch_thread() / detect_thread(). User-space spinlocks are generally not a good way to do that, and it's possible that there's a lot of unnecessary contention and CPU usage happening. Also, the schedulers in Windows and Linux behave differently when it comes to yielding, so the same spinlock implementation might not be comparable between the two ports even when built from the same source commit.

Would you consider merging a semaphore-based inter-thread signaling PR if I write one and it has a positive impact on CPU usage?
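
To make the proposal concrete, here is a minimal sketch of semaphore-based start/completion signaling between a main thread and one worker. It is not the promised PR and does not use Darknet's code: `worker_t`, `worker_main` and `run_jobs` are hypothetical names, the "work" is a placeholder multiplication, and the sketch assumes POSIX unnamed semaphores as found on Linux (a Windows port would need a different primitive).

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical worker state; none of these names come from Darknet. */
typedef struct {
    sem_t start;      /* main -> worker: a new job is ready */
    sem_t done;       /* worker -> main: the job is finished */
    int   exit_flag;  /* set by main before the final post on `start` */
    int   input;
    int   output;
} worker_t;

static void *worker_main(void *arg)
{
    worker_t *w = arg;
    for (;;) {
        sem_wait(&w->start);         /* blocks in the kernel instead of spinning */
        if (w->exit_flag)
            break;
        w->output = w->input * 2;    /* placeholder for fetch/detect work */
        sem_post(&w->done);
    }
    return NULL;
}

/* Runs n jobs through one worker and returns the sum of the results. */
int run_jobs(int n)
{
    worker_t w = {0};
    pthread_t tid;
    int sum = 0;

    sem_init(&w.start, 0, 0);
    sem_init(&w.done, 0, 0);
    pthread_create(&tid, NULL, worker_main, &w);

    for (int i = 1; i <= n; ++i) {
        w.input = i;
        sem_post(&w.start);          /* signal start */
        sem_wait(&w.done);           /* wait for completion */
        sum += w.output;
    }

    w.exit_flag = 1;
    sem_post(&w.start);              /* wake the worker so it can exit */
    pthread_join(tid, NULL);
    sem_destroy(&w.start);
    sem_destroy(&w.done);
    return sum;
}
```

Because `sem_post` and `sem_wait` synchronize memory, the plain `int` fields need no extra atomics here. Whether this actually lowers CPU usage or changes FPS compared with a spin-and-yield loop would have to be measured on both operating systems; error handling (for example `EINTR` from `sem_wait`) is omitted for brevity.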

AlexeyAB commented 4 years ago

@imaami

The scheduler controls only kernel space, not user space, so the difference between the Windows and Linux schedulers affects mutexes/semaphores much more than a user-space spinlock.

Can you reproduce this issue on your PC? Because I can't reproduce this issue.

> Would you consider merging a semaphore-based inter-thread signaling PR if I write one and it has a positive impact on CPU usage?

Yes, if you give enough comparative tests that I can reproduce on my hardware or in the cloud.

AlexeyAB commented 4 years ago

@hfq0219 What Windows version do you use?

hfq0219 commented 4 years ago

> @hfq0219 What Windows version do you use?

Windows 10, version 1909. [screenshot: 4]

vinhtq115 commented 4 years ago

Bump. I have the same issue. Tried on both HDD and SSD but FPS doesn't change. I'm using YOLOv3. On Ubuntu, I can achieve about 27-28 FPS but on Windows 10 2004, I can only achieve about 21 FPS. Also, I noticed that GPU Usage is only around 65-75% on Windows.

[screenshot: a22]

My specs:

Update: GPU usage on Ubuntu: [screenshot: Screenshot from 2020-07-05 10-34-30]

AlexeyAB commented 4 years ago

What command do you use?

Do you use this command: ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -benchmark?

vinhtq115 commented 4 years ago

> What command do you use?
>
> Do you use this command: ./darknet detector demo cfg/coco.data cfg/yolov4 yolov4.weights -benchmark?

I ran it with a video. The command is darknet detector demo data\gunonly.data cfg\yolov3_gun_only.cfg yolov3_custom_train_final_gun_only.weights cctv2.mp4 -json_port 8070 -mjpeg_port 8090 -ext_output -dont_show. I used the same video on both Windows 10 and Ubuntu. Also, I compiled from the build/darknet folder on Windows because the project in the root folder won't detect the CUDA compiler. On Ubuntu, I compiled in the root folder.

AlexeyAB commented 4 years ago

Run this command and show a screenshot with the FPS and GPU usage: darknet detector demo data\gunonly.data cfg\yolov3_gun_only.cfg yolov3_custom_train_final_gun_only.weights cctv2.mp4 -benchmark

vinhtq115 commented 4 years ago

On Windows 10: [screenshot: windows]

On Ubuntu: [screenshot: ubuntu]

AlexeyAB commented 4 years ago

Thanks! Did you use the -benchmark flag?

vinhtq115 commented 4 years ago

Yes, I used it for both runs.