Open hfq0219 opened 4 years ago
Can you try to benchmark without a webcam? A few years ago I saw OpenCV tank performance in another piece of code because of badly optimized webcam input handling, so it might be worthwhile to verify whether that is a factor in your case.
Run this command:
./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -dont_show -ext_output -mjpeg_port 8090
and open the URL http://localhost:8090 in the Chrome or Firefox web browser.
Then show a screenshot of the AVG FPS from the console.
Also show a screenshot with information like this:
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
I am not running it with a webcam, only with a local video file.
Here is the FPS: 46 FPS on Ubuntu and 38.6 FPS on Windows. Here is the environment information:
Try to download the latest version of Darknet and show these 4 screenshots.
It is the latest version. I only changed yolov4.cfg to fit my custom data and then trained it to get the weights file.
It is the latest version
No, the latest version shows GPU Name. Download the latest master branch.
This is the new result. Thank you!
I run Darknet on Ubuntu and Windows with the same hardware (the GPU is an RTX 2080 Ti). On Ubuntu, video detection speed can reach over 35 FPS, but on Windows it is only about 25 FPS. Why?
But I see 40-47 FPS on your screenshots.
Because I am now using a new CPU: previously it was an Intel i5 6500, and now it is an Intel i5 10500. That is the only difference. But you can also see that the FPS on Windows is lower than on Linux, by maybe 7 to 10 FPS.
Try running these two commands on both OSes and show the 4 screenshots, so I can understand what is wrong in your case:
./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -dont_show
./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -benchmark
Oh sorry, I forgot that on my Windows install the disk is a mechanical hard disk, while on Linux it is a solid-state disk. Maybe this is the cause? Here are the screenshots:
This is very strange: first you say the FPS is 25-35, then you show a screenshot with 40-47 FPS, and now it is 32-55 FPS.
What is the GPU usage on both OSes when you run this command? ./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -benchmark
Of course it's very strange, but the GPU usage is low. The times in the screenshots differ because of the time zone settings.
So the problem is on Windows.
What CPU usage do you see?
What other programs are running in parallel? Perhaps some other program is using CPU resources.
Which compilation method do you use?
- MSVS
- CMake+MSVS
- vcpkg
Try to change these lines: https://github.com/AlexeyAB/darknet/blob/09991d0488ec49002366013e8e5185941f88b493/src/demo.c#L329-L330
to this
this_thread_sleep_for(thread_wait_ms);
and recompile.
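For readers following along, here is a rough C++ sketch of what sleeping (rather than yielding) inside a polling wait loop can look like. It is not the code from demo.c, which is not reproduced in this thread; it assumes the lines in question sit in a loop that polls a flag, and the names `wait_until_cleared` and `demo_wait` are hypothetical. `std::this_thread::sleep_for` stands in for the `this_thread_sleep_for(thread_wait_ms)` call suggested above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (not Darknet code): poll `flag` until it becomes 0.
// With use_sleep == true the thread sleeps `wait_ms` between checks and
// gives up the CPU for that time; otherwise it only yields, which can keep
// a core busy while waiting. Returns the number of polls performed.
long wait_until_cleared(const std::atomic<int>& flag, bool use_sleep, int wait_ms) {
    assert(wait_ms >= 0);
    long polls = 0;
    while (flag.load() != 0) {
        ++polls;
        if (use_sleep) {
            std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));
        } else {
            std::this_thread::yield();
        }
    }
    return polls;
}

// Hypothetical demo: a worker clears the flag after roughly `work_ms`
// milliseconds while the calling thread waits with the chosen strategy.
long demo_wait(bool use_sleep, int work_ms) {
    std::atomic<int> flag(1);
    std::thread worker([&flag, work_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
        flag.store(0);
    });
    long polls = wait_until_cleared(flag, use_sleep, 1);
    worker.join();
    return polls;
}
```

Calling `demo_wait(true, 20)` and `demo_wait(false, 20)` and comparing the returned poll counts gives a feel for how much more often the yielding variant wakes up. The exact numbers depend on the OS scheduler and timer resolution, so they are not comparable across machines.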
No other programs are running, because I rebooted the computer. I built Darknet with darknet.sln (darknet\build\darknet\darknet.sln). After changing the lines and recompiling, this is the result:
This is somewhat off-topic, but I noticed you use a spinlock + yield to signal start and completion events between the main thread and fetch_thread() / detect_thread(). User-space spinlocks are generally not a good way to do that, and it's possible that there's a lot of unnecessary contention and CPU usage happening. Also, the schedulers in Windows and Linux behave differently when it comes to yielding, so the same spinlock implementation might not be comparable between the two ports even when built from the same source commit.
Would you consider merging a semaphore-based inter-thread signaling PR if I write one and it has a positive impact on CPU usage?
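To make the proposal concrete, semaphore-style start/completion signaling between a main thread and one worker could look roughly like the sketch below. It is only an illustration under assumptions: the `Semaphore` class and `run_signaled_frames` are hypothetical names, the semaphore is built from a standard mutex and condition variable, and none of this is taken from Darknet's source or from an actual PR.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical minimal counting semaphore (C++11); not part of Darknet.
class Semaphore {
public:
    // Increment the count and wake one waiting thread.
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
    // Block without spinning until the count is positive, then decrement it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical stand-in for a fetch/detect worker: it waits for a start
// signal, handles one "frame", then signals completion, `frames` times.
// Returns the number of frames the worker handled.
int run_signaled_frames(int frames) {
    Semaphore start, done;
    int processed = 0;
    std::thread worker([&] {
        for (int i = 0; i < frames; ++i) {
            start.wait();  // sleeps until the main thread requests work
            ++processed;   // placeholder for the real per-frame work
            done.post();   // report that this frame is finished
        }
    });
    for (int i = 0; i < frames; ++i) {
        start.post();      // signal start
        done.wait();       // block until the worker reports completion
    }
    worker.join();
    assert(processed == frames);
    return processed;
}
```

The waiting thread sleeps inside the condition variable instead of repeatedly checking a flag, so its CPU usage while idle should be close to zero on both Windows and Linux. Whether this changes FPS in practice would still need the comparative measurements discussed in this thread.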
@imaami
The scheduler controls only kernel space, not user space, so the difference between the Windows and Linux schedulers affects mutexes/semaphores much more than a user-space spinlock.
Can you reproduce this issue on your PC? I can't reproduce it.
Would you consider merging a semaphore-based inter-thread signaling PR if I write one and it has a positive impact on CPU usage?
Yes, if you give enough comparative tests that I can reproduce on my hardware or in the cloud.
@hfq0219 What Windows version do you use?
Windows 10, version 1909.
Bump. I have the same issue. Tried on both HDD and SSD but FPS doesn't change. I'm using YOLOv3. On Ubuntu, I can achieve about 27-28 FPS but on Windows 10 2004, I can only achieve about 21 FPS. Also, I noticed that GPU Usage is only around 65-75% on Windows.
My specs:
Update: GPU Usage on Ubuntu:
What command do you use?
Do you use this command?
./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -benchmark
I ran it with a video. The command is: darknet detector demo data\gunonly.data cfg\yolov3_gun_only.cfg yolov3_custom_train_final_gun_only.weights cctv2.mp4 -json_port 8070 -mjpeg_port 8090 -ext_output -dont_show
Same video on both Windows 10 and Ubuntu. Also, I compiled from build/darknet folder on Windows because the one in root folder won't detect CUDA compiler. On Ubuntu, I compiled in root folder.
Run this command and show screenshot with FPS and GPU-usage: darknet detector demo data\gunonly.data cfg\yolov3_gun_only.cfg yolov3_custom_train_final_gun_only.weights cctv2.mp4 -benchmark
On Windows 10: On Ubuntu:
Thanks! Did you use the -benchmark flag?
Yes I used it for both runs.