Closed: MyVanitar closed this issue 7 years ago
The FPS of Yolo can't be higher than the FPS of the network camera, so the problem is in your camera. What FPS-detection and FPS-capture did you get for your camera? Just comment out this line: https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L134 (FPS detection will be higher than the camera FPS, but it does not make sense, because the same frames will be re-used)
We can read a video file faster than real time
The results are in result_vec
Un-comment this line - it will show the result, but FPS will decrease: https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L169
0.24 is the threshold: https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L148
What FPS-detection and FPS-capture did you get for your camera?
The camera I use is an Android application streaming over its allocated IP address. FPS detection is around 43 to 44 and FPS capture varies from 15 to 30, depending on the video scene and whether I move the camera or not. Usually it stays around 20.
Did you get this result after or before this line was commented? https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L134
@VanitarNordic Also you can try to use the new state-of-the-art model (for classification and segmentation) densenet201_yolo.cfg and densenet201.300
instead of yolo-voc.2.0.cfg and darknet19_448.conv.23
to train your detector and measure IoU: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708
Did you get this result after or before this line was commented?
Not yet, I'll do it and let you know. The phone says it supports up to 30 FPS, which is its default refresh rate.
Also you can try to use the new state-of-the-art model (for classification and segmentation)
Oh really, so YOLO now also supports semantic segmentation? It seems it can also be used for object detection, as you described in issue #179. Is it the one mentioned here?: https://pjreddie.com/darknet/imagenet/#extraction
No, Yolo doesn't support semantic segmentation yet, but you can use the same network for object detection with better accuracy than yolo-voc.2.0.cfg.
Okay, Thank you.
I tested it and the results are better. I can say it is more stable, around 25 FPS at 640*480 resolution. Anyway, Darknet itself is also not stable: the FPS rate drops and recovers dramatically just by changing the camera position. Maybe some scenes require more processing.
Would these values change if I used a more powerful GPU? Multiplied by GFLOPS?
FPS detection is around 43 to 44 and FPS capture varies from 15 to 30
In my case the FPS values are fairly stable when using yolo_console_dll.exe:
It seems it can also be used for object detection, as you described in issue #179. Is it the one mentioned here?: https://pjreddie.com/darknet/imagenet/#extraction
Yes, it is about DenseNet201, mentioned here as a classifier: https://pjreddie.com/darknet/imagenet/#densenet201
And I propose to use it as a detector (replacing only the last avgpool, softmax and cost layers with a region layer from Yolo), using the first 300 pre-trained layers from densenet201.weights
Hmm, yes, it might be because of the Android phone and the Wi-Fi connection. My GPU is a GTX 1060 6GB. The phone FPS is fixed at 30 FPS; maybe it is a bottleneck for the GPU.
What mobile application do you use to send the mpeg-stream?
What mobile application do you use to send the mpeg-stream?
IP Webcam
Besides, I started to train densenet201. It started okay, but after 200-300 iterations all subdivisions became -nan(ind)
Can you detect anything using weights after 300 iterations?
Try to decrease to subdivisions=8 or 4.
Also try to set saturation=1.5 and exposure=1.5, and train.
If after the changes above you still see -nan(ind), then try to set:
learning_rate=0.001
steps=40000,60000
scales=.1,.1
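Collected into one fragment for convenience (the values suggested above; whether subdivisions=8 fits depends on your GPU memory, and the rest of the [net] section stays as in densenet201_yolo.cfg):

```
[net]
# fragment of the [net] section with the suggested changes (not a complete cfg)
# subdivisions=8 (or 4); lower values need more GPU memory per step
subdivisions=8
saturation=1.5
exposure=1.5
learning_rate=0.001
steps=40000,60000
scales=.1,.1
```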
No, IOU and Recall are zero.
A subdivision lower than 16 causes out-of-memory. I'll try your suggested parameters.
Okay. I'll let you know.
But have you used darknet.exe partial to extract your desired weights?
Yes, I used darknet.exe partial cfg/densenet201.cfg densenet201.weights densenet201.300 300
to get densenet201.300 from densenet201.weights. So in my comment https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708 I already gave a link to densenet201.300
And as you can see in my screenshot I can get result using trained densenet-yolo weights after 300 iterations: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708
Have you done exactly all steps described in my comment https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708 ?
Yes, I'm sure everything is correct. It is like before; only the CFG and initial weights were changed.
But I have to correct my words: validation produces values for IOU and Recall, although they are low, but after 450 iterations it goes to -nan(ind)
#max_crop=448
Should it always be kept commented?
Should max_crop=448 always be kept commented?
Yes.
Can you detect anything using different weights after 100, 200 or 300 iterations? darknet.exe detector test data/obj.data densenet201_obj.cfg backup/densenet201_obj_100.weights -thresh 0.1
Yes, although the results are bad, I could see some boxes when testing on training images.
Also I noticed something: the generated weights of yolo-voc-2.0 are about 262 MB, but the weights of densenet201 are about 61 MB, almost the same size as its initial weights.
This is strange, because I can detect objects using densenet trained after 300 iterations.
Yes, added only one last convolutional layer: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/densenet201_yolo.cfg#L1940
densenet201.300 - 62 529 808 bytes
densenet201_obj_100.weights - 63 015 916 bytes
Your point for extraction is correct, I mean 300.
But you know, I believe the weights should be heavier, even heavier than Darknet-448, because the model is deeper.
I can detect objects, but it comes with the IOU and Recall values I get at 200 iterations (IOU=27%, Recall=14%). Therefore it will have high errors.
See a comparison of MByte-size and accuracy (Top-1 or Top-5) with other networks:
Also deeper model usually trains exponentially longer (this problem was solved in ResNet using shortcut-layers). So we can't compare precision (IoU/Recall) for weights after the same number of iterations in deep and short networks. Deeper network will be trained longer, but will eventually have greater accuracy (and overfitting will come much later).
But you are right, this is too small, so I added 4 convolutional layers, as was done in darknet19_448.cfg to get yolo-voc.2.0.cfg. So try to use this new densenet201_yolo2.cfg:
https://drive.google.com/open?id=0BwRgzHpNbsWBeTlpajNWc21jZ0k
Thank you. I'll try and let you know.
Okay, now I'll disclose the results.
This time, with the new CFG file, the training was better and I did not face any -nan(ind), except randomly inside subdivisions, which I think happens for you also.
The trained model is okay, BUT the Darknet-448 still outperforms it. I mean it is more accurate. I trained the DenseNet for 2000 iterations and then continued to 3000 to see if it improves more, but it could not outperform the Darknet-448.
I'll make some changes inside the CFG; maybe it helps to improve the accuracy.
In the meantime, let's come back to our topic. As I mentioned, the camera delivers video over the network at 30 FPS (per the phone settings), but the FPS capture is not stable, even though I reduced the resolution to 640*480 (otherwise the FPS capture would be even lower). If this is a camera problem, then I should see this phenomenon on the phone itself, shouldn't I?
The trained model is okay, BUT the Darknet-448 still outperforms it. I mean it is more accurate. I trained the DenseNet for 2000 iterations and then continued to 3000 to see if it improves more, but it could not outperform the Darknet-448.
About Darknet-448, do you mean yolo-voc.2.0.cfg?
In the meantime, let's come back to our topic. As I mentioned, the camera delivers video over the network at 30 FPS (per the phone settings), but the FPS capture is not stable, even though I reduced the resolution to 640*480 (otherwise the FPS capture would be even lower). If this is a camera problem, then I should see this phenomenon on the phone itself, shouldn't I?
No. You may see 10 FPS on the phone, but bottlenecks in Ethernet, Wi-Fi, lack of CPU performance, or bugs in OpenCV 2.4.x can reduce the FPS.
What OpenCV version do you use? (OpenCV 2.4.x has bugs in capturing a network stream with high FPS; that is why I migrated to OpenCV 3.x, which works perfectly)
What FPS can you get from your phone when using VLC player, or a simple OpenCV application that only captures the network stream from the phone in a loop?
About Darknet-448, do you mean yolo-voc.2.0.cfg?
Yes, DenseNet-201 did not outperform yolo-voc.2.0.cfg. It would have to be significantly better to compensate for its 2x slower speed.
What OpenCV version do you use?
OpenCV-2.4.9.
What FPS can you get from your phone when using VLC player, or a simple OpenCV application that only captures the network stream from the phone in a loop?
I could not see the FPS rate inside VLC, because it does not show it when playing a network stream; usually it is one of the parameters in the Codec information tab, but it was missing for the network stream, and the video had lags and delay. BUT it plays very smoothly and fast inside a browser such as Chrome; I don't know the FPS there either, but it seems to be high.
Is densenet201_yolo2.cfg more accurate than densenet201_yolo.cfg? And what size is the densenet201_yolo2_300.weights file?
So, try to use OpenCV 3.x.
Is densenet201_yolo2.cfg more accurate than densenet201_yolo.cfg? And what size is the densenet201_yolo2_300.weights file?
Yes, densenet201_yolo2.cfg was much better. As I mentioned, I did not face any -nan(ind) and it trained well till the end, BUT it could not outperform yolo-voc-2.0.cfg
And what size is the densenet201_yolo2_300.weights file?
The size is 190.4 MB. I trained it twice, up to 3000 iterations.
So, try to use OpenCV 3.x.
I used OpenCV 3.3 and now the FPS is mostly around 26-30 FPS. Thank you.
I think the console app's speed is even better than the Darknet demo, very good.
Although the code is tricky for me to understand, at least it is C++, which is much easier to deal with than C. I should thank you again for this code. Many good OpenCV functions are available in C++ only.
1) What does if(consumed) do in the code?
2) How can I access each parameter (coordinates, object_id, confidence) separately as variables, rather than using show_console_result(result_vec, obj_names); to print them to the console?
I want to use them for some extra operations.
int size = result_vec.size();  // number of detected objects on this frame
bbox_t obj = result_vec[3];    // get coords and probability of the 3rd detected object
int left_x_coord = obj.x;      // left x coord
int top_y_coord = obj.y;       // top y coord
int width = obj.w;             // width of box
int obj_id = obj.obj_id;       // object id (class id)
float prob = obj.prob;         // probability of this object
You can iterate over the objects this way:
for(size_t i = 0; i < result_vec.size(); ++i) {
    bbox_t box = result_vec[i];
    std::cout << box.obj_id << "\n"; // output each object id to the console - each on a new line
    // do something else with box ...
}
Also you can see how it is done in: https://github.com/AlexeyAB/darknet/blob/2baa7bde542ed490f8ab35c82dd3174fddea63f3/src/yolo_console_dll.cpp#L63
What does if(consumed) do in the code?
This is a condition: the main loop should get result_vec and send the newest captured frame to the detector thread only if the previous frame was already taken by the detector thread: https://github.com/AlexeyAB/darknet/blob/2baa7bde542ed490f8ab35c82dd3174fddea63f3/src/yolo_console_dll.cpp#L144
Thank you.
This article introduced its own modifications to overcome YOLO's weaknesses, and they named it YOLT:
How can I access the object tracking counter variable? It seems the object_id is related to the object's position inside the obj.names file, not the object tracking number.
Use box.track_id
Thank you again.
Hello,
First, I should mention it is a very good piece of work indeed. I have these questions:
1) The code uses 90% of the GPU cores' power. The FPS starts at 30 FPS, but it drops to 20 FPS if I don't move the camera. That means the FPS rate drops if the camera is fixed; why?
2) When I use an offline video, the FPS increases to around 40 FPS, two times more than a live stream; why?
3) When I use an online/offline video, how can I access the coordinates, object_ID and each object's confidence level? This information is printed to the console when I use an image, before it gets closed.
4) How can I set a threshold for detection?