Closed: MyVanitar closed this issue 7 years ago
The FPS of Yolo can't be higher than the FPS of the network camera, so the problem is in your camera. What FPS-detection and FPS-capture did you get for your camera? Just comment out this line: https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L134 (FPS detection will be higher than the camera FPS, but it does not make sense, because the same frames will be re-used)
We can read a video file faster than real time
The results are in result_vec
Un-comment this line - it will show the result, but FPS will decrease: https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L169
0.24 is the threshold: https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L148
What FPS-detection and FPS-capture did you get for your camera?
The camera I use is an Android application streaming over its allocated IP address. FPS detection is around 43 to 44 and FPS capture varies from 15 to 30, depending on the video scene and whether I move the camera or not. Usually it stays around 20.
Did you get this result after or before this line was commented? https://github.com/AlexeyAB/darknet/blob/88e2fce754c551a5adbb470514a12a7a9ae95a07/src/yolo_console_dll.cpp#L134
@VanitarNordic Also you can try to use the new state-of-the-art model (for classification and segmentation) densenet201_yolo.cfg and densenet201.300
instead of yolo-voc.2.0.cfg and darknet19_448.conv.23
to train your detector and measure IoU: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708
Did you get this result after or before this line was commented?
Not yet, I'll do it and let you know. The phone says it supports up to 30 FPS, which is its default refresh rate.
Also you can try to use the new state-of-the-art model (for classification and segmentation)
Oh really, so YOLO now also supports semantic segmentation? It seems it can also be used for object detection, as you described in issue #179. Is it the one mentioned here?: https://pjreddie.com/darknet/imagenet/#extraction
No, Yolo doesn't support semantic segmentation yet, but you can use the same network for object detection with better accuracy than yolo-voc.2.0.cfg.
Okay, Thank you.
I tested it and the results are better. I can say it is more stable, around 25 FPS at 640*480 resolution. Anyway, Darknet itself is also not stable: the FPS rate drops and recovers dramatically just by changing the camera position. Maybe some scenes require more processing.
Would these values change if I used a more powerful GPU? Multiplied by GFLOPS?
FPS detection is around 43 to 44 and FPS capture varies from 15 to 30
In my case the FPS values are fairly stable when using yolo_console_dll.exe:
It seems it can also be used for object detection, as you described in issue #179. Is it the one mentioned here?: https://pjreddie.com/darknet/imagenet/#extraction
Yes, it is about DenseNet201, mentioned here as a classifier: https://pjreddie.com/darknet/imagenet/#densenet201
And I propose to use it as a detector (replacing only the last avgpool, softmax and cost layers with a region layer from Yolo), using the first 300 pre-trained layers from densenet201.weights
Hmm, yes, it might be because of the Android phone and the Wi-Fi connection. My GPU is a GTX 1060 6GB. The phone FPS is fixed at 30 FPS; maybe it is a bottleneck for the GPU.
What mobile application do you use to send the mpeg-stream?
What mobile application do you use to send the mpeg-stream?
IP Webcam
Besides, I started to train densenet201. It started okay, but after 200-300 iterations all subdivisions became -nan(ind)
Can you detect anything using weights after 300 iterations?
Try to decrease to subdivisions=8 or 4.
Also try to set saturation=1.5 and exposure=1.5, and train.
If after the changes above you still see -nan(ind), then try to set:
learning_rate=0.001
steps=40000,60000
scales=.1,.1
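Collected into one fragment for convenience (the values suggested above; whether subdivisions=8 fits depends on your GPU memory, and the rest of the [net] section stays as in densenet201_yolo.cfg):

```
[net]
# fragment of the [net] section with the suggested changes (not a complete cfg)
# subdivisions=8 (or 4); lower values need more GPU memory per step
subdivisions=8
saturation=1.5
exposure=1.5
learning_rate=0.001
steps=40000,60000
scales=.1,.1
```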
No, IOU and Recall are zero.
A subdivision lower than 16 causes out-of-memory. I'll try your suggested parameters.
Okay. I'll let you know.
But have you used darknet.exe partial to extract your desired weights?
Yes, I used darknet.exe partial cfg/densenet201.cfg densenet201.weights densenet201.300 300
to get densenet201.300 from densenet201.weights. So in my comment https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708 I already gave a link to densenet201.300
And as you can see in my screenshot I can get result using trained densenet-yolo weights after 300 iterations: https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708
Have you done exactly all steps described in my comment https://github.com/AlexeyAB/darknet/issues/179#issuecomment-329829708 ?
Yes, I'm sure everything is correct. It is like before; only the CFG and initial weights were changed.
But I have to correct my words: validation produces values for IOU and Recall, although they are low, but after 450 iterations it goes to -nan(ind)
#max_crop=448
Should it always be kept commented?
Should max_crop=448 always be kept commented?
Yes.
Can you detect anything using different weights after 100, 200 or 300 iterations? darknet.exe detector test data/obj.data densenet201_obj.cfg backup/densenet201_obj_100.weights -thresh 0.1
Yes, although the results are bad, I could see some boxes when testing on training images.
Also I noticed something: the generated weights of yolo-voc-2.0 are about 262 MB, but the weights of densenet201 are about 61 MB, almost the same size as its initial weights.
This is strange, because I can detect objects using densenet trained after 300 iterations.
Yes, added only one last convolutional layer: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/densenet201_yolo.cfg#L1940
densenet201.300 - 62 529 808 bytes
densenet201_obj_100.weights - 63 015 916 bytes
Your point for extraction is correct, I mean 300.
But you know, I believe the weights should be heavier, even heavier than Darknet-448, because the model is deeper.
I can detect objects, but it comes with the IOU and Recall values I get at 200 iterations (IOU=27%, Recall=14%). Therefore it will have high errors.
See a comparison of MByte-size and accuracy (Top-1 or Top-5) with other networks:
Also deeper model usually trains exponentially longer (this problem was solved in ResNet using shortcut-layers). So we can't compare precision (IoU/Recall) for weights after the same number of iterations in deep and short networks. Deeper network will be trained longer, but will eventually have greater accuracy (and overfitting will come much later).
But you are right, this is too small, so I added 4 convolutional layers, as was done in darknet19_448.cfg to get yolo-voc.2.0.cfg. So try to use this new densenet201_yolo2.cfg:
https://drive.google.com/open?id=0BwRgzHpNbsWBeTlpajNWc21jZ0k
Thank you. I'll try and let you know.
Okay, now I'll disclose the results.
This time, with the new CFG file, the training was better and I did not face any -nan(ind), except randomly inside subdivisions, which I think happens for you also.
The trained model is okay, BUT the Darknet-448 still outperforms it. I mean it is more accurate. I trained the DenseNet for 2000 iterations and then continued to 3000 to see if it improves more, but it could not outperform the Darknet-448.
I'll make some changes inside the CFG; maybe it helps to improve the accuracy.
In the meantime, let's come back to our topic. As I mentioned, the camera delivers video over the network at 30 FPS (per the phone settings), but the FPS capture is not stable, even though I reduced the resolution to 640*480 (otherwise the FPS capture would be even lower). If this is a camera problem, then I should see this phenomenon on the phone itself, shouldn't I?
The trained model is okay, BUT the Darknet-448 still outperforms it. I mean it is more accurate. I trained the DenseNet for 2000 iterations and then continued to 3000 to see if it improves more, but it could not outperform the Darknet-448.
About Darknet-448, do you mean yolo-voc.2.0.cfg?
In the meantime, let's come back to our topic. As I mentioned, the camera delivers video over the network at 30 FPS (per the phone settings), but the FPS capture is not stable, even though I reduced the resolution to 640*480 (otherwise the FPS capture would be even lower). If this is a camera problem, then I should see this phenomenon on the phone itself, shouldn't I?
No. You may see 10 FPS on the phone, but bottlenecks in Ethernet, Wi-Fi, lack of CPU performance, or bugs in OpenCV 2.4.x can reduce the FPS.
What OpenCV version do you use? (OpenCV 2.4.x has bugs in capturing a network stream with high FPS; that is why I migrated to OpenCV 3.x, which works perfectly)
What FPS can you get from your phone when using VLC player, or a simple OpenCV application that only captures the network stream from the phone in a loop?
About Darknet-448, do you mean yolo-voc.2.0.cfg?
Yes, DenseNet-201 did not outperform yolo-voc.2.0.cfg. It would have to be significantly better to compensate for its 2x slower speed.
What OpenCV version do you use?
OpenCV-2.4.9.
What FPS can you get from your phone when using VLC player, or a simple OpenCV application that only captures the network stream from the phone in a loop?
I could not see the FPS rate inside VLC, because it does not show it when playing a network stream; usually it is one of the parameters in the Codec information tab, but it was missing for the network stream, and the video had lags and delay. BUT it plays very smoothly and fast inside a browser such as Chrome; I don't know the FPS there either, but it seems to be high.
Is densenet201_yolo2.cfg more accurate than densenet201_yolo.cfg? And what size is the densenet201_yolo2_300.weights file?
So, try to use OpenCV 3.x.
Is densenet201_yolo2.cfg more accurate than densenet201_yolo.cfg? And what size is the densenet201_yolo2_300.weights file?
Yes, densenet201_yolo2.cfg was much better. As I mentioned, I did not face any -nan(ind) and it trained well till the end, BUT it could not outperform yolo-voc-2.0.cfg
And what size is the densenet201_yolo2_300.weights file?
The size is 190.4 MB. I trained it twice, up to 3000 iterations.
So, try to use OpenCV 3.x.
I used OpenCV 3.3 and now the FPS is mostly around 26-30 FPS. Thank you.
I think the console app's speed is even better than the Darknet demo, very good.
Although the code is tricky for me to understand, at least it is C++, which is much easier to deal with than C. I should thank you again for this code. Many good OpenCV functions are available in C++ only.
1) What does if(consumed) do in the code?
2) How can I access each parameter (coordinates, object_id, confidence) separately as variables, rather than using show_console_result(result_vec, obj_names); to print them to the console?
I want to use them for some extra operations.
int size = result_vec.size();  // number of detected objects on this frame
bbox_t obj = result_vec[3];    // get coords and probability of the 3rd detected object
int left_x_coord = obj.x;      // left x coord
int top_y_coord = obj.y;       // top y coord
int width = obj.w;             // width of box
int obj_id = obj.obj_id;       // object id (class id)
float prob = obj.prob;         // probability of this object
You can iterate over the objects this way:
for(size_t i = 0; i < result_vec.size(); ++i) {
    bbox_t box = result_vec[i];
    std::cout << box.obj_id << "\n"; // output each object id to the console - each on a new line
    // do something else with box ...
}
Also you can see how it is done in: https://github.com/AlexeyAB/darknet/blob/2baa7bde542ed490f8ab35c82dd3174fddea63f3/src/yolo_console_dll.cpp#L63
What does if(consumed) do in the code?
This is a condition: the main loop should get result_vec and send the newest captured frame to the detector thread only if the previous frame was already taken by the detector thread: https://github.com/AlexeyAB/darknet/blob/2baa7bde542ed490f8ab35c82dd3174fddea63f3/src/yolo_console_dll.cpp#L144
Thank you.
This article introduced its own modifications to overcome YOLO's weaknesses, and they named it YOLT:
How can I access the object tracking counter variable? It seems the object_id is related to the object's position inside the obj.names file, not the object tracking number.
Use box.track_id
Thank you again.
Hello,
First, I should mention it is a very good piece of work indeed. I have these questions:
1) The code uses 90% of the GPU cores' power. The FPS starts at 30 FPS, but it drops to 20 FPS if I don't move the camera. That means the FPS rate drops if the camera is fixed; why?
2) When I use an offline video, the FPS increases to around 40 FPS, two times more than a live stream; why?
3) When I use an online/offline video, how can I access the coordinates, object_ID and each object's confidence level? This information is printed to the console when I use an image, before it gets closed.
4) How can I set a threshold for detection?