MyVanitar closed this issue 7 years ago
@VanitarNordic Hi,
You can add `printf("%d, %d, %d, %d \n", left, right, top, bot);` here: https://github.com/AlexeyAB/darknet/blob/master/src/image.c#L219

or also add:

```c
int x_center = b.x * im.w;
int y_center = b.y * im.h;
int width    = b.w * im.w;
int height   = b.h * im.h;
```
A training guide is still in progress: https://groups.google.com/d/msg/darknet/0ksFU91emmc/QMEO0HnHAgAJ
Thank you very much.
Do you know how we can add live video camera support instead of an image as input? You mentioned a camera installed on a network (accessible by IP), but I mean host-connected cameras such as an internal webcam, USB3, and similar.
@VanitarNordic
Yes, for WebCamera number 0 you can use: `darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights -c 0`
@VanitarNordic
How can I train the Yolo2 for my own desired objects?
Now you can train Yolo v2 by using following instructions: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data
Original for Linux: http://pjreddie.com/darknet/yolo/#train-voc
Thank you, gentleman.
I read that briefly, but as I understand it, it is about regenerating the training data file based on VOC. What if we have our own selection of 1000 discrete image files (which contain variations of a desired object among other objects) and decide to train Yolo v2 with these?
I mean training with our own image files from scratch.
@VanitarNordic
To train for your 2 objects:

1. Copy `yolo-voc.cfg` to `yolo-obj.cfg` and change the line `classes=20` to `classes=2`.
2. Create file `obj.names` with the 2 object names, each on a new line.
3. Create file `train.txt` with the filenames of your images, each on a new line.
4. Create file `obj.data` containing:
```
classes = 2
train = train.txt
valid = test.txt
names = obj.names
backup = backup/
```
5. For each image, create a file with the same name but the `.txt` extension, and put into it one line per object on that image: `<object-class> <x> <y> <width> <height>`, where the values are floats relative to the width and height of the image (attention: x, y are the centers of the rectangle). For example, for img1.jpg you create img1.txt containing:
```
1 0.716797 0.395833 0.216406 0.147222
0 0.687109 0.379167 0.255469 0.158333
1 0.420312 0.395833 0.140625 0.166667
```
6. Download the pre-trained weights for the convolutional layers (76 MB): http://pjreddie.com/media/files/darknet19_448.conv.23 and put the file in the directory build\darknet\x64.
7. Run training: `darknet.exe detector train obj.data yolo-obj.cfg darknet19_448.conv.23`
Thank you again Alexey.
I have some more questions:
1) In step 1 you mentioned: "Copy yolo-voc.cfg to yolo-obj.cfg and ...". Do you mean replacing the "yolo-voc.cfg" file with "yolo-obj.cfg"?
2) In step 4, do you mean just creating a file which contains that information?
3) In step 5, do you know any tool which generates such annotation files? OpenCV has such a tool, but it produces annotation files differently (x, y are the top-left coordinates and they are integer values).
@VanitarNordic
I mean you should create a new file "yolo-obj.cfg" with the same content as "yolo-voc.cfg", but with only one change: classes=2
Yes.
No, I don't know such software. Which tool in OpenCV do you mean, can you give a link?
Also you can ask about it here: https://groups.google.com/forum/#!forum/darknet
@VanitarNordic
Also you should change `filters=(classes + 5)*5` in your yolo-voc.cfg
I added "How to train (to detect your custom objects)": https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Thank you Alexey
Very good explanation.
I have a few more questions:
1) If I wanted to detect one object type, such as just cars and nothing else, would the number of classes be equal to 1?
2) Does the first name in the first line of the "obj.names" file relate to class 1, and similarly does line 2 correspond to class 2?
Finally, I still don't understand why the <x> <y> <width> <height> values for each image are float numbers. If I understood why, I could maybe write software to create these files and values, in case we can't find the tool the authors used to make them.
@VanitarNordic
1) Yes, classes=1 in obj.data and in yolo-obj.cfg (and filters=30 in yolo-obj.cfg). With only one class, <object-class> will always be 0.
Float values are used for <x> <y> <width> <height> because they are relative to the absolute width x height of the image, and so range from 0.0 to 1.0. The advantage of relative values is that they stay valid for any resizing of the image.
Input images can be any size (any width and height), both for training and prediction; each image is resized to the neural-network size (416x416 or 448x448), but the relative values <x> <y> <width> <height> stay valid without changes: https://github.com/AlexeyAB/darknet/blob/master/src/demo.c#L49
Thanks,
Please correct me if the below calculation is not correct:
(x, y: center of the rectangle)
relative x = absolute x / width
relative y = absolute y / height
relative height = absolute height / height
relative width = absolute width / width
@VanitarNordic
Yes.
I created a new repository with GUI-software for generating annotation file for Yolo v2, which I wrote myself before: https://github.com/AlexeyAB/Yolo_mark
Thank you,
May I ask what speed (FPS) you have achieved testing Yolo v2 on a CPU? Mine is very slow (a few seconds per image). Other DNN-based algorithms are slow in training but acceptable at test/run time. Am I doing something wrong?
no idea?
@VanitarNordic
0.3 FPS on CPU, 32 FPS on GPU.
Darknet Yolo v2 is not optimized for CPU and uses only 1-2 cores.
You have a sophisticated graphics card but only 32 FPS. It should be at least 60 FPS for flicker-free real time. Why do the YOLO v1 and v2 authors always claim it is a fast algorithm?
I got 32 FPS for full Yolo v2 480x480 on a GTX 970 without cuDNN. It is not a fast GPU; the top GPU, the Nvidia Titan X GP102, is 3x faster.

GTX 970: 3.5 TFlops-SP
Titan X GM200: 6.1 TFlops-SP (x 1.74)

Results:
32 FPS on GTX 970 (without cuDNN)
59 FPS on Titan X GM200 (x 1.84)

Did you try any other object detectors: Faster-RCNN ResNet-152, SSD 300/500 old & new?
Is 480x480 the input resolution (image or video)? From the curve I can assume that Yolo v2 sits somewhere between speed and accuracy, isn't it?
I have tried Dlib and it seems faster and more accurate.
480x480 is the input resolution of the neural network. All Yolo v2 points lie on the optimal Pareto frontier, i.e. it is state of the art. If you want more than 30 FPS on a Titan X, there is nothing better at the moment for accuracy/speed.
All object detectors in dlib are much less accurate. Which object detector from dlib do you use?
Actually you got 59 FPS on the Titan X as I see, which is good.
I am not deeply familiar with the algorithm itself, so if the input to the neural network is different from the main input, what is the resolution of the main input images (or video from the camera), and what if we decide to use an HD camera as input (such as an HDMI camera)?
I used face pose detection on CPU and it was good, but because I do not have a professional GPU, I have not tested his latest post here: http://blog.dlib.net/ What he claims about speed and accuracy is very good if he is right; it seems the accuracy is better than RCNN.
If you use 480x480 Yolo v2 and capture FullHD video 1920x1080, then each frame will be resized to 480x480, then will be processed by the neural network, with the best accuracy/speed among all realtime (>30 FPS) object-detectors.
If you want to detect very small objects (15x15 pixels) then you can divide the input image (1920x1080) into overlapping (10%) small images (480x480) and process each of them. You have to write this code yourself.
What about Dlib's last blog post?
Also I have heard about Caffe. What is your opinion about them?
@VanitarNordic
It is necessary to distinguish: frameworks, approaches to region proposals, and neural nets.
Frameworks:
Approaches of region proposals - using Caffe:
Neural Networks:
For example, commonly used together:
Thanks,
I mean DetectNet (object detection), which is trained with NVCaffe; GoogLeNet does the classification.
@VanitarNordic DetectNet is worse than Yolo v2.
Results for DetectNet are absent from all detection benchmarks:
DetectNet uses: framework (Caffe) + approach (DetectNet, based on old Yolo v1) + network (DetectNet, based on GoogLeNet)
1) What about Dlib 19.2?
2) I am curious whether I could train Yolo v2 with DIGITS; it probably needs a caffemodel and a prototxt file.
3) What is your opinion about the GTX 1080 GPU? Can you predict how fast Yolo v2 would be (FPS) on this graphics card (for detection)?
@VanitarNordic
But for objects other than faces it may give bad results; dlib is absent from all public detection benchmarks:
Also, currently the best approach, Caffe + RFCN + ResNet-101 (https://github.com/daijifeng001/r-fcn), has a much better result, with 2x fewer errors than FasterRCNN-VGG16.
I.e. dlib is not the best, but it is good.
No, you can't train a Yolo model in Caffe or Caffe-DIGITS. There is software to convert a Yolo v1 cfg-file and weights-file to prototxt and caffemodel, but it works only for the old Yolo v1: https://github.com/xingwangsfu/caffe-yolo
You can simply compare the results from the picture for the nVidia Titan X GM200 (6144 GFlops) with any nVidia GPU from this list: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series
Thank you, again a very professional and comprehensive explanation. Really, I have nothing more to ask. Fantastic :-)
Also you gave me a parameter (GFlops) to compare GPUs for DNNs, in case I decide to purchase one wisely.
So, by the way, Yolo v2 should be the best both in terms of precision and speed, yes?
@VanitarNordic In different tests there may be different winners. But there are three of the best methods for real time:
For non-real-time, the best is Caffe-RFCN+ResNet101: https://github.com/daijifeng001/r-fcn
Which model in the picture does Caffe-PVANet refer to (the VOC 2007 test, I mean)?
SSD512 is accurate but slow even on a Titan X.
It is not on VOC2007 but on VOC2012 (a comparison of DNNs trained on a very large data set): http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=4&submid=9804
Well, according to the GitHub description it achieved mAP=84.9 on VOC2007, but the speed (FPS) is not mentioned.
All on Titan X (GM200):
PVANet+: mAP=84.2, FPS=22
PVANet+ (compressed): mAP=83.7, FPS=31
https://arxiv.org/pdf/1611.08588v2.pdf
1) When the FPS is low and the model is accurate, is there any way to achieve higher speed? Is there any hardware that performs faster than a GPU?
2) Where did you get the Pascal VOC 2012 result?
3) Does the GPU memory influence model accuracy in training? (Typically we have to adjust the batch size to fit GPUs with smaller memory.)
Also, have you heard about YOLO9000?
There was a chart in one of your previous posts with the competition results, but I cannot see that image now. Can you upload it again or mention the source?
@VanitarNordic All on nVidia Titan X (GM200)
Figure 4: https://arxiv.org/pdf/1612.08242v1.pdf
Collected from many articles: https://drive.google.com/file/d/0BwRgzHpNbsWBTk13bHRnMWFEdVU/view
Hello,
How can I get coordinate information (x, y) of detected object(s)?