Closed kingvision closed 7 years ago
@kingvision
darknet.exe detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23
darknet.exe detector train data/obj.data yolo-obj.cfg yolo-obj_1000.weights
https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data
@AlexeyAB
For fine-tuning we should also rename the last fully connected layer (in Caffe it is the "InnerProduct" layer). If we don't, it will pick up the weights from the trained model. Have you done that?
@VanitarNordic Generally you do not need to change the names of layers for fine-tuning in Yolo v2 or in Caffe.
In Yolo v2, without changing the C code of the program, you can't fine-tune only one layer without fine-tuning the other layers, and you can't reset the weights of one layer of the pre-trained model to random values.
In Yolo v2 you can only fine-tune all layers, starting from the weights of the pre-trained model.
learning_rate for all layers: https://github.com/AlexeyAB/darknet/blob/2fc5f6d46b089368d967b3e1ad6b2473b6dc970e/cfg/yolo-voc.cfg#L14
For Caffe, not for Yolo: If you didn't rename the "InnerProduct" layer in Caffe, then the "InnerProduct" layer and all other layers will still be changed by fine-tuning: ./caffe train -solver ./solver.prototxt -weights ./pre_trained.caffemodel -gpu 0
2.1. You should rename the fully connected "InnerProduct" layer, or any other layer in Caffe, only if you want these layers to use random weight initialization instead of the weights from the pre-trained model: https://docs.google.com/presentation/d/1lzyXMRQFlOYE2Jy0lCNaqltpcCIKuRzKJxQ7vCuPRc8/edit#slide=id.g3c48e0e3ae4e944415
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
Because we are predicting 20 classes instead of a 1,000, we do need to change the last layer in the model. Therefore, we change the name of the last layer from fc8 to fc8_flickr in our prototxt. Since there is no layer named that in the bvlc_reference_caffenet, that layer will begin training with random weights.
2.2. You can also fine-tune only one layer without fine-tuning the others, by setting the learning rate (lr_mult) to 0 for the layers that will not be trained:
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their lr_mult to 0.
2.3. But if you didn't rename any layers and didn't set the learning rate to 0, it is still fine-tuning if you use: ./caffe train -solver ./solver.prototxt -weights ./pre_trained.caffemodel -gpu 0
Yolo v2 has no fully connected "InnerProduct" layer: https://arxiv.org/pdf/1612.08242.pdf
We remove the fully connected layers from YOLO and use anchor boxes to predict bounding boxes.
switching to a fully convolutional network with anchor boxes and using the new network. Switching to the anchor box style approach increased recall without changing mAP
Yolo v2 is a fully convolutional network, because the Region-based Fully Convolutional Network (R-FCN) is the best approach at this moment, with the highest mAP: https://arxiv.org/pdf/1605.06409.pdf
R-FCN approach with ResNet network:
Generally there are two types of training:
Pre-training - any training with unlabeled images, using autoencoders, RBMs or data-dependent initializations: https://arxiv.org/pdf/1511.06856.pdf
(It mainly changes the lower layers)
Fine-tuning - any training with labeled images, using backpropagation with stochastic gradient descent (SGD)
(It changes all layers, but more slowly)
@AlexeyAB Thank you * 1000.
Since we talked about Caffe, let me ask you this question. Assume that we have a trained .caffemodel and a deploy.prototxt file. How can we use these in a standalone application that works everywhere, without forcing the end user to install Caffe (especially to compile it)? Usually the programmer's computer and the end user's computer have different hardware but the same configuration, such as CUDA, Python, cuDNN and the rest.
I also have some open issues; I would appreciate it if you could answer them, sir.
@VanitarNordic
As you know, there are 3 best approaches on Caffe for object detection on the Pareto frontier: R-FCN Caffe, PVANet Caffe, SSD Caffe. Only the last two are real-time.
For example, in the SSD Caffe-fork: https://github.com/weiliu89/caffe/tree/ssd
Use the ssd_detect.cpp-file: https://github.com/weiliu89/caffe/blob/ssd/examples/ssd/ssd_detect.cpp
Write your own code in the main-function instead of this: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/examples/ssd/ssd_detect.cpp#L244
Add your own #include "" in this cpp-file
Run make as usual to build Caffe-SSD in this directory: https://github.com/weiliu89/caffe/tree/ssd and ssd_detect.cpp will be built too; the runnable binary file ssd_detect will be in the directory: caffe/build/examples/ssd/
You can put your own .cpp files next to ssd_detect.cpp in the path caffe/examples/ssd/ and they will be compiled and linked too, like any .cpp-files: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/Makefile#L29
You can create one Detector-class, or many Detector-classes if your GPU has enough RAM or if you use many GPUs: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/examples/ssd/ssd_detect.cpp#L272
Call Caffe::SetDevice(dev_id); before each detector is created, for example:
std::vector<Detector> detectors;
for(size_t i = 0; i < number_of_gpus; ++i) {
Caffe::SetDevice(i);
Detector detector_tmp(model_file, weights_file, mean_file, mean_value);
detectors.push_back( detector_tmp );
}
Then call Caffe::SetDevice(dev_id); every time before the detectors are used. Or you can create CPU threads, one for each GPU, then call Caffe::SetDevice(dev_id); once in each CPU thread and use the specified GPU in that CPU thread.
Later I will add some changes to make it easier to use darknet Yolo v2 in your .cpp-programs. This is related to your other question: https://github.com/AlexeyAB/darknet/issues/21
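The one-CPU-thread-per-GPU idea above could look roughly like the following. This is only a sketch: run_per_gpu and worker are hypothetical names, and no Caffe calls are made here. In real code the worker would call Caffe::SetDevice(dev_id); once at its start and then create and use its own Detector.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: starts one CPU thread per GPU id and waits for all of
// them to finish. `worker` receives the device id of its thread; in a Caffe
// program it would call Caffe::SetDevice(dev_id) once and then run detection.
void run_per_gpu(int number_of_gpus, const std::function<void(int)>& worker) {
    assert(number_of_gpus >= 0);
    std::vector<std::thread> threads;
    for (int dev_id = 0; dev_id < number_of_gpus; ++dev_id) {
        threads.emplace_back(worker, dev_id);  // one CPU thread per GPU
    }
    for (std::thread& t : threads) {
        t.join();  // wait until every per-GPU worker has finished
    }
}
```

Each worker should touch only its own detector and results, so that the threads do not need any locking between them.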
Yolo v2
Caffe:
has more mandatory dependencies: Boost, Google logging (glog), Google flags (gflags), protobuf, hdf5: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/Makefile#L181
and Caffe has more additional dependencies: CUDA (cuDNN), OpenCV, Python, lmdb, leveldb, snappy, mkl/openblas/cblas/atlas
@AlexeyAB
Thumbs up, exactly. Let me tell you something: after my experiments I think Yolo v2 is the best in all aspects. Competition results don't matter, because we can make our own model better and better by backpropagation during training (including tricky images, the ones that cause false positive and false negative detections). I'm impatiently waiting for the code modification you promised. It could be used as code, DLL files or something like that. The only remaining open issue was knowing the mAP and accuracy after each iteration. Now we have the recall, I think. Correct me if I am wrong.
Also, is there any good non-Caffe object detector other than Yolo?
@VanitarNordic
Yes, during training we can see:
loss (error) for the current batch
recall and avg_iou (intersection over union) for the current subdivision of the batch
But to see the mean average precision (mAP), we should validate the net on the whole validation or test image dataset to calculate the average.
For example, output of training:
Region Avg IOU: 0.800875, Class: 0.997764, Obj: 0.751350, No Obj: 0.004822, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.798296, Class: 0.997726, Obj: 0.753631, No Obj: 0.004813, Avg Recall: 1.000000, count: 8
9001: 0.043959, 0.043959 avg, 0.001000 rate, 3.566000 seconds, 576064 images
Loaded: 0.000000 seconds
Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds
For the current batch (number of images = batch=16 in yolo-voc.cfg):
9002 - the iteration (batch) number
0.211667 - the loss for the current batch
0.060730 avg - the average loss, calculated as avg_loss = avg_loss*.9 + loss*.1;
For the current subdivision of the batch (number of images = batch=16 / subdivisions=2 in yolo-voc.cfg):
count: 8 - the count that the sum below is divided by
Region Avg IOU: 0.800677 - the sum of iou, divided by count
Avg Recall: 1.000000 - calculated as if(iou > .5) recall += 1;
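The two formulas quoted above can be tried outside darknet with a small sketch. The function and struct names are made up for illustration, and it assumes that the recall sum is divided by count in the same way as the IoU sum, as the name Avg Recall suggests. It is not the darknet code itself.

```cpp
#include <cassert>
#include <vector>

// Running average of the loss, following the quoted line
//   avg_loss = avg_loss*.9 + loss*.1;
// `keep` is a hypothetical parameter standing in for the hard-coded .9.
float update_avg_loss(float avg_loss, float loss, float keep = 0.9f) {
    assert(keep >= 0.0f && keep <= 1.0f);
    return avg_loss * keep + loss * (1.0f - keep);
}

// Hypothetical container for the per-subdivision numbers printed in the log.
struct RegionStats {
    float avg_iou;     // sum of iou divided by count
    float avg_recall;  // share of entries with iou > .5 (assumed averaging)
    int count;         // number of IoU values that were summed
};

// `ious` holds one IoU value per counted object in the subdivision.
RegionStats region_stats(const std::vector<float>& ious) {
    RegionStats stats{0.0f, 0.0f, static_cast<int>(ious.size())};
    if (ious.empty()) return stats;  // avoid dividing by zero
    float sum_iou = 0.0f;
    float recall = 0.0f;
    for (float iou : ious) {
        sum_iou += iou;
        if (iou > 0.5f) recall += 1.0f;  // same test as in the quoted line
    }
    stats.avg_iou = sum_iou / stats.count;
    stats.avg_recall = recall / stats.count;
    return stats;
}
```

With the values from the log above, update_avg_loss(0.043959f, 0.211667f) gives about 0.060730, which matches the avg value printed for iteration 9002.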
@VanitarNordic
There are no other approaches close in accuracy to Caffe (R-FCN, PVANet, SSD, Faster-RCNN, ...) or Darknet-Yolo v2.
Other DNN frameworks are usually used only by researchers or mathematicians and are more comfortable for them, but not for production (not for the end user): Tensorflow, Torch, Theano, Matlab.
But there is one research approach, XNOR-Net, written in Lua and based on Torch, which is much faster than the others and can be used on low-performance devices, at the cost of some precision. Note that XNOR-Net is from the same authors as Darknet Yolo:
This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings
@AlexeyAB
Thank you very much again * 10000.
So, given this information, when should I end the training? You mentioned usually 2000 iterations per class. Could avg_loss be considered an indication?
About Caffe: what if we use the Windows distribution of Caffe and use the caffe.dll, does that solve the requirement of its dependencies?
It may work by copying any of this (https://github.com/BVLC/caffe/tree/windows) onto the target computer, maybe.
@VanitarNordic
So, given this information, when should I end the training? You mentioned usually 2000 iterations per class. Could avg_loss be considered an indication?
Yes, you can use avg_loss as an indicator; very approximately, mAP = 1 - avg_loss
Also, to make it more accurate, you can change this line: https://github.com/AlexeyAB/darknet/blob/2a9a5229c87c5e05c87d9d792c62cf020b3f1981/src/detector.c#L136
from: avg_loss = avg_loss*.9 + loss*.1;
to: avg_loss = avg_loss*.99 + loss*.01;
avg_loss is marked in the example output with the postfix avg for each batch:
Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds
After you stop training when the error avg_loss is small enough, you should try the last weights from the backup folder and several previous ones (because avg_loss can still decrease on the training dataset while on the validation dataset it may already have begun to grow, due to overfitting): https://github.com/AlexeyAB/darknet/issues/20#issuecomment-277525791
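Picking among the last few saved weights can be sketched as choosing the checkpoint with the lowest loss on the validation dataset. This is an illustration only: the Checkpoint struct and best_checkpoint function are hypothetical, and how the validation loss is measured for each weights file is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one saved weights file from the backup folder.
struct Checkpoint {
    std::string weights_file;  // name of the saved weights file
    float validation_loss;     // loss measured on the validation dataset
};

// Returns the index of the checkpoint with the lowest validation loss.
// On ties the earliest checkpoint wins.
std::size_t best_checkpoint(const std::vector<Checkpoint>& checkpoints) {
    assert(!checkpoints.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < checkpoints.size(); ++i) {
        if (checkpoints[i].validation_loss < checkpoints[best].validation_loss) {
            best = i;
        }
    }
    return best;
}
```

If the best index is not the last one, the later checkpoints have probably started to overfit, which is the situation described above.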
About Caffe: what if we use the Windows distribution of Caffe and use the caffe.dll, does that solve the requirement of its dependencies? It may work by copying any of this (https://github.com/BVLC/caffe/tree/windows) onto the target computer, maybe.
I haven't tried to use Caffe on Windows. Maybe you will be able to easily use the Caffe-fork you referred to.
And Caffe-PVANet too:
uses its own PVA Caffe-fork: https://github.com/sanghoon/caffe/tree/6068dd04ea93cca9fcee036628fdb3ea95b4ebcd
This repository is a fork from BVLC/caffe. Some modifications have been made to run PVANET with Caffe
and PVANet-code on Python: https://github.com/sanghoon/pva-faster-rcnn
This version of py-faster-rcnn is slower than our in-house runtime code (e.g. image pre-processing code written in Python)
Later I will add some changes for easier use darknet Yolo v2 in your .cpp-programms.
Please have a look at this repository (https://github.com/mrzl/ofxDarknet). It seems that its author has made something that takes cfg, weights and other files as input and runs detection inside a C++ program. Its description was not straightforward for me, and deciphering the code may take me a lot of time, but for you it may take just a few minutes, and "maybe" it satisfies our need to make portable/standalone GUI applications.
Actually I want to write the software in VB.Net. My first thought was reading the output from the Darknet.exe CMD console, but that is not a straightforward task.
@VanitarNordic
I added support for using Yolo v2 as a C++ yolo_cpp_dll.dll-file: https://github.com/AlexeyAB/darknet#how-to-use-yolo-as-dll
You simply build build\darknet\yolo_cpp_dll.sln to create the yolo_cpp_dll.dll-file
And build the simple C++ example which uses this yolo_cpp_dll.dll - open & build build\darknet\yolo_console_dll.sln
The .cpp-example is here: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp
Class Detector has a constructor and 3 detect()-functions:
std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2); - takes the image file name
std::vector<bbox_t> detect(image_t img, float thresh = 0.2); - takes an already loaded image of type image_t
std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2); - takes an already loaded image of type cv::Mat, for example loaded with the OpenCV function cv::imread(filename);
class Detector {
public:
    Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0);
    ~Detector();
    std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2);
    std::vector<bbox_t> detect(image_t img, float thresh = 0.2);
#ifdef OPENCV
    std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2);
#endif
};
Hi, how can I fine-tune the model? Thanks.