fine tuning - Githubissues

kingvision commented 7 years ago

Hi, how can I fine-tune the model? Thanks.

AlexeyAB commented 7 years ago

@kingvision

To fine-tune starting from the convolutional layers darknet19_448.conv.23 pre-trained on ImageNet, use: darknet.exe detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23
To fine-tune your own pre-trained model yolo-obj_1000.weights, use: darknet.exe detector train data/obj.data yolo-obj.cfg yolo-obj_1000.weights

https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data

MyVanitar commented 7 years ago

@AlexeyAB

For fine-tuning we should also rename the last fully connected layer (in Caffe it is the "InnerProduct" layer). If we don't, it will pick up the weights from the pre-trained model. Have you done that?

AlexeyAB commented 7 years ago

@VanitarNordic Generally you do not need to change the names of layers to fine-tune in Yolo v2 or in Caffe.

In Yolo v2, without changing the C code of the program, you can't fine-tune only one layer while leaving the other layers fixed, and you can't reset the weights of one layer of a pre-trained model to random values.

In Yolo v2 you can only fine-tune all layers together, starting from the weights of the pre-trained model:
- The Yolo v2 model has only one global learning_rate for all layers: https://github.com/AlexeyAB/darknet/blob/2fc5f6d46b089368d967b3e1ad6b2473b6dc970e/cfg/yolo-voc.cfg#L14
- The Yolo v2 model has no layer names.

For Caffe, not for Yolo: if you don't rename the "InnerProduct" layer in Caffe, then the "InnerProduct" layer and all other layers will still be updated by fine-tuning: ./caffe train -solver ./solver.prototxt -weights ./pre_trained.caffemodel -gpu 0

2.1. You should rename the fully connected "InnerProduct" layer, or any other layer in Caffe, only if you want these layers to start from random weights instead of the weights from the pre-trained model: https://docs.google.com/presentation/d/1lzyXMRQFlOYE2Jy0lCNaqltpcCIKuRzKJxQ7vCuPRc8/edit#slide=id.g3c48e0e3ae4e944415

http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

Because we are predicting 20 classes instead of a 1,000, we do need to change the last layer in the model. Therefore, we change the name of the last layer from fc8 to fc8_flickr in our prototxt. Since there is no layer named that in the bvlc_reference_caffenet, that layer will begin training with random weights.

2.2. You can also fine-tune only one layer without fine-tuning the others, by setting the learning rate to 0 for the layers that should not be trained: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their lr_mult to 0.

2.3. But if you didn't rename any layers and didn't set any learning rate to 0, it is still fine-tuning if you use: ./caffe train -solver ./solver.prototxt -weights ./pre_trained.caffemodel -gpu 0
Yolo v2 has no fully connected "InnerProduct" layer: https://arxiv.org/pdf/1612.08242.pdf

We remove the fully connected layers from YOLO and use anchor boxes to predict bounding boxes.

switching to a fully convolutional network with anchor boxes and using the new network. Switching to the anchor box style approach increased recall without changing mAP

Yolo v2 is a fully convolutional network, because the Region-based Fully Convolutional Network (R-FCN) is the best approach at the moment, with the highest mAP: https://arxiv.org/pdf/1605.06409.pdf

The R-FCN approach with a ResNet network:
- Pascal VOC2012 (train on own data) - Top1, the best is R-FCN: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4
- Pascal VOC2012 (train on VOC2012 data) - Top1, the best is the fully convolutional network Yolo v2: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=3
Generally there are two types of training:

Pre-training - any training with unlabeled images, using autoencoders, RBMs or data-dependent initializations: https://arxiv.org/pdf/1511.06856.pdf

(It mainly changes the lower layers)
Fine-tuning - any training with labeled images, using backpropagation with stochastic gradient descent (SGD)

(It changes all layers, but more slowly)

MyVanitar commented 7 years ago

@AlexeyAB Thank you * 1000.

Because we talked about Caffe, let me ask you this question. Assume that we have a trained .caffemodel file and a deploy.prototxt file. How can we use these in a standalone application that works everywhere, without forcing the end user to install Caffe (and especially to compile it)? Usually the programmer's computer and the end user's computer have different hardware but the same configuration, such as CUDA, Python, cuDNN and the rest.

I also have some open issues; I would appreciate it if you could answer them, sir.

AlexeyAB commented 7 years ago

@VanitarNordic

As you know, there are 3 best approaches for object detection on Caffe from the Pareto frontier: R-FCN Caffe, PVANet Caffe, SSD Caffe. Only the last two are real-time.

For example in SSD Caffe-fork: https://github.com/weiliu89/caffe/tree/ssd

you can simply change this ssd_detect.cpp file: https://github.com/weiliu89/caffe/blob/ssd/examples/ssd/ssd_detect.cpp
you can write your own main function instead of this one: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/examples/ssd/ssd_detect.cpp#L244
and you can add any header files with #include "" in this .cpp file
after that, you can simply run make as usual to build Caffe-SSD in this directory: https://github.com/weiliu89/caffe/tree/ssd and ssd_detect.cpp will be built too; the runnable binary ssd_detect will be in the directory: caffe/build/examples/ssd/
you can also put your additional .cpp files next to ssd_detect.cpp in caffe/examples/ssd/ and they will be compiled and linked too, like any .cpp files: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/Makefile#L29
you can also add any additional libraries to link against here: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/Makefile#L181

You can use one Detector object, or several Detector objects if your GPU has enough RAM or if you use several GPUs: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/examples/ssd/ssd_detect.cpp#L272
If you use several GPUs, you should call Caffe::SetDevice(dev_id); before each detector is created, for example:

        std::vector<Detector> detectors;

        // Create one detector per GPU; select the device before constructing it.
        for(size_t i = 0; i < number_of_gpus; ++i) {
            Caffe::SetDevice(static_cast<int>(i));
            Detector detector_tmp(model_file, weights_file, mean_file, mean_value);
            detectors.push_back( detector_tmp );
        }

Then you should call Caffe::SetDevice(dev_id); every time before a detector is used. Or you can create one CPU thread per GPU, call Caffe::SetDevice(dev_id); once in each CPU thread, and use the given GPU only from its own CPU thread.

Later I will add some changes to make it easier to use darknet Yolo v2 in your .cpp programs. This is related to your other question: https://github.com/AlexeyAB/darknet/issues/21

Yolo v2
- has only one mandatory dependency: pthread
- and 3 additional dependencies for speed-up:
  - CUDA
  - cuDNN
  - OpenCV
Caffe:
- has more mandatory dependencies: Boost, Google logging (glog), Google flags (gflags), protobuf, hdf5: https://github.com/weiliu89/caffe/blob/fd0ba25cd0593fd336ad5e9c47fafd702be67806/Makefile#L181
- and Caffe has more additional dependencies: CUDA (cuDNN), OpenCV, Python, lmdb, leveldb, snappy, mkl/openblas/cblas/atlas

MyVanitar commented 7 years ago

@AlexeyAB

Thumbs up, exactly. Let me tell you something: after my experiments I think Yolo v2 is the best in all aspects. Competition results don't matter, because we can make our own model better and better through backpropagation during training (including tricky images, the ones that cause false positive and false negative detections). I'm impatiently waiting for the code modification you promised. It could be used as code or DLL files or something like that. The only remaining open issue was knowing the mAP and accuracy after each iteration. Now we have the recall, I think. Correct me if I am wrong.

MyVanitar commented 7 years ago

Also, is there any good non-Caffe object detector other than Yolo?

AlexeyAB commented 7 years ago

@VanitarNordic

Yes, during training we can see:

only the loss (error) for the current batch
or the recall and avg_iou (average intersection over union) for the current subdivision of the batch

But to see the mean average precision (mAP), we have to validate the net on the whole valid or test image dataset to calculate the average.

For example, output of training:

Region Avg IOU: 0.800875, Class: 0.997764, Obj: 0.751350, No Obj: 0.004822, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.798296, Class: 0.997726, Obj: 0.753631, No Obj: 0.004813, Avg Recall: 1.000000, count: 8
9001: 0.043959, 0.043959 avg, 0.001000 rate, 3.566000 seconds, 576064 images
Loaded: 0.000000 seconds
Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds

For the current batch: (number of images = batch=16 in yolo-voc.cfg)
- iteration number (number of the batch) = 9002
- loss (error) for this batch = 0.211667
- avg_loss (average error) = 0.060730 avg, calculated as avg_loss = avg_loss*.9 + loss*.1;
For the current subdivision of the batch: (number of images = batch=16 / subdivisions=2 in yolo-voc.cfg)
- count of images in the current subdivision = count: 8
- avg_iou (average intersection over union) = Region Avg IOU: 0.800677 = sum of the IoU values divided by count
- Recall (fraction of found objects) = 1.000000, calculated as if(iou > .5) recall += 1;

[Image: precision and recall diagram]

[Image: IoU equation]

AlexeyAB commented 7 years ago

@VanitarNordic

There are no other approaches close in accuracy to Caffe (R-FCN, PVANet, SSD, Faster-RCNN, ...) or Darknet Yolo v2.

Other DNN frameworks are usually used only by researchers and mathematicians and are more comfortable for them, but not for production (not for the end user): Tensorflow, Torch, Theano, Matlab.

But there is one research approach, XNOR-Net, written in Lua and based on Torch, which is much faster than the others and can be used on low-performance devices, at the cost of some precision. Note that XNOR-Net comes from the same authors as Darknet Yolo:

This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings

[Image: plato_xnor]

MyVanitar commented 7 years ago

@AlexeyAB

Thank you very much again * 10000.

So given this information, when should I end the training? You mentioned usually 2000 iterations per class. Could avg_loss be considered an indication?
About Caffe: what if we use the Windows distribution of Caffe and its caffe.dll, does that solve the dependency requirements? Maybe it would work by copying this (https://github.com/BVLC/caffe/tree/windows) to the target computer.

AlexeyAB commented 7 years ago

@VanitarNordic

About mAP

So given this information, when should I end the training? You mentioned usually 2000 iterations per class. Could avg_loss be considered an indication?

Yes, you can use avg_loss as an indicator; very approximately, mAP = 1 - avg_loss

Also, to be more accurate, you can change this line: https://github.com/AlexeyAB/darknet/blob/2a9a5229c87c5e05c87d9d792c62cf020b3f1981/src/detector.c#L136

from: avg_loss = avg_loss*.9 + loss*.1;
to: avg_loss = avg_loss*.99 + loss*.01;

avg_loss is marked in the example output with the suffix avg for each batch:

Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds

After you stop training, when the error avg_loss is small enough, you should try the last weights from the backup folder and several previous ones (because avg_loss can keep decreasing on the training dataset while on the validation dataset it may already begin to grow, due to overfitting): https://github.com/AlexeyAB/darknet/issues/20#issuecomment-277525791
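One way to compare the last few weights files is to score each of them on the validation set and keep the best. The sketch below assumes you already have one validation score per saved checkpoint (here a validation loss, lower is better); how that score is obtained is left open, and the Checkpoint struct and best_checkpoint function are hypothetical names for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One saved weights file and its score on the validation set
// (hypothetical structure; lower validation_loss is better).
struct Checkpoint {
    std::string weights_file;
    float validation_loss;
};

// Returns the checkpoint with the lowest validation loss.
// Precondition: checkpoints is not empty.
Checkpoint best_checkpoint(const std::vector<Checkpoint>& checkpoints) {
    return *std::min_element(
        checkpoints.begin(), checkpoints.end(),
        [](const Checkpoint& a, const Checkpoint& b) {
            return a.validation_loss < b.validation_loss;
        });
}
```

If the latest checkpoint scores worse on validation than an earlier one, the earlier one is returned, which is the overfitting case described above.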

About Caffe

About Caffe: what if we use the Windows distribution of Caffe and its caffe.dll, does that solve the dependency requirements? Maybe it would work by copying this (https://github.com/BVLC/caffe/tree/windows) to the target computer.

I haven't tried Caffe on Windows. Maybe you will be able to easily use the Caffe fork you referred to.

But Caffe-SSD uses its own Caffe fork: https://github.com/weiliu89/caffe/tree/ssd
And so does Caffe-PVANet:
- it uses its own PVA Caffe fork: https://github.com/sanghoon/caffe/tree/6068dd04ea93cca9fcee036628fdb3ea95b4ebcd
  
  This repository is a fork from BVLC/caffe. Some modifications have been made to run PVANET with Caffe
- and the PVANet code is in Python: https://github.com/sanghoon/pva-faster-rcnn
This version of py-faster-rcnn is slower than our in-house runtime code (e.g. image pre-processing code written in Python)

MyVanitar commented 7 years ago

Later I will add some changes to make it easier to use darknet Yolo v2 in your .cpp programs.

Please have a look at this repository (https://github.com/mrzl/ofxDarknet). It seems that the author has made something that takes the cfg, weights and other files as input and runs detection inside a C++ program. Its description was not straightforward for me, and it might take me a lot of time to decipher the code, but for you it may take just a few minutes, and "maybe" it satisfies our need to make portable/standalone GUI applications.

Actually I want to write the software in VB.Net. My first thought was to read the output from the darknet.exe CMD console, but that is not a straightforward task.

AlexeyAB commented 7 years ago

@VanitarNordic

I added support for using Yolo v2 as a C++ DLL, yolo_cpp_dll.dll: https://github.com/AlexeyAB/darknet#how-to-use-yolo-as-dll

You simply build build\darknet\yolo_cpp_dll.sln to create the yolo_cpp_dll.dll file
Then build the simple C++ example that uses this yolo_cpp_dll.dll: open & build build\darknet\yolo_console_dll.sln
- the .cpp example is here: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp
- it supports both usages: console (without OpenCV) & GUI (with OpenCV)

The class Detector has a constructor and 3 detect() functions:

std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2); - takes the image file name
std::vector<bbox_t> detect(image_t img, float thresh = 0.2); - takes already loaded image image_t
std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2); - takes already loaded image of type cv::Mat by using OpenCV-function cv::imread(filename);

class Detector {
public:
    Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0);
    ~Detector();

    std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2);
    std::vector<bbox_t> detect(image_t img, float thresh = 0.2);

#ifdef OPENCV
    std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2);
#endif
};
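Since detect() returns a vector of boxes, typical post-processing is a simple filter over that vector. The sketch below does not include the real header: DetectedBox is a stand-in, and its fields (x, y, w, h, prob, obj_id) are an assumption about what bbox_t contains, so check the header shipped with the DLL before relying on them. The helper keep_class is likewise a hypothetical name.

```cpp
#include <vector>

// Stand-in for bbox_t. The field names and types are assumed for
// this sketch and may differ from the actual header.
struct DetectedBox {
    unsigned int x, y, w, h;  // top-left corner and size in pixels (assumed)
    float prob;               // confidence of the detection (assumed)
    unsigned int obj_id;      // class index (assumed)
};

// Keeps only the boxes of one class whose confidence is at least min_prob.
std::vector<DetectedBox> keep_class(const std::vector<DetectedBox>& boxes,
                                    unsigned int obj_id, float min_prob) {
    std::vector<DetectedBox> kept;
    for (const DetectedBox& b : boxes) {
        if (b.obj_id == obj_id && b.prob >= min_prob) {
            kept.push_back(b);
        }
    }
    return kept;
}
```

With the real class, the input vector would come from one of the detect() overloads listed above, called on a Detector constructed from a cfg file and a weights file.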

AlexeyAB commented 7 years ago

[Image: console_yolo_dll]

AlexeyAB / darknet

fine tuning #6