AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

fine tuning #6

Closed: kingvision closed this issue 7 years ago

kingvision commented 7 years ago

Hi, how can I fine-tune the model? Thanks.

AlexeyAB commented 7 years ago

@kingvision

https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data

MyVanitar commented 7 years ago

@AlexeyAB

For fine-tuning we should also rename the last fully connected layer (in Caffe it is the "InnerProduct" layer). If we don't, it will pick up the weights from the pre-trained model. Have you done that?

AlexeyAB commented 7 years ago

@VanitarNordic Generally you do not need to change the names of layers for fine-tuning, either in Yolo v2 or in Caffe.

  1. In Yolo v2, without changing the program's C code, you can't fine-tune only one layer while leaving the other layers untouched, and you can't reset the weights of one layer of the pre-trained model to random values.

    In Yolo v2 you can only fine-tune all layers, starting from the weights of the pre-trained model.


  2. For Caffe, not for Yolo: if you don't change the "InnerProduct" layer in Caffe, then the "InnerProduct" layer and all other layers will still be changed by fine-tuning: ./caffe train -solver ./solver.prototxt -weights ./pre_trained.caffemodel -gpu 0

    2.1. You should change the name of the fully connected "InnerProduct" layer (or of any other layer) in Caffe only if you want those layers to start from random weights instead of the weights from the pre-trained model (see the prototxt sketch after item 2.3): https://docs.google.com/presentation/d/1lzyXMRQFlOYE2Jy0lCNaqltpcCIKuRzKJxQ7vCuPRc8/edit#slide=id.g3c48e0e3ae4e944415

    http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

    Because we are predicting 20 classes instead of a 1,000, we do need to change the last layer in the model. Therefore, we change the name of the last layer from fc8 to fc8_flickr in our prototxt. Since there is no layer named that in the bvlc_reference_caffenet, that layer will begin training with random weights.

    2.2. You can also fine-tune only one layer, without fine-tuning the other layers, by setting the learning rate to 0 for the layers that should not be trained: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

    Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their lr_mult to 0.

    2.3. But if you didn't change the names of any layers and didn't set any learning rate to 0, it is still fine-tuning if you use: ./caffe train -solver ./solver.prototxt -weights ./pre_trained.caffemodel -gpu 0
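    For illustration (my own sketch, not quoted from this thread), the relevant prototxt fragment for case 2.1, loosely following the linked finetune_flickr_style example; the lr_mult values and num_output: 20 come from that example, everything else is illustrative:

        # Renamed from "fc8": no layer with this name exists in the pre-trained
        # .caffemodel, so Caffe initializes it with random weights (case 2.1).
        layer {
          name: "fc8_flickr"
          type: "InnerProduct"
          bottom: "fc7"
          top: "fc8_flickr"
          param { lr_mult: 10 }   # learning-rate multiplier for the weights
          param { lr_mult: 20 }   # learning-rate multiplier for the biases
          inner_product_param { num_output: 20 }   # 20 classes instead of 1000
        }

    To freeze a layer instead (case 2.2), keep its original name and set its param { lr_mult: 0 }.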

  3. Yolo v2 has no fully connected "InnerProduct" layer: https://arxiv.org/pdf/1612.08242.pdf

We remove the fully connected layers from YOLO and use anchor boxes to predict bounding boxes.

switching to a fully convolutional network with anchor boxes and using the new network. Switching to the anchor box style approach increased recall without changing mAP

  4. Yolo v2 is a fully convolutional network; fully convolutional approaches such as the Region-based Fully Convolutional Network (R-FCN) currently give the highest mAP: https://arxiv.org/pdf/1605.06409.pdf

    R-FCN approach with ResNet network:

  5. Generally there are two types of training: training from scratch with randomly initialized weights, and fine-tuning starting from pre-trained weights.

MyVanitar commented 7 years ago

@AlexeyAB Thank you * 1000.

Because we talked about Caffe, let me ask you this question. Assume that we have a trained .caffemodel and a deploy.prototxt file. How can we use these in a standalone application that could work everywhere, without forcing the end user to install Caffe (especially compiling it)? Usually the programmer's computer and the end user's computer will use different hardware, but the same configuration, such as CUDA, Python, cuDNN and the rest.

I also have some open issues; I would appreciate it if I could get answers to them from you, sir.

AlexeyAB commented 7 years ago

@VanitarNordic

As you know, there are 3 best approaches on Caffe for object detection on the Pareto frontier: R-FCN Caffe, PVANet Caffe, SSD Caffe. Only the last two are real-time.

For example, in the SSD Caffe fork: https://github.com/weiliu89/caffe/tree/ssd


        // Create one SSD Detector instance per GPU
        // (model_file, weights_file, mean_file, mean_value are defined elsewhere)
        std::vector<Detector> detectors;

        for (size_t i = 0; i < number_of_gpus; ++i) {
            Caffe::SetDevice(i);    // bind the Detector created next to GPU i
            Detector detector_tmp(model_file, weights_file, mean_file, mean_value);
            detectors.push_back(detector_tmp);
        }

Later I will add some changes to make it easier to use Darknet Yolo v2 from your .cpp programs. This is related to your other question: https://github.com/AlexeyAB/darknet/issues/21

  1. Yolo v2

  2. Caffe:

MyVanitar commented 7 years ago

@AlexeyAB

Thumbs up, exactly. Let me tell you something: after my experiments I think Yolo v2 is the best in all aspects. Competition results don't matter, because we can make our own model better and better by back-propagation during training (including tricky images, the ones that cause false positive and false negative detections). I'm impatiently waiting for that code modification you promised. It could be used as code or DLL files or something like that. The only remaining open issue was knowing the mAP and accuracy after each iteration. Now we have the recall, I think. Correct me if I am wrong.

MyVanitar commented 7 years ago

And also, is there any good non-Caffe object detector other than Yolo?

AlexeyAB commented 7 years ago

@VanitarNordic

Yes, during training we can see the average IoU and the average recall for each batch (see the training output below).

But to see the mean average precision (mAP), we should validate the net on the whole validation or test image dataset to calculate the average.

For example, output of training:

Region Avg IOU: 0.800875, Class: 0.997764, Obj: 0.751350, No Obj: 0.004822, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.798296, Class: 0.997726, Obj: 0.753631, No Obj: 0.004813, Avg Recall: 1.000000, count: 8
9001: 0.043959, 0.043959 avg, 0.001000 rate, 3.566000 seconds, 576064 images
Loaded: 0.000000 seconds
Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds

[Images: precision/recall formulas and IoU formula]
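The standard formulas behind those images (the usual definitions, added here for reference, not quoted from the thread):

    \mathrm{IoU} = \frac{\mathrm{area}(B_{pred} \cap B_{gt})}{\mathrm{area}(B_{pred} \cup B_{gt})}, \qquad
    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}

A predicted box counts as a true positive when its IoU with a ground-truth box is above the chosen threshold; mAP is then the average precision (the area under the precision-recall curve computed over the whole validation set), averaged over all classes.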

AlexeyAB commented 7 years ago

@VanitarNordic

There are no other approaches close in accuracy to: Caffe (R-FCN, PVANet, SSD, Faster-RCNN, ...) or Darknet-Yolo v2.

And other DNN frameworks are usually used only by researchers or mathematicians and are more convenient for them, but not for production (not for the end user): Tensorflow, Torch, Theano, Matlab.

But there is one research approach, XNOR-Net, written in Lua and based on Torch, which is much faster than the others and can be used on any low-performance device, though with some decrease in precision (it binarizes the weights and the inputs to the convolutional layers, so expensive floating-point multiplications become 1-bit XNOR and bit-counting operations). Note that XNOR-Net is from the same authors as Darknet Yolo:

This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings

[Image: XNOR-Net detection example]

MyVanitar commented 7 years ago

@AlexeyAB

Thank you very much again * 10000.

AlexeyAB commented 7 years ago

@VanitarNordic

  1. About mAP

So with this information, when should I end the training? You mentioned usually 2000 iterations per class. Could avg_loss be considered an indicator?

Yes, you can use avg_loss as an indicator; very approximately, mAP = 1 - avg_loss.

Also, to be more accurate, you can change this line: https://github.com/AlexeyAB/darknet/blob/2a9a5229c87c5e05c87d9d792c62cf020b3f1981/src/detector.c#L136

avg_loss is marked in the example output below with the postfix avg for each batch:

Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds

(Here 9002: is the iteration/batch number, 0.211667 is the loss for this batch, 0.060730 avg is the running average loss, i.e. avg_loss, 0.001000 rate is the current learning rate, 3.868000 seconds is the time spent on this batch, and 576128 images is the total number of images processed so far.)

After you stop training, when the error avg_loss is small enough, try using the last weights from the backup folder and also several previous ones (because avg_loss can still be decreasing on the training dataset while on the validation dataset it may already have begun to grow, due to overfitting): https://github.com/AlexeyAB/darknet/issues/20#issuecomment-277525791


  2. About Caffe

About Caffe: what if we use the Windows distribution of Caffe and use caffe.dll, does that solve the requirement of its dependencies? It might work by copy-pasting any of this (https://github.com/BVLC/caffe/tree/windows) onto the target computer.

I haven't tried to use Caffe on Windows. Maybe you will be able to easily use the Caffe fork you referred to.

MyVanitar commented 7 years ago

Later I will add some changes to make it easier to use Darknet Yolo v2 from your .cpp programs.

Please have a look at this repository (https://github.com/mrzl/ofxDarknet). It seems this person has made something that takes the cfg, weights and other files as input and performs detection inside a C++ program. Its description was not straightforward for me, and it may take a lot of time to decipher the code, but for you it may take just a few minutes and "maybe" satisfy our need to make portable/standalone GUI applications.

Actually I want to code the software using VB.Net; my first thought was reading the output from the Darknet.exe CMD console, but that is not a straightforward task.

AlexeyAB commented 7 years ago

@VanitarNordic

I added support for using Yolo v2 as a C++ DLL, yolo_cpp_dll.dll: https://github.com/AlexeyAB/darknet#how-to-use-yolo-as-dll


The Detector class has a constructor and 3 detect() functions:

class Detector {
public:
    // Loads the network from the .cfg and .weights files on the given GPU
    Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0);
    ~Detector();

    // Each detect() returns the found objects as a vector of bbox_t
    // (bounding-box coordinates, class id and probability)
    std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2);
    std::vector<bbox_t> detect(image_t img, float thresh = 0.2);

#ifdef OPENCV
    std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2);
#endif
};
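A minimal usage sketch (my own illustration, not from this thread): it assumes the header name yolo_v2_class.hpp and bbox_t fields such as x, y, w, h, obj_id and prob as they appear in the repository, and uses placeholder cfg/weights/image file names.

#include <iostream>
#include <string>
#include <vector>

#include "yolo_v2_class.hpp"   // header that declares Detector and bbox_t (assumed name)

int main() {
    // File names are placeholders - substitute your own cfg/weights/image paths.
    Detector detector("yolo-voc.cfg", "yolo-voc.weights", 0 /* gpu_id */);

    std::vector<bbox_t> boxes = detector.detect("test.jpg", 0.2f /* threshold */);

    for (const bbox_t &b : boxes) {
        std::cout << "class id: " << b.obj_id
                  << "  prob: "   << b.prob
                  << "  box: "    << b.x << "," << b.y
                  << " "          << b.w << "x" << b.h << std::endl;
    }
    return 0;
}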
AlexeyAB commented 7 years ago

[Image: console output of the Yolo DLL demo]