visionNoob opened 6 years ago
Hi,
Just train your classification network classifier.cfg to get classifier.weights:
darknet.exe classifier train cfg/imagenet1k.data cfg/classifier.cfg

Then do partial:
darknet.exe partial cfg/classifier.cfg classifier.weights classifier.65 65
where instead of 65, use the number of layers that you want to keep.

Then train the detector:
darknet.exe detector train data/obj.data cfg/yolo_obj.cfg classifier.65
Oh, thanks! May I ask an additional question? Can I train the network with some of the weights frozen?
To freeze all layers from 0 to N, set stopbackward=1 in the N-th layer of your cfg-file.
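For illustration, a minimal sketch of such a layer in a cfg-file (the parameter values below are placeholders, not taken from any particular model):

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
stopbackward=1

With stopbackward=1 in this block, gradients stop here during the backward pass, so this layer and everything before it keep their pre-trained weights.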
About fine-tuning: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
to speed up training (with decreased detection accuracy), do Fine-Tuning instead of Transfer-Learning: set the param stopbackward=1 in one of the penultimate convolutional layers before the first [yolo]-layer, for example here: https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L598
Hi Insurgent and Alexey,
I was looking through the issues to figure out how to train for one or two custom items added to the COCO set without needing to retrain the whole of COCO again, and I stumbled on this post.
Apologies in advance if these are dumb questions:
1) Essentially, why do we have two separate networks for classification and detection?
2) What do we mean by pre-trained weights? This is the same as pre-trained models, isn't it? (I mean, to have your own pre-trained weights for such an architecture, you would have to train the whole network with the same architecture/model.) How would that be different from normal training?
3) Training with frozen weights: what does this mean, and when is it normally used? Essentially we are talking about having no backpropagation at all from some layer downwards, is that it?
4) Are these choices based on the amount of data you have and the decision to train from scratch versus training only the last few layers?
@Vaquitta We train the classifier first only because the ImageNet classification dataset has many more images than detection datasets (PascalVOC, COCO, yours).
After training the classification model, we just copy the first N layers (i.e., remove the last convolutional layer):
./darknet partial cfg/darknet19_448.cfg darknet19_448.weights darknet19_448.conv.23 23
And then train our detection model using these first N pre-trained convolutional layers:
./darknet detector train cfg/voc.data cfg/yolov2-voc.cfg darknet19_448.conv.23
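(The analogous command for yolov3 is ./darknet partial cfg/darknet53.cfg darknet53.weights darknet53.conv.74 74, which keeps the first 74 layers of darknet53.)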
@Vaquitta There are no dumb questions in the world :D, but I ALWAYS ask dumb questions to @AlexeyAB, and he is so kind. Anyway, I have a few answers to your questions (just my opinion).
The reason we have two networks (one for classification, the other for detection) is that the amount of classification data in the world vastly exceeds the amount of detection data. We first construct our network for the classification problem so that it can be trained on classification data. After training, we transform our classification network into a detection network. Only minor changes are needed for this transformation, and we can reuse the weights that were trained before ("pre-trained") in the classification phase.
If you take this strategy, you don't need to train your detection network from scratch; you just fine-tune it with your pretrained weights from the classification training.
In practice, freezing weights is not common, I think, but it is meaningful in academic experiments. If you freeze some layers' weights, it means there is no more backpropagation from that point on, so the frozen weights are no longer updated; they should therefore have been trained beforehand.
Thanks Alexey & Insurgent for the responses (and the reassurance :P). I appreciate the time taken to answer the queries. That reassurance has opened the floodgates :)
Basically, with my limited understanding, I had always understood detection as a classification + localisation problem, so technically for every detection problem I assume classification is a prerequisite.
So technically, what defines detection would be how we localise the above-threshold items, I would assume. That would be largely network-agnostic, wouldn't it? I mean, we pretty much handle these detection locations in code rather than in the network architecture.
Since I wanted to see how YOLO differentiates between the classifier and the detector, I had a look at the original repository and this forked one.
I believe the segmenter in the original repo maps to the classifier in this forked version. In the original version, the loading of the network and weights is the same for both options; the only difference appears after network_predict is called. In the case of the detector, it goes on to work out the bounding boxes.
1) However, in the forked version I could see that fuse_conv_batchnorm is used instead of set_batch_network. Why is this the case?
2) What I am trying to figure out is how we can really separate classification from detection. Technically, classification has to be a subset activity of detection, doesn't it? That would mean a dataset fit for detection is also fit for classification, so having one kind of data in isolation doesn't make sense? I did not see any such separate datasets for classification and for detection. So technically the same applies to the network: a network that is good for classification would also be good for detection, wouldn't it?
3) Following this logic, I am assuming you could only detect an object that the classifier understands (or rather, has been trained on in the classifier).
4) Fine-tuning would make good sense if we could actually detect objects that were never trained in the classifier and only trained during detection fine-tuning. But even then, I assume it would only make sense when the new classes (absent from the classifier's training set) share overlapping features with the objects the classifier does know. And technically, when we fine-tune for detection, the classification part now automatically understands these new objects too, doesn't it? (So basically some kind of hierarchical feature-based classifiers that can be stacked to form some complex detector, maybe :P)
5) In the YOLO config case, we have the same config for the classifier and the detector, don't we? Would it be possible for you to share separate such networks in the context of YOLO, so I can understand better?
@Vaquitta
network net = parse_network_cfg_custom(cfgfile, 1);
is used instead of the two lines
network net = parse_network_cfg(cfgfile);
set_batch_network(&net, 1);
because a low-end mobile GPU that doesn't have enough GPU memory for batch > 1 would fail on the line network net = parse_network_cfg(cfgfile);

The line fuse_conv_batchnorm(net); is used to speed up detection by 7% by fusing 2 layers into 1 (Convolutional + Batch-norm): https://github.com/AlexeyAB/darknet/issues/529#issuecomment-377204382
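For intuition, that fusion is just per-channel arithmetic on the trained parameters. Here is a minimal sketch of the math in C; the names are illustrative (this is not the actual fuse_conv_batchnorm() source), and it assumes the convolution's bias slot receives the folded batch-norm term:

#include <math.h>

/* Sketch only. Batch-norm y = gamma * (x - mean) / sqrt(var + eps) + beta
   applied after a convolution can be folded into the convolution itself:
     scale = gamma[c] / sqrt(var[c] + eps)
     W'[c] = W[c] * scale
     b'[c] = beta[c] - mean[c] * scale
   so that conv(x, W') + b' equals batchnorm(conv(x, W)). */
void fold_batchnorm(float *weights, float *biases,
                    const float *gamma, const float *beta,
                    const float *mean, const float *var,
                    int out_channels, int weights_per_channel, float eps)
{
    for (int c = 0; c < out_channels; ++c) {
        float scale = gamma[c] / sqrtf(var[c] + eps);
        for (int i = 0; i < weights_per_channel; ++i)
            weights[c * weights_per_channel + i] *= scale;
        biases[c] = beta[c] - mean[c] * scale;
    }
}

After this fold, the batch-norm layer can be dropped entirely at inference time, which is where the speedup comes from.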
No, only part of the two networks is the same.
Hi! I have two networks: one for classification, the other for detection. After I train the classifier, I want to use its weights as pre-trained weights for detection. This is similar to the provided pre-trained models, but I want to create my own pre-trained weights on ImageNet.