DaHoC / trainHOG

Example program showing how to train your custom HOG detector using openCV

Example of Linear SVM in Opencv #9

Open mrgloom opened 9 years ago

mrgloom commented 9 years ago

If anyone is interested, there is an example of using a linear SVM with the HOG descriptor in OpenCV (linked below). Note that OpenCV's SVM is based on libsvm, so using liblinear should be faster.

https://github.com/Itseez/opencv/blob/ddf82d0b154873510802ef75c53e628cd7b2cb13/samples/cpp/train_HOG.cpp

https://www.csie.ntu.edu.tw/~cjlin/liblinear/

ghost commented 9 years ago

Have you succeeded in training your classifier this way? If so, what data and positive/negative ratio did you use? Thank you.

mrgloom commented 9 years ago

My work is still in progress.

I think the ratio between positives and negatives should be 1:1 unless you use an SVM with class weights.

Look here for example: http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html
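
To make the class-weight alternative concrete, here is a minimal plain-Python sketch of the "balanced" heuristic (weight = n_samples / (n_classes * class_count), the formula scikit-learn documents for `class_weight='balanced'`). The label counts are made up for illustration, and `balanced_class_weights` is a hypothetical helper name, not something from OpenCV or the linked sample.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical training set: 1 positive for every 5 negatives.
labels = [1] * 1000 + [-1] * 5000
weights = balanced_class_weights(labels)
print(weights)  # the rare positive class gets the larger weight
```

With weights like these passed to an SVM that supports per-class penalties, a 1:5 set should behave roughly like a balanced one, so the 1:1 ratio is no longer required.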

But then you should run several iterations of "hard negative mining": collect the hard negatives, add these examples to your training set (maybe with replacement?), and retrain the classifier.
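
A rough sketch of that loop, in plain Python. Everything here is an assumption for illustration: `train` and `score` are hypothetical callables standing in for your real SVM training and decision function, and the toy 1-D "features" only exist so the example runs. In this sketch the hard negatives are appended to the existing negatives rather than replacing them.

```python
def mine_hard_negatives(train, score, positives, negatives, pool, rounds=3):
    """Retrain up to `rounds` times, each time adding false positives from `pool`.

    train(pos, neg) -> model; score(model, x) -> decision value (> 0 means "object").
    """
    negatives = list(negatives)   # the original negatives are kept
    remaining = list(pool)        # negative windows not yet in the training set
    model = train(positives, negatives)
    for _ in range(rounds):
        # Any pool window scored as "object" is a false positive, i.e. a hard negative.
        hard = [x for x in remaining if score(model, x) > 0]
        if not hard:
            break                 # nothing left to mine
        remaining = [x for x in remaining if score(model, x) <= 0]
        negatives.extend(hard)
        model = train(positives, negatives)
    return model, negatives


# Toy stand-ins: the "model" is a threshold halfway between the class means.
def train(pos, neg):
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def score(threshold, x):
    return x - threshold


model, final_negatives = mine_hard_negatives(
    train, score,
    positives=[4.0, 5.0, 6.0],
    negatives=[0.0, 1.0, 2.0],
    pool=[0.5, 2.5, 3.5, 3.8],
)
print(model, final_negatives)
```

In a real detector the pool would be windows scanned from images known to contain no objects, and the growth of `negatives` would need a memory cap.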

If you like Python, you can try this: https://github.com/bikz05/object-detector

ghost commented 9 years ago

1) Indeed, in the Dalal & Triggs 2005 paper, they call them "hard examples":

"For each detector and parameter combination a preliminary detector is trained and the 1218 negative training photos are searched exhaustively for false positives (‘hard examples’). The method is then re-trained using this augmented set (initial 12180 + hard examples) to produce the final detector. The set of hard examples is subsampled if necessary, so that the descriptors of the final training set fit into 1.7 Gb of RAM for SVM training." page 2, Datasets & Methodology.

So there is no replacement: we have to keep the original 12180 negatives.

However, I am curious what they mean by "parameter combination". Do we need to change the training parameters based on the output of a training iteration?

2) By the way, I have used this OpenCV sample, and the detection works quite well but with a LOT of false positives... (I must try these hard examples!) Since the weights are computed here using 96 by 160 windows, we are detecting pedestrians with a 7524-element buffer (compared to the traditional 3780). According to the "Negatives Windows" section in [1], we should use only the centered data (without the 16-pixel margins), so I think we should try to shrink the 7524 output weights to keep just the ones matching the 64*128 window. Do you agree?

[1] http://pascal.inrialpes.fr/data/human/

Cheers,

mrgloom commented 9 years ago

I think they just mean cross-validation over parameters: http://scikit-learn.org/stable/modules/grid_search.html

Also, as I understand it, if we use more negative data we will have an unbalanced dataset and our confusion matrix will be skewed. We can either copy the original positives (this is called oversampling), where the only limitation is processing time and memory consumption, or randomly subsample the negative data (this is called undersampling), but in that case we lose some of the negative data. I don't know whether that matters.
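
The two options can be sketched in a few lines of plain Python. The sample names and sizes below are invented for illustration, and `oversample` / `undersample` are hypothetical helpers, not library functions.

```python
import random


def oversample(minority, target_size, rng):
    """Add random copies of minority samples (with replacement) up to target_size."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra


def undersample(majority, target_size, rng):
    """Keep a random subset of the majority class (without replacement)."""
    return rng.sample(majority, target_size)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
positives = [f"pos_{i}" for i in range(100)]
negatives = [f"neg_{i}" for i in range(500)]

more_positives = oversample(positives, len(negatives), rng)    # 500 vs 500
fewer_negatives = undersample(negatives, len(positives), rng)  # 100 vs 100
print(len(more_positives), len(fewer_negatives))
```

Oversampling costs memory and training time because the set grows, while undersampling throws away negatives that might have been informative.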

ghost commented 9 years ago

Yeah, memory is kind of a concern ...

Any idea about 2) ?

mrgloom commented 9 years ago

I don't understand your question. What is the "7524 sized buffer"? The HOG feature vector size? Do you mean that a 64*128 window produces a feature vector of size 3780 and a 96x160 window produces one of size 7524?
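
For reference, both numbers follow from the usual HOG layout arithmetic. The sketch below assumes the common Dalal & Triggs style settings (16x16 blocks, 8-pixel block stride, 8x8 cells, 9 orientation bins); if your detector uses different settings, the sizes change accordingly. `hog_descriptor_size` is a hypothetical helper name.

```python
def hog_descriptor_size(win_w, win_h, block=16, stride=8, cell=8, nbins=9):
    """Number of values in a HOG descriptor for one detection window."""
    blocks_x = (win_w - block) // stride + 1   # block positions across the window
    blocks_y = (win_h - block) // stride + 1   # block positions down the window
    cells_per_block = (block // cell) ** 2
    return blocks_x * blocks_y * cells_per_block * nbins


print(hog_descriptor_size(64, 128))   # 7 * 15 * 4 * 9  = 3780
print(hog_descriptor_size(96, 160))   # 11 * 19 * 4 * 9 = 7524
```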

About false positives: you can try varying the SVM threshold. In OpenCV you can use the returnDFVal flag to get the distance to the hyperplane. From the documentation: "returnDFVal – Specifies a type of the return value. If true and the problem is 2-class classification then the method returns the decision function value that is signed distance to the margin, else the function returns a class label (classification) or estimated function value (regression)." Moving the threshold gives you a tradeoff between the number of false positives and false negatives.
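
A small sketch of that tradeoff, in plain Python. The decision values and labels are invented, and it assumes that larger values mean "more object-like"; the sign convention of the value you get back may differ between libraries and versions, so check it on a few known samples first.

```python
def count_errors(decision_values, labels, threshold=0.0):
    """Return (false_positives, false_negatives), predicting positive iff value > threshold."""
    pairs = list(zip(decision_values, labels))
    fp = sum(1 for v, y in pairs if v > threshold and y == -1)
    fn = sum(1 for v, y in pairs if v <= threshold and y == 1)
    return fp, fn


# Made-up decision values with their true labels (+1 object, -1 background).
values = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
labels = [-1, -1, -1, 1, -1, 1]

for t in (0.0, 0.5, 1.0):
    print(t, count_errors(values, labels, t))
# 0.0 (2, 0)
# 0.5 (1, 1)
# 1.0 (0, 1)
```

Raising the threshold removes false positives at the price of missed detections, which is the knob to turn if the detector fires too often.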

Also try plotting a ROC curve to see whether your detector behaves similarly to the one in the original paper: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
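
If you don't want to pull in a plotting or ML library just for this, the ROC points can be computed by hand. This is a minimal sketch with invented scores; it assumes labels of +1/-1, at least one sample of each class, and that higher scores mean "more object-like". The resulting (FPR, TPR) pairs can be fed to any plotting tool.

```python
def roc_points(scores, labels):
    """Return [(fpr, tpr), ...] from sweeping a threshold from high to low."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (s, y) in enumerate(ranked):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != s:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, -1, 1, -1, -1]
points = roc_points(scores, labels)
print(points)
print(auc(points))  # 8 of the 9 (positive, negative) pairs are ranked correctly
```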

ghost commented 9 years ago

Yeah, that's what I meant by 7524 sized buffer. IMHO, cropping it is mathematically correct, but I am wondering whether it guarantees correct classification, since the bias was computed using 96 by 160 training windows.
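
For what the cropping itself might look like, here is a sketch of the index selection in plain Python. It rests on assumptions that should be verified against your own setup: that the weight vector stores blocks column by column (x outer, y inner), that each block contributes 36 values (2x2 cells, 9 bins), and that any values after the 7524 descriptor weights (such as a bias term) are carried over unchanged. `crop_hog_weights` is a hypothetical helper, and it does not address the open question of whether the carried-over bias is still appropriate.

```python
def crop_hog_weights(weights, big=(96, 160), small=(64, 128),
                     block=16, stride=8, cell=8, nbins=9):
    """Keep only the weights of blocks inside the centered `small` window.

    ASSUMPTION: blocks are laid out column by column (x outer, y inner).
    """
    per_block = (block // cell) ** 2 * nbins
    bx_big = (big[0] - block) // stride + 1
    by_big = (big[1] - block) // stride + 1
    bx_small = (small[0] - block) // stride + 1
    by_small = (small[1] - block) // stride + 1
    off_x = (big[0] - small[0]) // 2 // stride   # left margin, in block steps
    off_y = (big[1] - small[1]) // 2 // stride   # top margin, in block steps
    n_desc = bx_big * by_big * per_block

    cropped = []
    for bx in range(off_x, off_x + bx_small):
        for by in range(off_y, off_y + by_small):
            start = (bx * by_big + by) * per_block
            cropped.extend(weights[start:start + per_block])
    cropped.extend(weights[n_desc:])             # trailing values (e.g. bias), if any
    return cropped


# Dummy weights: 7524 descriptor values plus one trailing value, numbered by index.
dummy = list(range(7525))
small_weights = crop_hog_weights(dummy)
print(len(small_weights))  # 3780 descriptor weights + 1 trailing value
```

Comparing detections before and after the crop on a held-out set, for example with a ROC plot, would show whether the shortened detector still classifies sensibly.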

ROC plot is a great idea.