AITTSMD / MTCNN-Tensorflow

Reproduce MTCNN using Tensorflow
1.51k stars 710 forks

[Discussion] How to improve PNet recall #39

Open flankechen opened 6 years ago

flankechen commented 6 years ago

Thanks very much for the work! I am trying something interesting: MTCNN for cat heads! Unfortunately, I only have 10,000 labeled cat head examples, which is far less than the human face datasets. I split the data into 80% training and 20% testing/validation. For PNet, the recall on the validation set is around 91% with a threshold of 0.9. I changed the numbers in gen_12net_data.py to generate more pos/part/neg samples, but it is not helping. Question: how do I improve PNet recall for my problem? Cat heads show much larger variance in both shape and texture. I am thinking about a larger, deeper network. Any discussion is welcome!

AITTSMD commented 6 years ago

Cat's head detection? Just detecting the cat's head rather than the entire cat? That is interesting. I think you can do the following things:

1. Label more data.
2. Do more data augmentation (rotation, flip, mirror, added noise, SDA, ...).
3. Use a larger network.
4. Decrease the classification threshold.
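The augmentation ideas above (flip, mirror, rotation, added noise) can be sketched with plain numpy; the function name and noise scale below are illustrative, not part of this repo:

```python
import numpy as np

def augment(img, rng):
    """Return a few simple augmented copies of an HxWxC image array.

    A minimal illustration of flip / rotation / additive-noise
    augmentation; a real pipeline would also jitter crops and color.
    """
    out = []
    out.append(img[:, ::-1])            # horizontal flip (mirror)
    out.append(np.rot90(img, k=1))      # 90-degree rotation
    noisy = img.astype(np.float32) + rng.normal(0.0, 5.0, img.shape)
    out.append(np.clip(noisy, 0, 255).astype(img.dtype))  # Gaussian noise
    return out
```

Each augmented copy should keep its label consistent (e.g. flip the bbox coordinates together with the image).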

flankechen commented 6 years ago

@AITTSMD thanks for the reply.

  1. I tried a bigger network, with 5x5x24 ---> 3x3x32 ---> 1x1x48, and the recall improved from 91% to 94%, so a bigger network does help.
  2. Yes, of course, I need more data, more augmentation and more labeling work; I'll probably pay someone to do this... In your experience, when should we stop tuning PNet's recall? In a cascade, does the first-stage PNet bound the recall of the whole MTCNN?
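A quick way to sanity-check a modified conv stack like 5x5x24 ---> 3x3x32 ---> 1x1x48 is to compute the VALID-convolution output sizes; the pooling placement below is an assumption, not the repo's exact config:

```python
def conv_out(size, kernel, stride=1):
    """Spatial output size of a VALID convolution or pooling layer."""
    return (size - kernel) // stride + 1

def stack_out(size, layers):
    """Apply a list of (kernel, stride) layers in order."""
    for k, s in layers:
        size = conv_out(size, k, s)
    return size

# Original PNet on a 12x12 crop: 3x3 conv, 2x2 pool (stride 2), 3x3, 3x3
print(stack_out(12, [(3, 1), (2, 2), (3, 1), (3, 1)]))  # -> 1
```

Note that with a 5x5 first conv, a 12x12 input no longer reduces to 1x1 under the same pooling, so the crop size or pooling likely needs adjusting along with the kernels.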
AITTSMD commented 6 years ago

@flankechen The 'P' in PNet means proposals. The function of PNet is to reject bboxes that are easy to classify. When the number of candidate boxes is small, the whole MTCNN can run fast. So it is very important for PNet to keep a high recall rate (maybe > 97%).

flankechen commented 6 years ago

@AITTSMD I managed to tune a bigger PNet that reaches 97% recall at threshold 0.8. But the trained RNet gives recall 92%, precision 97%, accuracy 89% at threshold 0.6, so performance drops quite a lot at the RNet stage, and it is even worse for ONet. Is this normal in your experience with the human face case? Can anyone give me some guidelines for tuning RNet? BTW, with the trained PNet I got only 18,000 pos, 36,000 part and 360,000 neg samples after gen_hard_example on my 8,000 training samples. Is this enough for a network like RNet to work? And how do I augment data in gen_hard_example, which runs PNet on the training set? By augmenting the training set with a fake face_train_bbx_gt.txt?
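Recall and precision at a given threshold, as discussed above, can be checked with a few lines of numpy; this is a toy sketch with made-up scores, not the repo's evaluation code:

```python
import numpy as np

def recall_precision(scores, labels, thresh):
    """Recall and precision of the rule `score >= thresh`; labels are 0/1."""
    pred = scores >= thresh
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    recall = tp / max(tp + fn, 1)
    precision = tp / max(tp + fp, 1)
    return recall, precision

# Toy sweep: find the highest threshold whose recall stays above a target.
scores = np.array([0.95, 0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
labels = np.array([1,    1,   1,   0,   1,   0,   0])
for t in (0.9, 0.8, 0.6):
    r, p = recall_precision(scores, labels, t)
    print(t, round(r, 2), round(p, 2))
```

Sweeping the threshold on the validation set like this is how a point such as "97% recall at threshold 0.8" gets picked.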

KangolHsu commented 6 years ago

Are there any tips for suppressing the false positive rate of PNet (non-faces detected with a high score)? As far as I know: 1. generate more negative samples; 2. use other datasets (e.g. CelebA, ...) to generate negative samples.

Am I right?
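Generating more negatives, as suggested above, usually means sampling random crops and keeping only those with low IoU against every ground-truth box; a minimal sketch (helper names and the 0.3 cutoff are illustrative):

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an Nx4 array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def sample_negatives(img_w, img_h, gt_boxes, n, rng, max_iou=0.3):
    """Random square crops whose IoU with every GT box stays below max_iou."""
    negs = []
    while len(negs) < n:
        size = rng.integers(12, min(img_w, img_h) // 2)
        x = rng.integers(0, img_w - size)
        y = rng.integers(0, img_h - size)
        crop = np.array([x, y, x + size, y + size])
        if iou(crop, gt_boxes).max() < max_iou:
            negs.append(crop)
    return np.stack(negs)
```

Crops taken from other datasets that contain no target objects can skip the IoU check entirely, since every crop there is a valid negative.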

liuyunwww commented 6 years ago

Did you succeed in detecting cat faces?

flankechen commented 6 years ago

@liuyunwww It works, and the performance is quite good for my application. MTCNN is a powerful and simple approach to specific object detection and landmarking.

liuyunwww commented 6 years ago

@flankechen, I am trying MTCNN for dog heads. After running train_RNet.py, I continued with one_image_test.py, but the predicted bbox extended beyond the image at the top-right corner.

I have changed two parameters:

- thresh: from [0.6, 0.6, 0.7] to [0.4, 0.05, 0.7]
- epoch: from [18, 14, 22] to [30, 22, 22]

no landmarks

The result was as the following:

```
Instructions for updating: Use the retry module or similar alternatives.
(1, ?, ?, 3) (1, ?, ?, 10) (1, ?, ?, 10) (1, ?, ?, 16) (1, ?, ?, 32) (1, ?, ?, 2) (1, ?, ?, 4) (1, ?, ?, 10)
./data/MTCNN_model/PNet_landmark
(256, 24, 24, 3) (256, 22, 22, 28) (256, 11, 11, 28) (256, 9, 9, 48) (256, 4, 4, 48) (256, 3, 3, 64) (256, 576) (256, 128) (256, 2) (256, 4) (256, 10)
0 images done
[0.00011708]
[ 2.93596156e+01 -6.52301640e+00  4.66642327e+02  4.61017590e+02  1.17080446e-04]
```

Can you tell me your experience?

flankechen commented 6 years ago

@liuyunwww I think you should check your training data, the bbox input and the PNet output first.
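A simple sanity check on output boxes helps here; this is a sketch (the coordinates in the log above, with a negative y1 and x2 far outside the image, would fail it):

```python
def check_bbox(box, img_w, img_h):
    """Flag boxes that fall outside the image or are degenerate.

    Returns (ok, clamped_box). `ok` is False for boxes with negative
    coordinates, zero area, or corners beyond the image bounds.
    """
    x1, y1, x2, y2 = box
    ok = 0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h
    clamped = (max(0, min(x1, img_w)), max(0, min(y1, img_h)),
               max(0, min(x2, img_w)), max(0, min(y2, img_h)))
    return ok, clamped
```

Running this on both the ground-truth boxes and the network output quickly shows whether the problem is in the labels or in the model.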

liuyunwww commented 6 years ago

@flankechen bbox input - do you mean I should check the bbox values in the training data? Should I split the whole dataset into 80% training and 20% testing/validation, and then compute the recall on the validation data as you suggested above?

flankechen commented 6 years ago

@liuyunwww Yes, the input of your training: draw the bboxes on the images to check them. Of course, check PNet performance first.
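Drawing the ground-truth boxes back onto the images, as suggested, would normally use cv2.rectangle; here is a dependency-free numpy stand-in just for illustration:

```python
import numpy as np

def draw_box(img, box, value=255):
    """Paint the border of an [x1, y1, x2, y2] box onto an HxW array in place.

    A stand-in for cv2.rectangle, just to eyeball whether the
    training bboxes line up with the objects in the images.
    """
    x1, y1, x2, y2 = [int(v) for v in box]
    img[y1:y2, x1] = value        # left edge
    img[y1:y2, x2 - 1] = value    # right edge
    img[y1, x1:x2] = value        # top edge
    img[y2 - 1, x1:x2] = value    # bottom edge
    return img
```

If the drawn boxes are systematically shifted or outside the image, the labels (or the coordinate order in the annotation file) are the problem, not the network.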