flankechen opened 6 years ago
Cat's head detection? Just detect the cat's head rather than the entire cat? That is interesting. I think you can do the following things: 1. label more data; 2. do more data augmentation (rotation, flip, mirror, add noise, SDA, ...); 3. use a larger network; 4. decrease the classification threshold.
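Since the augmentation step (point 2) is left abstract above, here is a minimal NumPy-only sketch of the flip/mirror/noise variants mentioned; the function name and noise scale are illustrative, not from the repo:

```python
import numpy as np

def augment(img, seed=0):
    """Return simple augmented variants of an HxWx3 uint8 image."""
    rng = np.random.default_rng(seed)
    out = []
    out.append(img[:, ::-1])            # horizontal flip (mirror)
    out.append(img[::-1, :])            # vertical flip
    out.append(np.rot90(img))           # 90-degree rotation
    noise = rng.normal(0, 8, img.shape) # additive Gaussian noise
    noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    out.append(noisy)
    return out

img = np.zeros((12, 12, 3), dtype=np.uint8)
variants = augment(img)
print(len(variants))  # 4
```

In practice you would also vary crop position and scale, which the repo's data-generation scripts already do when sampling pos/part/neg windows.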
@AITTSMD thanks for the reply.
@flankechen The 'P' in PNet means proposals. The function of PNet is to reject bboxes that are easy to classify. When the number of candidate boxes is small, the whole MTCNN can run fast, so it's very important for PNet to keep a high recall rate (maybe > 97%).
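Measuring recall at a given score threshold, as suggested above, can be sketched with a generic helper (not part of the repo):

```python
import numpy as np

def recall_at_threshold(scores, labels, thresh):
    """Fraction of true faces (label == 1) whose score clears the threshold."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    pos = labels == 1
    return (scores[pos] >= thresh).sum() / pos.sum()

# Toy example: 3 true faces, 2 non-faces.
scores = [0.95, 0.80, 0.30, 0.99, 0.10]
labels = [1,    1,    1,    0,    0]
print(recall_at_threshold(scores, labels, 0.6))  # 2 of 3 faces kept, ~0.667
```

Lowering the PNet threshold trades recall against the number of candidate windows the later stages must process.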
@AITTSMD I managed to tune a bigger PNet with threshold 0.8, resulting in 97% recall. The trained RNet gives recall 92%, precision 97%, accuracy 89% at threshold 0.6, so performance drops quite a lot for RNet, and it's even worse for ONet. Is this normal in your experience with the human face case? Can anyone give me some guidelines for tuning RNet? BTW, with the trained PNet, I got only 18,000 pos, 36,000 part, and 360,000 neg samples after gen_hard_example with 8,000 training samples. Is this enough for a network like RNet to work? And how do I augment data in gen_hard_example, which runs PNet on the training set? By augmenting the training set with a fake face_train_bbx_gt.txt?
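For context on where those pos/part/neg counts come from: the repo's data-generation scripts label each candidate crop by its IoU with the ground-truth box. A sketch of that convention (the exact thresholds below follow common MTCNN implementations and should be treated as an assumption):

```python
import numpy as np

def iou(box, gt):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box[0], gt[0]), max(box[1], gt[1])
    x2, y2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box) + area(gt) - inter)

def label_crop(box, gt):
    """MTCNN-style sample labeling by IoU (thresholds assumed)."""
    v = iou(box, gt)
    if v >= 0.65:
        return "pos"
    if v >= 0.4:
        return "part"
    if v < 0.3:
        return "neg"
    return "ignore"  # ambiguous band, typically discarded

print(label_crop([0, 0, 10, 10], [0, 0, 10, 10]))   # pos
print(label_crop([0, 0, 10, 5], [0, 0, 10, 10]))    # part
print(label_crop([50, 50, 60, 60], [0, 0, 10, 10])) # neg
```

The heavy skew toward negatives (360,000 vs 18,000 here) is expected, since most random or PNet-proposed windows miss the object entirely.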
Are there any tips for suppressing the false positive rate in PNet (non-faces detected with a high score)? As far as I know: 1. generate more negative samples; 2. use other datasets (e.g. CelebA, ...) to generate negative samples.
Am I right?
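The two points above amount to hard negative mining: keep the non-face crops that the current PNet scores highest and feed them back into training as negatives. A minimal sketch (generic helper, not repo code):

```python
import numpy as np

def mine_hard_negatives(scores, labels, keep=2):
    """Return indices of the highest-scoring non-face crops (label == 0).

    These are the current network's worst false positives, the most
    informative negatives to add to the next round of training.
    """
    scores = np.asarray(scores, dtype=np.float64)
    neg_idx = np.where(np.asarray(labels) == 0)[0]
    order = neg_idx[np.argsort(-scores[neg_idx])]  # descending by score
    return order[:keep].tolist()

scores = [0.9, 0.8, 0.2, 0.7]
labels = [1,   0,   0,   0]
print(mine_hard_negatives(scores, labels))  # [1, 3]
```

This is essentially what gen_hard_example does for the later stages: PNet's confident mistakes become RNet's training negatives.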
Are you successful in the detection of cat faces?
@liuyunwww It works, and the performance is quite good for my application. MTCNN is a powerful and simple way to do specific object detection and landmarking.
@flankechen, I am trying MTCNN for dog heads. After running train_RNet.py, I continued with one_image_test.py, but the bbox extended beyond the picture at the top-right corner.
I have changed two parameters: thresh from [0.6, 0.6, 0.7] to [0.4, 0.05, 0.7], and epoch from [18, 14, 22] to [30, 22, 22].
no landmarks
The result was the following:

```
Instructions for updating: Use the retry module or similar alternatives.
(1, ?, ?, 3) (1, ?, ?, 10) (1, ?, ?, 10) (1, ?, ?, 16) (1, ?, ?, 32) (1, ?, ?, 2) (1, ?, ?, 4) (1, ?, ?, 10)
./data/MTCNN_model/PNet_landmark
(256, 24, 24, 3) (256, 22, 22, 28) (256, 11, 11, 28) (256, 9, 9, 48) (256, 4, 4, 48) (256, 3, 3, 64) (256, 576) (256, 128) (256, 2) (256, 4) (256, 10)
0 images done
[0.00011708]
[ 2.93596156e+01 -6.52301640e+00  4.66642327e+02  4.61017590e+02  1.17080446e-04]
```
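One quick sanity check for a box like the last line above (y1 is negative, so it lands outside the frame) is to clamp its coordinates to the image bounds before drawing; MTCNN pipelines normally do this in their padding step. A sketch with a generic helper (not the repo's own function):

```python
import numpy as np

def clip_boxes(boxes, w, h):
    """Clamp [x1, y1, x2, y2, score] rows to the image bounds."""
    boxes = np.asarray(boxes, dtype=np.float64).copy()
    boxes[:, 0] = np.clip(boxes[:, 0], 0, w - 1)  # x1
    boxes[:, 1] = np.clip(boxes[:, 1], 0, h - 1)  # y1
    boxes[:, 2] = np.clip(boxes[:, 2], 0, w - 1)  # x2
    boxes[:, 3] = np.clip(boxes[:, 3], 0, h - 1)  # y2
    return boxes

# The box from the log, clipped to a hypothetical 400x300 image:
b = clip_boxes([[29.36, -6.52, 466.64, 461.02, 0.000117]], 400, 300)
print(b[0][:4])  # y1 -> 0, x2 -> 399, y2 -> 299
```

Note also that the score (1.17e-04) is extremely low, which suggests the network itself is not yet trained well enough, rather than a mere clipping issue.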
Can you tell me your experience?
@liuyunwww I think you should check your training data, the bbox input, and the PNet output first.
@flankechen Regarding the bbox input: do you mean I should check the bbox values in the training data? Should I divide the whole dataset into 80% training and 20% testing/validation, and then compute the recall on the validation data as you suggest above?
@liuyunwww Yes, the input of your training: draw the bboxes on the images to check them. Of course, check PNet performance first.
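The check suggested here takes only a few lines; this sketch draws a rectangle outline with NumPy alone so it stays dependency-free (in practice `cv2.rectangle` does the same job on a loaded image):

```python
import numpy as np

def draw_box(img, box, color=(0, 255, 0)):
    """Draw a 1-pixel [x1, y1, x2, y2] rectangle outline in place."""
    x1, y1, x2, y2 = [int(v) for v in box]
    img[y1, x1:x2 + 1] = color  # top edge
    img[y2, x1:x2 + 1] = color  # bottom edge
    img[y1:y2 + 1, x1] = color  # left edge
    img[y1:y2 + 1, x2] = color  # right edge
    return img

img = np.zeros((50, 50, 3), dtype=np.uint8)
draw_box(img, [10, 10, 30, 40])
print(tuple(img[10, 10]))  # (0, 255, 0) -- corner lies on the outline
```

If the drawn boxes do not sit on the heads in your training images, the annotation conversion is broken before the network ever sees a sample.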
Thanks very much for the work! I am trying something interesting: MTCNN for cat heads! Unfortunately, I only have 10,000 labeled cat head examples, which is much less than the human face datasets. I divide the dataset into 80% training and 20% testing/validation. For PNet, the recall on the validation set is around 91% with a threshold of 0.9. I changed the numbers in gen_12net_data.py to generate more pos/part/neg samples, but it is not helping. Question: how do I improve PNet recall for my problem? Cat heads have much larger variance in both shape and texture. I am thinking about a larger, deeper network. Any discussion is welcome!
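The 80/20 split described here can be done on the annotation list before any crops are generated, so that no image leaks between the two sets. A minimal sketch (names are illustrative):

```python
import random

def split_dataset(items, val_frac=0.2, seed=0):
    """Shuffle annotation entries and split into (train, val) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_val = int(len(items) * val_frac)
    return items[n_val:], items[:n_val]

samples = [f"cat_{i}.jpg" for i in range(10000)]
train, val = split_dataset(samples)
print(len(train), len(val))  # 8000 2000
```

Splitting at the image level (rather than the crop level) matters: crops from one image are highly correlated, and putting them on both sides of the split inflates the validation recall.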