TropComplique / FaceBoxes-tensorflow

A fast face detector
MIT License
Is it applicable to add landmark detection in this network #3

Open tenggyut opened 6 years ago

tenggyut commented 6 years ago

According to the paper, FaceBoxes seems to be a good replacement for MTCNN in face detection. But MTCNN has built-in landmark detection. Would it be feasible to turn FaceBoxes into a joint multi-task network, just like MTCNN?

Also, any ideas on how to close the performance gap between this implementation and the Caffe one?

Thanks

tirtile commented 6 years ago

I tried combining this model with onet from MTCNN to detect faces and landmarks, and it works well.

tenggyut commented 6 years ago

How do you combine onet with FaceBoxes? Do you use FaceBoxes' predictions as onet's input?

tirtile commented 6 years ago

Yes. But changing it to a multi-task network and retraining it may be better.
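For readers who want to try the two-stage setup, here is a minimal sketch of the glue code, not the code actually used in this thread. It assumes the detector returns boxes as `(ymin, xmin, ymax, xmax)` in pixels and that the landmark network returns points normalized to `[0, 1]` within the crop. The helper names and the `margin` parameter are hypothetical. The crop itself would then be resized to the 48x48 input that onet expects in MTCNN.

```python
def square_crop_box(box, image_height, image_width, margin=0.0):
    """Turn a detector box into a square integer crop window inside the image.

    box: (ymin, xmin, ymax, xmax) in pixels (assumed order).
    margin: hypothetical extra fraction added to the side of the square.
    """
    ymin, xmin, ymax, xmax = box
    side = max(ymax - ymin, xmax - xmin) * (1.0 + margin)
    center_y = (ymin + ymax) / 2.0
    center_x = (xmin + xmax) / 2.0
    # Square window centered on the detected face.
    top = int(round(center_y - side / 2.0))
    left = int(round(center_x - side / 2.0))
    bottom = int(round(center_y + side / 2.0))
    right = int(round(center_x + side / 2.0))
    # Clamp to the image; near borders the window may stop being square.
    top, left = max(top, 0), max(left, 0)
    bottom, right = min(bottom, image_height), min(right, image_width)
    return top, left, bottom, right


def to_image_coords(landmarks, window):
    """Map (x, y) landmarks normalized within the crop back to image pixels."""
    top, left, bottom, right = window
    height, width = bottom - top, right - left
    return [(left + x * width, top + y * height) for x, y in landmarks]
```

Usage would be roughly: compute `window = square_crop_box(box, H, W)` for each detection, cut `image[top:bottom, left:right]`, resize it, run the landmark network, and pass its output through `to_image_coords`.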

tenggyut commented 6 years ago

But the feature map generated by FaceBoxes is not reused, so wouldn't that hurt runtime efficiency?

Also, did you reproduce the performance described in the original paper?

tirtile commented 6 years ago

Yep. No, I haven't retrained yet.

TropComplique commented 6 years ago

Hi. It is a good idea to use onet with FaceBoxes to detect facial landmarks.

But you could also train a simple keypoint detector yourself. Here is an example of training a simple and fast (~0.5 ms on GTX 1080) 5-keypoint detector: https://github.com/TropComplique/wing-loss (it is not completely finished yet). It is an implementation of this: https://arxiv.org/abs/1711.06753.
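For context, the loss from the linked paper is logarithmic for small errors and linear for large ones, with a constant chosen so the two pieces meet. Below is a plain-Python sketch of that idea, not code taken from the linked repository. The default values of `w` and `epsilon` are illustrative assumptions and should be checked against the paper.

```python
import math


def wing_loss(predicted, target, w=10.0, epsilon=2.0):
    """Sum of wing losses over paired coordinate values.

    w: width of the logarithmic region (assumed default).
    epsilon: curvature of the logarithmic region (assumed default).
    """
    # Offset that makes the linear piece continuous with the log piece at |x| = w.
    c = w - w * math.log(1.0 + w / epsilon)
    total = 0.0
    for p, t in zip(predicted, target):
        x = abs(p - t)
        if x < w:
            # Small errors: logarithmic, so they still produce a useful gradient.
            total += w * math.log(1.0 + x / epsilon)
        else:
            # Large errors: behaves like L1 shifted by the constant c.
            total += x - c
    return total
```

In a real training setup this would be written with tensor operations in the framework of choice. The scalar version only shows the shape of the function.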

I believe it will be hard to train FaceBoxes for keypoint prediction using a multitask loss, because we would need a lot of training data of a specific kind: images with many face bounding boxes plus keypoints for each face. But we only have two kinds of data. The first is images with only one face and keypoints for it (for example, the CelebA dataset). The second is images with many face bounding boxes but no keypoints (for example, the WIDER dataset).

Also, I believe onet is trained on face crops only. I mean, it sees only tightly cropped face regions during training.