D-X-Y / landmark-detection

Four landmark detection algorithms, implemented in PyTorch.
https://xuanyidong.com/assets/projects/TPAMI-2020-SRT.html
MIT License
925 stars 180 forks

Regarding the face bounding box #57

miracleyoo closed this issue 4 years ago

miracleyoo commented 4 years ago

Hello, I'm a Ph.D. student working on a gamer-related computer vision research project. I'm trying to integrate SAN into my project, but I find that it doesn't work well with the bounding boxes generated by RetinaFace, a face detector from a paper published in 2019 that ranked first on the WIDER FACE dataset. The bounding boxes it generates are rectangular rather than square. So I'm wondering whether SAN only works well with the face boxes from certain datasets such as WFLW. Can it work with face bounding boxes generated by other models?

If it is convenient for you, I hope you can reply soon. I will definitely cite your paper if the project is finished with SAN. Thanks!

D-X-Y commented 4 years ago

Thanks for the good question. Did you re-train SAN on your dataset with the new bounding boxes? If not, SAN is very likely to perform poorly. If you are using a pre-trained SAN, you should use the same face detector as in training; otherwise, SAN will perform poorly because of the mismatch between the training bounding boxes and the evaluation bounding boxes.
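For readers who want to experiment before retraining, here is a minimal sketch of one possible preprocessing step: expanding a rectangular detector box into a square around the same center. The function name `square_box` and its `scale` parameter are hypothetical, and nothing in this thread confirms that squaring the box closes the gap with a pre-trained SAN; retraining, or using the same detector as in training, is what the reply above recommends.

```python
def square_box(x1, y1, x2, y2, img_w, img_h, scale=1.0):
    """Expand a rectangular box (x1, y1, x2, y2) into a square around its center.

    Hypothetical helper, not part of the SAN code base. The side length is the
    longer edge of the input box times `scale`. The result is clipped to the
    image, so it may stop being square for faces near the image border.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half = max(x2 - x1, y2 - y1) * scale / 2.0
    nx1 = max(0.0, cx - half)
    ny1 = max(0.0, cy - half)
    nx2 = min(float(img_w), cx + half)
    ny2 = min(float(img_h), cy + half)
    return nx1, ny1, nx2, ny2


# A 40x80 rectangle becomes an 80x80 square around the same center.
print(square_box(40, 20, 80, 100, img_w=200, img_h=200))
```

Whether a `scale` other than 1.0 helps depends on how the training boxes were produced, which would need to be checked against the training annotations.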

miracleyoo commented 4 years ago

I found that when I use the rectangular bounding boxes, SAN works even worse than the baseline dlib landmark detector, as the following image shows: [image] The former is RetinaFace + SAN; the latter is the dlib package.

D-X-Y commented 4 years ago

@miracleyoo I see. Are you using the pre-trained SAN?

miracleyoo commented 4 years ago

Thanks a lot for the quick response! Yes, I'm using the pre-trained SAN, and I believe that is the problem. I will try to retrain the SAN network on my own dataset with RetinaFace boxes. Another question: do I need to manually build a 68-point dataset from my videos, or can I just use RetinaFace to generate boxes and replace the original face bounding boxes?

D-X-Y commented 4 years ago

It would be better to manually annotate 68 points on your videos. But I think using RetinaFace to replace the original face bounding boxes is also fine.
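The second option, replacing the original face bounding boxes with detector output, could be sketched roughly as follows. This assumes annotations are plain dictionaries with hypothetical keys `image`, `landmarks`, and `box`, and that the detector output has already been collected into a dictionary mapping each image to a list of `(x1, y1, x2, y2)` boxes. It is not the annotation format used by this repository, and the rule of picking the box that contains the most landmarks is only one simple way to match a detection to an annotated face.

```python
def pick_box_for_landmarks(landmarks, detected_boxes):
    """Return the detected box containing the most landmark points, or None."""
    best_box, best_count = None, 0
    for box in detected_boxes:
        x1, y1, x2, y2 = box
        count = sum(1 for (x, y) in landmarks if x1 <= x <= x2 and y1 <= y <= y2)
        if count > best_count:
            best_box, best_count = box, count
    return best_box


def replace_boxes(annotations, detections):
    """Copy each annotation, swapping in the matching detector box when one exists.

    annotations: list of dicts with hypothetical keys 'image', 'landmarks', 'box'.
    detections:  dict mapping image name to a list of (x1, y1, x2, y2) boxes.
    Annotations with no matching detection keep their original box.
    """
    updated = []
    for ann in annotations:
        new_box = pick_box_for_landmarks(ann["landmarks"], detections.get(ann["image"], []))
        entry = dict(ann)
        if new_box is not None:
            entry["box"] = new_box
        updated.append(entry)
    return updated


annotations = [{"image": "a.jpg", "landmarks": [(10, 10), (20, 20)], "box": (0, 0, 5, 5)}]
detections = {"a.jpg": [(100, 100, 200, 200), (5, 5, 30, 30)]}
print(replace_boxes(annotations, detections)[0]["box"])
```

Faces the detector misses keep their original boxes here; dropping those samples instead may be preferable so that all training boxes come from a single source.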

miracleyoo commented 4 years ago

Thanks a lot for your valuable help! I will try the second method first.