Closed dougsouza closed 7 years ago
The MTCNN is very good at detecting profile faces, which is very nice. But it's not clear to me how to apply a 2D transformation that does not cause severe distortions to e.g. profile faces (where the estimated eye positions end up nearly in the same place). I know that DeepFace, for example, uses 3D alignment, which seems to work pretty well, but I guess it becomes algorithmically trickier. So far my approach has been to just use the face bounding box and let the model generalize over different face poses.
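For context, the bounding-box-only approach is just a crop with some extra margin and no rotation. A minimal numpy sketch (the 44 px default mirrors a margin commonly used with this repo, but the exact value here is an assumption):

```python
import numpy as np

def crop_with_margin(img, bbox, margin=44):
    """Crop a detected face with extra context; no rotation or alignment.

    bbox is (x1, y1, x2, y2) from the detector; `margin` is split evenly
    around the box and clipped to the image borders.
    """
    x1, y1, x2, y2 = bbox
    h, w = img.shape[:2]
    x1 = max(x1 - margin // 2, 0)
    y1 = max(y1 - margin // 2, 0)
    x2 = min(x2 + margin // 2, w)
    y2 = min(y2 + margin // 2, h)
    return img[y1:y2, x1:x2]
```

The crop is then resized to the network's input size; any pose variation is left for the model to absorb.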
I understand. Well, in my opinion a 2D alignment wouldn't distort the image, as we just need to rotate the face into a vertical position (think of cases where the neck is bent to one side and the face is not completely vertical). The reason is that convolution is invariant to translation but not to rotation, so I think it would improve results to some degree. I am going to run some experiments with 2D alignment; I also have code for 3D alignment that I will try as well.
@davidsandberg, I tried the 2D alignment as I mentioned above, then I got this:
Running forward pass on LFW images
Accuracy: 0.985+-0.006
Validation rate: 0.90600+-0.02119 @ FAR=0.00069
Any thoughts? It is a very slight improvement, I guess.
@dougsouza, I tried to run this FaceNet but I get the error `No module named facenet`. How do I run this demo? Thanks.
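A common cause of `No module named facenet` with this repo is that its `src/` directory is not on the Python path. A possible fix (assuming the repo was cloned to `~/facenet`; adjust the path to your clone):

```shell
# Hypothetical clone location; point this at your checkout's src/ directory
export PYTHONPATH="$PYTHONPATH:$HOME/facenet/src"
```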
Is it reasonable to replace the alignment step with a Spatial Transformer Network? That way, alignment and feature extraction can be trained together.
It's also possible to use the STN to select "cropped patches" instead of manually designed ones, as in "Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?"
One way could be the piecewise affine warping used in the Active Appearance Model. Each triangle in the face mesh has a corresponding triangle in the frontal mesh that it can be warped into. The affine transformation for each triangle can be calculated from the coordinates of its vertices returned by the landmark detection and the corresponding landmarks in the frontal face template (imgDim * MINMAX_TEMPLATE) in align_dlib.py.
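Each per-triangle affine can be recovered directly from the three vertex correspondences by solving a small linear system. A minimal numpy sketch of that step (my own illustration, not code from this repo):

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """Solve for A (2x2) and t (2,) such that A @ p + t = q for each of
    the three vertex pairs (p in src_tri, q in dst_tri)."""
    src = np.asarray(src_tri, dtype=float)
    dst = np.asarray(dst_tri, dtype=float)
    M = np.zeros((6, 6))
    for i, (x, y) in enumerate(src):
        M[2 * i] = [x, y, 1.0, 0.0, 0.0, 0.0]      # row for the u equation
        M[2 * i + 1] = [0.0, 0.0, 0.0, x, y, 1.0]  # row for the v equation
    a, b, tx, c, d, ty = np.linalg.solve(M, dst.reshape(-1))
    return np.array([[a, b], [c, d]]), np.array([tx, ty])

# Map a mesh triangle onto its frontal-template counterpart:
A, t = triangle_affine([(0, 0), (1, 0), (0, 1)], [(0, 0), (2, 0), (0, 2)])
```

Applying `A @ p + t` to every pixel inside the source triangle warps it into the template triangle; repeating this over all triangles gives the piecewise affine warp.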
Hi @dougsouza, thanks for pointing this out. May I ask how you perform 2D alignment?
@MrXu I just use the eye landmarks to calculate the angle the image needs to be rotated by so that the eyes become horizontally aligned, then I rotate. The only issue is that we need to detect the face twice: once before rotation, and again afterwards, because after we rotate the bounding box is no longer valid. It is not very practical for real-time applications, I guess.
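The eye-based rotation can be sketched in plain numpy (a sketch of the idea, not @dougsouza's exact code; applying the resulting rotation to the full image would typically be done with something like cv2.warpAffine):

```python
import numpy as np

def eye_alignment(left_eye, right_eye):
    """Return the tilt angle (radians) of the inter-eye line and a function
    that rotates points about the eye midpoint so the eyes become horizontal."""
    left = np.asarray(left_eye, dtype=float)
    right = np.asarray(right_eye, dtype=float)
    dx, dy = right - left
    theta = np.arctan2(dy, dx)            # tilt of the inter-eye line
    center = (left + right) / 2.0
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])       # rotate by -theta to undo the tilt

    def rotate(p):
        return R @ (np.asarray(p, dtype=float) - center) + center

    return theta, rotate
```

After rotating the image by `theta` about the eye midpoint, the original bounding box no longer fits, which is why a second detection pass is needed.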
@dougsouza Hi, after 2D alignment you got only a slight improvement? I think it should improve much more. What dataset did you use for training?
@tengshaofeng ,
I didn't train my own model with aligned faces; I just aligned LFW and evaluated on the model trained in this repo. If we train the model with aligned faces it may improve accuracy, but I don't know; we would have to test.
@dougsouza, OK, maybe I will train with aligned faces later for testing. Thanks for your reply.
@ugtony Hey Tony, did you try putting spatial transformers in the network?
@kevinlu1211, Yes, I did. But I found the spatial transformer layers are difficult to train unless I freeze the parameters of the other backbone layers. When I freeze the backbone layers, the spatial transformer layers tend to shrink the image; in other words, the transformed image is smaller and the surrounding region is filled with a blank color, which is different from what's shown in the original paper. I don't know why. I can see it makes the frontal faces vertical, but the accuracy isn't improved by it.
There is a 2017 paper that did the same thing, and the authors claim face recognition accuracy is improved by introducing these layers, but I cannot reproduce it. They tested it with a weaker classifier, and the improvement wasn't so significant, so I stopped trying.
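For reference, the core of a spatial transformer, an affine grid generator plus differentiable bilinear sampling, can be sketched in numpy (the localization network that predicts `theta` is omitted; the function names here are mine):

```python
import numpy as np

def affine_grid(theta, H, W):
    """Sampling grid for a 2x3 affine `theta` in normalized [-1, 1]
    coordinates (the grid-generator step of a spatial transformer)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ theta.T                                 # (H, W, 2)

def bilinear_sample(img, grid):
    """Sample a 2D image at sub-pixel grid locations; out-of-range
    coordinates are clamped to the border."""
    H, W = img.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2   # back to pixel coordinates
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    return ((1 - wx) * (1 - wy) * img[y0, x0]
            + wx * (1 - wy) * img[y0, x0 + 1]
            + (1 - wx) * wy * img[y0 + 1, x0]
            + wx * wy * img[y0 + 1, x0 + 1])
```

A `theta` with diagonal entries below 1 samples a sub-region of the input, i.e. it zooms in; the shrinking behavior described above corresponds to the opposite, `theta` scaling up so the image lands small inside a blank canvas.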
@ugtony Could you link me to the paper? And did you implement your own spatial transformer or did you use the one from tf.contrib.layers? Also, how did you set up your normalization module? You could also try finding the face landmarks using other methods; step 2 in this tutorial has links to the source code.
@kevinlu1211 The paper is Towards End-to-End Face Recognition through Alignment Learning. I will read your link later, thanks.
Closing for now. Reopen if needed.
@dougsouza Excuse me, how do you do landmark detection, dlib or mtcnn-tensorflow?
@JerryHouuu I used the mtcnn implementation from this repo.
@jerryhouuu this might help https://github.com/kevinlu1211/FacialClusteringPipeline
@dougsouza @kevinlu1211 Thank you guys. I used MTCNN from this repo to do face alignment like @dougsouza did, and it works!
I aligned the LFW images with dlib but only got 98% accuracy. Why? I didn't retrain. Has anyone retrained a new model with aligned images?
I am confused by this problem as well. @dougsouza, MTCNN detects faces in images and crops them with a margin; none of the code aligns faces. I have two questions:
Is there a training script to retrain the MTCNN landmark detector for profile faces on the iBUG Menpo datasets?
@davidsandberg, I've checked your code for MTCNN face alignment and saw that there isn't really any "alignment" going on; it's just a crop with a margin around a bounding box, am I correct? It seems odd to me that in this field landmark detection is called "alignment". Anyway, do you think that performing a simple 2D alignment (rotation) using the landmarks from MTCNN would improve the results?
Cheers,
Doug