deepfakes / faceswap

Deepfakes Software For All
https://www.faceswap.dev
GNU General Public License v3.0

Align rotation of input faces for GAN conversions #224

Closed: oatssss closed this issue 6 years ago

oatssss commented 6 years ago

Currently, the extractor finds a rotation matrix for each face using umeyama so it can generate a faceset with all the faces mostly upright. Unfortunately, this rotation matrix isn't stored in the alignments file; only the bbox (of the un-rotated face) and the facial alignments are. For the GAN model, when it comes time to convert, the faces aren't rotated upright before being fed through the model, so I doubt anyone has been able to get good results for faces that aren't completely upright.

I propose we store the rotation matrix in the alignments file during extract, then at conversion re-apply it to the cropped face to make it upright before feeding it through the model. The swapped output face then needs to be rotated in the inverse direction to match it with the frame again. Hopefully this is possible.
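To illustrate, a rough sketch (not existing code): assuming the stored `mat` is the 2x3 umeyama matrix from extract, normalised so that multiplying by the crop size maps the face into an upright size x size square, re-applying and inverting it at convert could look something like this:

```python
import cv2

def crop_upright(frame, mat, size=64):
    # Warp the frame so the face comes out upright at the model's input size.
    return cv2.warpAffine(frame, mat * size, (size, size))

def paste_back(frame, swapped_face, mat, size=64):
    # Warp the swapped face back with the inverse transform and blend it
    # over a copy of the original frame.
    output = frame.copy()
    cv2.warpAffine(swapped_face, mat * size,
                   (frame.shape[1], frame.shape[0]), output,
                   flags=cv2.WARP_INVERSE_MAP | cv2.INTER_CUBIC,
                   borderMode=cv2.BORDER_TRANSPARENT)
    return output
```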

Jack29913 commented 6 years ago

Definitely worth working on. Have you checked shaonlu's merge? Maybe there are approaches we can benefit from.

oatssss commented 6 years ago

I realized that Convert_Masked.py already does this; however, it just recalculates the rotation matrix with umeyama on the fly instead of serializing/deserializing it via the alignments file.

Any thoughts on which strategy to prefer? Re-calculating the transform on the fly will be more computationally demanding, but it may be negligible in comparison to the rest of the conversion process. Having the rotations in the alignments file might also be useful in the future.
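If we go the serialization route, it could be as simple as storing the matrix as nested lists next to the existing alignment data. A rough sketch (the `rotation_matrix` key is hypothetical, not an existing alignments field):

```python
import numpy

def alignment_entry(frame_name, bbox, landmarks, mat):
    # Store the umeyama matrix as plain lists so it serializes to JSON.
    return {
        "frame": frame_name,
        "bbox": list(bbox),
        "landmarks": [list(point) for point in landmarks],
        "rotation_matrix": numpy.asarray(mat).tolist(),  # hypothetical key
    }

def load_matrix(entry):
    # Rebuild the 2x3 affine matrix from the stored lists at convert time.
    return numpy.array(entry["rotation_matrix"], dtype="float32")
```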

babilio commented 6 years ago

I think serializing it would help implement things like #221 more easily, right? But either option would be great right now; because of this problem, I don't find the GAN model very useful at the moment.

Thank you for all the work you have been doing!

ppmdo commented 6 years ago

@oatssss Yes, I've had bad results with rotated faces. I think this deserves attention.

When I read #197, I thought that if the Extract.py script didn't find a face (odd angles), the script could rotate the images (by 90°), rewrite them to disk, retry, and then store the rotation info in a rotation.json or similar. Brute force, basically.

Then, when converting, the process would have to invert the rotation and return the frame to its original orientation. However, I don't know if this would also be computationally expensive.
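Roughly, the brute-force version could look like this (only a sketch; `detect_faces` stands in for the actual detector call and the rotation.json layout is made up):

```python
import json
import cv2

ROTATION_CODES = {90: cv2.ROTATE_90_CLOCKWISE,
                  180: cv2.ROTATE_180,
                  270: cv2.ROTATE_90_COUNTERCLOCKWISE}

def extract_with_retry(frame_path, detect_faces):
    """Return (faces, angle); angle 0 means the frame was left untouched."""
    frame = cv2.imread(frame_path)
    faces = detect_faces(frame)
    if faces:
        return faces, 0
    for angle, code in ROTATION_CODES.items():
        rotated = cv2.rotate(frame, code)
        faces = detect_faces(rotated)
        if faces:
            cv2.imwrite(frame_path, rotated)  # rewrite the rotated frame to disk
            return faces, angle
    return [], 0

def save_rotations(rotations, path="rotation.json"):
    # rotations: {frame_path: angle} collected during the extract pass,
    # so convert can invert the rotation later.
    with open(path, "w") as handle:
        json.dump(rotations, handle)
```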

oatssss commented 6 years ago

I enabled the Masked converter for GAN models in https://github.com/deepfakes/faceswap/pull/217/commits/236112afbc6bb65b0b43e25be37e3308c4f2fd35, which should fix this.

@ppmdo can you see if you get better results with that commit? Remember to switch from gan to the masked converter.

torzdf commented 6 years ago

@ppmdo

> the script could rotate the images (by 90°), rewrite them to disk, retry, and then store the rotation info in a rotation.json or similar. Brute force, basically.
>
> Then, when converting, the process would have to invert the rotation and return the frame to its original orientation. However, I don't know if this would also be computationally expensive.

I currently have a batch script which does exactly this, and I can say that it significantly slows down the process (mainly due to disk reads and writes, and having to rescan frames with no faces multiple times). On the plus side, I see a huge improvement in the number of faces picked up. I am currently looking at a way to implement something similar within the code, to avoid the disk IO issue, but I need to try to push some usability fixes first, which will help my approach (hoping they get picked up).

Ideally we would only have to rotate the image once to detect the face; then we could rotate the bounding rectangle and associated landmarks by the appropriate amount when applying the convert. But I am drawing a blank on how to do this with dlib (my Python is average, my dlib is non-existent). If anyone has any pointers, please let me know; otherwise I have other ideas, but there will be a performance hit, so I would make it optional.

ppmdo commented 6 years ago

@torzdf If we rotate the images in memory (or use file-like memory storage) we could avoid the IO overhead, and only write to disk when the extractor finds a face. There will definitely be a performance hit anyway, but memory is much more efficient. There's the PIL library, which can manipulate images in various ways, although using PIL would, I believe, add a dependency to the codebase.

Also, it's my understanding that the scripts already feed the images through OpenCV, which is capable of doing geometric transformations on loaded images.
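Since the frames are already in memory as OpenCV arrays, a rough sketch of the in-memory retry, with no PIL and no temporary files (`detect_faces` again stands in for the real detector call):

```python
import cv2

def detect_any_rotation(frame, detect_faces):
    # The frame is already a numpy array from cv2.imread, so trying the three
    # extra orientations is a pure in-memory operation; nothing touches disk
    # unless a face is actually found.
    for angle, code in ((0, None),
                        (90, cv2.ROTATE_90_CLOCKWISE),
                        (180, cv2.ROTATE_180),
                        (270, cv2.ROTATE_90_COUNTERCLOCKWISE)):
        candidate = frame if code is None else cv2.rotate(frame, code)
        faces = detect_faces(candidate)
        if faces:
            return candidate, faces, angle
    return frame, [], 0
```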

torzdf commented 6 years ago

Oh, I can definitely do it in Python using memory only, but I want to see if we can rotate the image, get the landmarks at extract, and then transform the landmarks so they can be applied at the convert stage, as this will definitely be quicker for the convert at least. If I don't get anywhere, though, I will just look at rotating again at convert, but I don't want to go down that avenue until I have explored the first option.
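Something along these lines is what I am hoping for (only a sketch; it assumes the frame was rotated for detection with a 2x3 matrix from cv2.getRotationMatrix2D and that the dlib landmarks have already been pulled out as plain (x, y) points):

```python
import cv2
import numpy

def landmarks_to_original(landmarks, rotation_matrix):
    # Map landmark points found on the rotated frame back into the original
    # frame's coordinate space, so convert never has to rotate anything.
    inverse = cv2.invertAffineTransform(rotation_matrix)
    points = numpy.array(landmarks, dtype="float32").reshape(-1, 1, 2)
    return cv2.transform(points, inverse).reshape(-1, 2)
```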

babilio commented 6 years ago

Another possibility might be to use #214: after a full pass through the directory, it would allow rotating and going through it again, skipping the frames where faces were already found. It's probably less efficient than doing it while the image is already in memory, though.

oatssss commented 6 years ago

I'm closing this because for now it's solved in https://github.com/deepfakes/faceswap/pull/217