deepfakes / faceswap-model

Tweaking the generative model

Training on 1 face only #5

Open Clorr opened 6 years ago

Clorr commented 6 years ago

Hi guys, I'm wondering about the possibility of training the models on 1 face only instead of 2.

The current workflow requires training a model for each specific src/target pair, which is not convenient. If training could be done with just one face, then once a model is trained it would be much more shareable, no?

What do you think?

shaoanlu commented 6 years ago

How about putting multiple persons you want to transform into the src folder? If the encoder learns a good embedding, then perhaps this will work.

Ganonmaster commented 6 years ago

I unfortunately do not know if the model format allows for that. I know very little about the inner workings of the format (I've been focusing on usability first). Has anyone tried swapping one of the models with a third?

Clorr commented 6 years ago

@shaoanlu if you think that providing a "generic" src data set could help make generic models, that would already be a major improvement...

@Ganonmaster my first tries with the code were with photos of a random girl, and the results went from decent to totally crap, but I did not go further.

JarbasAl commented 6 years ago

Maybe we could train the encoder with facial landmarks instead of pictures, then train the decoder to generate the target face?

There is this pix2pix example: https://github.com/karolmajek/face2face-demo. I don't know if it could be adapted for this use case.
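The landmark idea amounts to conditioning the generator on facial geometry rather than on source pixels, in the pix2pix style. A toy numpy sketch of the preprocessing step (the landmark coordinates here are random stand-ins; a real pipeline would get 68-point landmarks from a detector such as dlib, and image sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 68-point landmark set, (x, y) coordinates in a 64x64 face
# crop. In a real pipeline these would come from a landmark detector.
landmarks = rng.integers(0, 64, size=(68, 2))

# Rasterize the landmarks into a one-channel conditioning image: the kind
# of geometry-only input a pix2pix-style generator could take instead of
# a source photo.
cond = np.zeros((64, 64), dtype=np.float32)
cond[landmarks[:, 1], landmarks[:, 0]] = 1.0
```

The generator then only ever sees face geometry, so in principle it would not matter whose face produced the landmarks.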

bryanlyon commented 6 years ago

You could train the model on one face. You would do this by using the faceswap tool on the same person in both directories. It would be important to collect those images from entirely different sources, as faceswapping between matching images would teach the model bad habits.

After developing the model in this way, you should be able to replace the 2nd face in less time. Unfortunately, this would probably be only slightly faster at retraining, due to having to "unlearn" the old data.

Honestly, the best "shareable" model would be one trained to map a large number of faces similar to your source onto the face you want to swap in. These could be gathered from existing datasets like CelebA (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html), which already sorts faces into categories like "oval face". This would train the model to replace a wide variety of faces with the face that you want. Retraining would then take much less time once you find the target face you wish to replace.

The problem with doing this is that, due to the number of different faces you're training against initially, the initial training would take significantly longer (probably near an order of magnitude). You would end up with a "generic" model for the one face, making it more effective in the long run but taking a lot longer up front. This would make sense for projects like the "Cage in every movie" project: they could spend a couple of weeks building a generic "Nicolas Cage" model and then retrain it against any actor in less time. It would not work well for rarely used faces, since the initial training would take longer than a specific training to match just two faces.
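For context, the architecture this thread is discussing is a shared encoder with one decoder per identity; the swap is performed by encoding a frame of face B and reconstructing it through A's decoder. A minimal numpy sketch with random weights and toy dimensions (linear layers standing in for the real convolutional networks, so purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_LATENT = 64 * 64, 128  # toy sizes, not the real network

# One shared encoder, one decoder per identity.
W_enc = rng.standard_normal((DIM_LATENT, DIM_IN)) * 0.01
W_dec_a = rng.standard_normal((DIM_IN, DIM_LATENT)) * 0.01
W_dec_b = rng.standard_normal((DIM_IN, DIM_LATENT)) * 0.01

def encode(x):
    # Shared encoder: maps any face into the common latent space.
    return np.tanh(W_enc @ x)

def decode(z, W_dec):
    # Identity-specific decoder: renders a latent code as that identity.
    return W_dec @ z

face_b = rng.standard_normal(DIM_IN)  # stand-in for a face-B crop

# The swap: encode B's face, reconstruct it with A's decoder.
swapped = decode(encode(face_b), W_dec_a)
```

Since both decoders read from the same latent space, a "generic" model in bryanlyon's sense is one whose encoder has already seen enough variety that only the decoder half needs retraining.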

Clorr commented 6 years ago

Hi @bryanlyon, and thanks for your feedback. I'm not sure I understand all of your answer, as English is not my first language, so maybe I missed things...

My idea behind this discussion was not necessarily about the model as it is now. I was just pointing out that it is a pain to have to retrain a model for each and every src/target pair, and I wanted to kick off a discussion around that.

At least, I agree with you that training the current model with just one face won't necessarily give good results, as it will be too closely bound to the face it is trained on.

The point I have reached now is to train an encoder+decoder on many faces to get generic embeddings not tied to any person in particular. Then I will drop the decoder and train a totally new decoder, plugged into the generic encoder, on the target face only. CelebA is a good way to start IMHO, but it won't have as many face poses as needed for a very generic conversion. I'm just trying this for now.
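The plan above, keeping a generically pretrained encoder fixed and training only a fresh decoder on the target face, can be sketched with a toy linear autoencoder in numpy. "Freezing" is simply never applying a gradient update to the encoder. Toy dimensions, plain SGD, and random stand-in data throughout; this is an illustration of the training scheme, not the real model:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM_IN, DIM_LATENT = 256, 32  # toy sizes

# Pretend this encoder was pretrained on many faces (e.g. CelebA);
# here it is just fixed random weights.
W_enc = rng.standard_normal((DIM_LATENT, DIM_IN)) * 0.05
W_enc_frozen = W_enc.copy()  # kept only to show the encoder never moves

# Fresh decoder, to be trained on the target face only.
W_dec = rng.standard_normal((DIM_IN, DIM_LATENT)) * 0.05

target_faces = rng.standard_normal((100, DIM_IN))  # stand-in target crops

def loss(W):
    recon = (target_faces @ W_enc.T) @ W.T  # encode, then decode
    return float(np.mean((recon - target_faces) ** 2))

loss_before = loss(W_dec)
lr = 0.01
for _ in range(200):
    z = target_faces @ W_enc.T               # frozen encoder forward pass
    recon = z @ W_dec.T
    grad = 2.0 * (recon - target_faces).T @ z / len(target_faces)
    W_dec -= lr * grad                       # only the decoder is updated
loss_after = loss(W_dec)
```

After training, `W_enc` is bit-for-bit unchanged while the reconstruction loss on the target face has dropped; in a real framework the same effect comes from marking the encoder's layers as non-trainable.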

Also, shaoanlu has made some tests with this in his repo (see 10.)

Clorr commented 6 years ago

@JarbasAl shaoanlu has also tried face landmarks as a source here (see 6.)