[Paper] Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network

deepfakes / faceswap-model

Tweaking the generative model


[Paper] Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network #24

Open shaoanlu opened 6 years ago

shaoanlu commented 6 years ago

Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network Hai X. Pham, Yuting Wang, Vladimir Pavlovic

https://arxiv.org/pdf/1803.07716.pdf

References: ...

Deepfakes. https://github.com/deepfakes/faceswap

Wonder if this is the first time deepfakes got cited in an academic paper.

Edit: Just skimmed through the paper. It proposes a conditional GAN model with an auxiliary classifier "that is capable of synthesizing novel faces from any source portrait given a vector of action unit coefficients". The "vector of facial action unit (AU) intensities" (I don't know what exactly an AU is) is treated as the conditional vector, which is concatenated to the generator's embedding. In addition, a pre-trained AU estimator is introduced for an AU loss (this loss term is basically the same as a perceptual loss).
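To make the two ideas above concrete, here is a rough NumPy sketch, not the paper's code. It assumes the embedding is a flat per-sample vector and that the AU loss is a plain mean squared error between the AUs estimated from the generated face and the target AUs; the function names and the dimensions (8 and 3) are made up for illustration, and the paper may well use a different loss form.

```python
import numpy as np


def condition_embedding(embedding, au_vector):
    """Concatenate the AU vector to a flat generator embedding, per sample.

    Hypothetical helper: shapes are (batch, embed_dim) and (batch, au_dim).
    """
    embedding = np.asarray(embedding, dtype=np.float32)
    au_vector = np.asarray(au_vector, dtype=np.float32)
    if embedding.shape[0] != au_vector.shape[0]:
        raise ValueError("embedding and AU vector must share a batch size")
    # Result has shape (batch, embed_dim + au_dim).
    return np.concatenate([embedding, au_vector], axis=1)


def au_loss(estimated_au, target_au):
    """MSE between estimated and target AUs (assumed loss form)."""
    estimated_au = np.asarray(estimated_au, dtype=np.float32)
    target_au = np.asarray(target_au, dtype=np.float32)
    return float(np.mean((estimated_au - target_au) ** 2))


# Illustrative sizes only: batch of 2, 8-dim embedding, 3 AU intensities.
embedding = np.zeros((2, 8))
target_au = np.ones((2, 3))
conditioned = condition_embedding(embedding, target_au)
```

In a real model the estimated AUs would come from running the frozen, pre-trained AU estimator on the generator output, which is what makes the term resemble a perceptual loss.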

h1vem1nd85 commented 6 years ago

The AU coefs are essentially face pose weights used to generate a facial expression. Through their method, a single source image 'xsrc' can be "posed" using a series of 'ytgt' images. The drawback (as outlined in the paper) is that there is no ground truth for 'xtgt' in terms of actual image data, so they rely on the discriminator to determine whether the face created in 'xtgt' is posed correctly.

The biggest issue I see with implementing a two-input model in our workflow is the dependency created between face A and face B. This would lead to differing results based on which face A and which face B are fed through the model at a given time. For our purposes, each frame in the destination video would have to be paired with a similarly oriented face in our swapped images to minimize error rates. I am also unsure how the 'xtgt' image would merge into the destination frame.
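As a rough illustration of the pairing step described above, here is a sketch that matches each destination frame to the most similarly oriented face. It assumes each face already has a head-pose estimate as (yaw, pitch, roll) in degrees and uses plain Euclidean distance; `pair_by_pose` and the sample poses are hypothetical, not part of faceswap or the paper.

```python
import numpy as np


def pair_by_pose(dst_poses, src_poses):
    """For each destination pose, return the index of the closest source pose.

    Both inputs are (n, 3) arrays of (yaw, pitch, roll); the distance metric
    is an assumption made for this sketch.
    """
    dst = np.asarray(dst_poses, dtype=np.float64)
    src = np.asarray(src_poses, dtype=np.float64)
    # Pairwise squared distances, shape (n_dst, n_src), via broadcasting.
    diff = dst[:, None, :] - src[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    return np.argmin(dist, axis=1)


# Made-up poses: two destination frames, three candidate source faces.
dst_poses = [[0, 0, 0], [30, 5, 0]]
src_poses = [[28, 4, 1], [1, 0, 0], [-20, 0, 0]]
pairs = pair_by_pose(dst_poses, src_poses)  # array([1, 0])
```

This only addresses the orientation pairing; it says nothing about how the generated 'xtgt' image would be merged back into the destination frame.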

iperov commented 6 years ago

@shaoanlu I can't understand the ID class. What is the ID class for faces?

iperov commented 6 years ago

Trying to implement this. Result without the spatial smoothness loss: [image: python_2018-05-21_18-23-13]

iperov commented 6 years ago

Looks like my implementation is not working :)

iperov commented 6 years ago

@shaoanlu I'm using vgg16 instead of vgg9, so the actual vgg code is about 2622 instead of 48.