ESanchezLozano / GANnotation

GANnotation (PyTorch): Landmark-guided face to face synthesis using GANs (And a triple consistency loss!)

No ID-preservation? #7

Open voa18105 opened 5 years ago

voa18105 commented 5 years ago

After struggling for a while, I wanna show my results...

What I've done: I took your iCCR and applied it to a video of my face, then cropped the frames and their landmarks to 128×128 with `image, points = utils.crop(image, detected_points, size=128, tight=16)`. I did this for several images from the video, and then fed the network a photo plus a set of landmarks taken from a different frame.
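For reference, the preprocessing step described above can be sketched roughly as follows. `crop_face` is a hypothetical re-implementation approximating what the repo's `utils.crop(image, detected_points, size=128, tight=16)` appears to do (a square crop around the landmarks with a small margin, resized to 128×128 with the landmarks remapped), not the actual code:

```python
import numpy as np

def crop_face(image, points, size=128, tight=16):
    # Hypothetical sketch mirroring utils.crop: crop a square box around
    # the landmarks with `tight` pixels of margin, resize to size x size,
    # and remap the landmarks into the new coordinate frame.
    x0, y0 = points.min(axis=0) - tight
    x1, y1 = points.max(axis=0) + tight
    side = int(max(x1 - x0, y1 - y0))          # make the box square
    x0, y0 = int(max(x0, 0)), int(max(y0, 0))  # clip to the image bounds
    x1 = min(x0 + side, image.shape[1])
    y1 = min(y0 + side, image.shape[0])
    crop = image[y0:y1, x0:x1]
    # nearest-neighbour resize with plain numpy indexing
    ys = (np.arange(size) * crop.shape[0] / size).astype(int)
    xs = (np.arange(size) * crop.shape[1] / size).astype(int)
    resized = crop[ys][:, xs]
    # remap the landmarks into the resized crop's coordinate frame
    new_points = (points - [x0, y0]) * [size / crop.shape[1],
                                        size / crop.shape[0]]
    return resized, new_points
```

The `tight` margin controls how loose the crop is around the landmarks, which is what the later comments in this thread suggest tuning.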

Sadly, I got a result that does not look like me, even a bit :-(

Check it out sample.zip

Do you have any ideas?

ESanchezLozano commented 5 years ago

Hi,

Thanks for your message. After inspecting the image, and after showing it to people around the lab, everyone agreed that the generated image and the input image look alike (leaving aside the glasses and the beard). I would suggest you try a similar approach with someone you don't know, not even a famous person. Alternatively, have a look at the video attached to the paper: in the first two rows, the perceived identities can be said to be similar to those of the images on the right. Most people I have shown the work to would agree with the ID preservation in the majority of cases.

However, I have also tried this network on myself, as well as on people I know (including famous people), and despite the results being realistic, I cannot tell that the person is the same as in the input image; I cannot see myself in the generated images. That said, I believe that, from a completely neutral standpoint, one could say the generated image you sent is a good result.

So, why does this occur? This is actually a very interesting line of research: what are the patterns that allow us to identify someone? In other words, what is the difference between the real identity and the perceived identity? The facial features (shape, texture, even gaze) allow a network to validate whether two images belong to the same person or not, but this does not match the way humans recognise a face.

In this work, the goal is to transfer the facial features to a set of target points, so that these become the ground-truth points in the generated image. This objective takes priority over ID preservation. Other excellent works have been proposed for face mimicry in which identity is better preserved, since only the expressions are meant to be transferred (e.g. the Face2Face work).

Hope this makes sense, and of course I am open to debate on this if other opinions/criticisms arise :)

voa18105 commented 5 years ago

@ESanchezLozano thank you for the detailed response. Yes, of course, I see your point, and I can see that the generated image looks somewhat like me. But looking alike is far from identity preservation :-( I believe there is still some way to go.

ESanchezLozano commented 5 years ago

Well, this is a line of research in progress, and as such it is open (and welcoming) to improvements, suggestions, and contributions :-)

ak9250 commented 5 years ago

I also had the same problem: the input and output do not preserve identity, and the target actor looks a lot younger than the person I used.

ESanchezLozano commented 5 years ago

If you have an example for me to look at, maybe I could get an idea of why this is happening.

Thanks

ak9250 commented 4 years ago

@ESanchezLozano this is the input and output (Obama): ezgif com-video-to-gif (2)

ESanchezLozano commented 4 years ago

I reckon you should crop the image more tightly, according to what the network expects. However, as acknowledged above, identity preservation is still an ongoing research matter, so this image might still not give good results.

Rcity commented 4 years ago

@ESanchezLozano Hi, I also have the same issue: the generated image does not look like the input image, even though I used the cropped image. From the paper, the cropped input image is obtained from the ground-truth landmarks. What I want to ask is: did you use a face detector when the dataset had no ground-truth landmarks? Thank you for your answer.

input: Wei_Wu_0001 output: test

ESanchezLozano commented 4 years ago

@Rcity Hi, the training code is available in this repo, in case you want to have a look. While I reckon there is a generalisation problem with the model (given the data), there are some artifacts in your generated video that look odd to me. First, the appearance resembles the example image I uploaded more than the image you uploaded. Second, the colour doesn't look right to me: even when the generated images do not resemble the input image, the colour is generally correct. Please double-check.

Rcity commented 4 years ago

@ESanchezLozano I'm very pleased to receive your reply. I forgot to convert the BGR channels to RGB, which is why the colour didn't look right. Unfortunately, even with the colours corrected, the output images still don't look like the respective input images, and I have tested multiple images. input1: Wei_Wu_0001, output1: Wei_Wu_0001; input2: byy2, output2: byy2; input3: test_2, output3: test_2
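For readers hitting the same colour issue: OpenCV's `imread` returns images in BGR channel order, while PyTorch pipelines typically expect RGB. A minimal sketch of the fix (`bgr_to_rgb` is a hypothetical helper name; `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` is the equivalent OpenCV call):

```python
import numpy as np

def bgr_to_rgb(image):
    # Reverse the last (channel) axis: BGR -> RGB.
    # .copy() returns a contiguous array, which torch.from_numpy requires.
    return image[..., ::-1].copy()
```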

ESanchezLozano commented 4 years ago

Could you please try a looser crop?

I don't expect it to work perfectly for these images, but I would expect better performance. In any case, the generalisation problem can now be tackled with the training scripts, which are available, together with more extensive data :)