face enhancement questions

Lotayou / everybody_dance_now_pytorch

A PyTorch Implementation of "Everybody Dance Now" from Berkeley AI lab.

GNU Affero General Public License v3.0


face enhancement questions #5

Closed tsing90 closed 5 years ago

tsing90 commented 5 years ago

Hi, glad to see your great work, it's really useful. I have two questions about the face enhancement part:

1. When doing the face crop, you set crop_size = 48 for 512-frame videos. But that may not be accurate, as the person (or their head) can appear larger or smaller depending on the distance to the camera. Is there a better way to do the crop? Thanks
2. Is there an official paper on the face GAN? I couldn't find any related reference in the 'Everybody Dance Now' paper. And is there any other implementation of the face GAN on GitHub?

many thanks

Lotayou commented 5 years ago

1. For my training video a fixed crop_size is good enough. But of course you can adjust crop_size, since the face GAN is fully convolutional and therefore supports varying input sizes. You can roughly estimate the head size in each frame from the head-to-neck distance.
2. I just borrowed a simple image restoration network for face enhancement. It's quite straightforward. You can use any other image enhancement network that suits your needs. C'mon, it's just a 48-by-48 patch, how hard can this be?
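The head-to-neck estimate described above could be sketched roughly as follows. This is only an illustration and not code from the repository: the function names, the `scale` factor, the clamping bounds and the rounding to a multiple of 4 are all assumptions that would need tuning for a real video.

```python
import math


def estimate_crop_size(head, neck, scale=2.0, min_size=32, max_size=128, multiple=4):
    """Estimate a square face-crop side length from the head-to-neck distance.

    head, neck: (x, y) pixel coordinates from a 2D pose estimator.
    scale, min_size, max_size, multiple: hypothetical tuning knobs.
    """
    dist = math.hypot(head[0] - neck[0], head[1] - neck[1])
    size = int(round(dist * scale))
    # Clamp so a bad pose estimate cannot produce a degenerate crop.
    size = max(min_size, min(max_size, size))
    # Round down to a multiple so strided conv layers divide evenly.
    return size - size % multiple


def face_crop_box(head, size, frame_w, frame_h):
    """Return (left, top, right, bottom) of a size x size box centred on the
    head keypoint, shifted where needed to stay inside the frame."""
    left = int(round(head[0] - size / 2))
    top = int(round(head[1] - size / 2))
    left = max(0, min(frame_w - size, left))
    top = max(0, min(frame_h - size, top))
    return left, top, left + size, top + size
```

With these assumed defaults, a head-to-neck distance of 24 pixels gives a 48-pixel crop, so the fixed setting is recovered for a subject at that distance, while closer or farther subjects get proportionally larger or smaller crops.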

tsing90 commented 5 years ago

@Lotayou thanks for your reply, that sounds reasonable. For hand enhancement, do you think the same method will work? Due to fast movement, the hands are sometimes blurred, which affects the training result. Do you have any ideas for solving this problem? Thanks

Lotayou commented 5 years ago

@tsing90 I haven't got time for hand enhancements yet, but I think it's gonna be harder than face enhancement. The reasons are threefold:

1. In most cases, people pay more attention to facial details than to hands.
2. Hands have more flexible movements and non-rigid deformations than heads, as human fingers can cross and entwine in various ways.
3. In most cases hands are placed in front of the body, and enhancing a patch containing a hand could very likely alter the background body texture as well, which could lead to blocking artifacts.

I guess it would be possible to enhance the hands of a certain person but I never tested it. Can you show me some of your results? Thanks

tsing90 commented 5 years ago

@Lotayou Thanks for your reply. I trained on a video of myself, so I'd prefer not to share it here. For hand enhancement, I actually found that the key problem is different from the face case, which needs fine-tuning. Even though I have the hand keypoints (20 for each hand), they are not accurate (and sometimes missing!), so the model cannot learn from them properly during training. I'm thinking about using some tricks to make the keypoints more meaningful.
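One possible trick of this kind, not necessarily what the poster had in mind, is to fill in missing or low-confidence hand keypoints by interpolating over time. The sketch below assumes each detection is an (x, y, confidence) triple, as pose estimators such as OpenPose report; the function name and the `conf_threshold` value are hypothetical.

```python
def fill_missing_keypoints(track, conf_threshold=0.1):
    """Fill unreliable detections of ONE keypoint across frames.

    track: list of (x, y, conf) tuples, one per frame.
    Returns a list of (x, y) tuples. Frames below conf_threshold are linearly
    interpolated between the nearest confident frames; gaps at the start or
    end copy the nearest confident frame. If no frame is confident, every
    entry is None.
    """
    valid = [i for i, (_, _, c) in enumerate(track) if c >= conf_threshold]
    if not valid:
        return [None] * len(track)  # nothing to interpolate from

    filled = []
    k = 0  # position in `valid` of the last confident frame at or before i
    for i in range(len(track)):
        while k + 1 < len(valid) and valid[k + 1] <= i:
            k += 1
        a = valid[k]
        if i <= a or k + 1 == len(valid):
            # A confident frame itself, or a gap at either end: copy nearest.
            filled.append((track[a][0], track[a][1]))
        else:
            b = valid[k + 1]
            t = (i - a) / (b - a)  # fraction of the way from frame a to b
            x = track[a][0] + t * (track[b][0] - track[a][0])
            y = track[a][1] + t * (track[b][1] - track[a][1])
            filled.append((x, y))
    return filled
```

Linear interpolation only bridges short dropouts sensibly. For fast hand motion over long gaps it would invent a straight-line path, so masking those frames out of training may be the safer choice.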

Lotayou commented 5 years ago

I agree. Hands are very small objects, and hand pose estimation cannot be very robust or accurate. By the way, can I just sneak a peek at the hand enhancement results real quick? You don't need to expose your face:) Thx

tsing90 commented 5 years ago

I've attached a photo of my result here. Feel free to comment, or ask if you would like to know more. [PS: I will delete this photo once you have looked at it :) ]

Lotayou commented 5 years ago

Thanks for your photo! Now I see where the real problem lies: for face enhancement, when you get a blurry result, it's obviously a fake. For hands, however, it's kinda hard to make the same judgement, since hands in the original video can be pretty messed up too. This is especially the case when training GANs, since the authenticity criterion does not depend on per-frame quality anymore. I think it may be better to focus on enforcing temporal consistency, maybe by introducing some RNN or C3D modules. It's also possible to use longer temporal segments, since hand regions are much smaller than the whole frame.
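As a very rough illustration of what a temporal consistency term might look like, the sketch below penalises generated hand patches whose frame-to-frame change differs from that of the real video. Note that this is a plain frame-difference penalty swapped in for simplicity, not the RNN or C3D modules mentioned above, and the function name and the flat-list patch representation are assumptions; in a real training loop the same computation would be done on tensors.

```python
def temporal_difference_loss(generated, real):
    """Mean squared mismatch between generated and real frame-to-frame changes.

    generated, real: equal-length sequences of frames (e.g. hand patches),
    each frame given as a flat list of pixel values of the same length.
    """
    if len(generated) != len(real):
        raise ValueError("sequences must have the same number of frames")

    total = 0.0
    count = 0
    for t in range(len(generated) - 1):
        pixels = zip(generated[t], generated[t + 1], real[t], real[t + 1])
        for g0, g1, r0, r1 in pixels:
            # Compare how each pixel changes over time, not its raw value.
            diff = (g1 - g0) - (r1 - r0)
            total += diff * diff
            count += 1
    # Fewer than two frames (or empty frames) carry no temporal signal.
    return total / count if count else 0.0
```

Because the term compares changes over time and not raw pixel values, it does not discourage genuine motion and ignores a constant per-pixel offset, so it would have to be combined with the usual per-frame losses.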

BTW, feel free to delete the picture anytime you want:)

tsing90 commented 5 years ago

Thanks for your comments. I'm going to try 3D poses instead of 2D for this task, which I believe will let the model learn more information.

Lotayou commented 5 years ago

Good luck! Keep me posted if you find anything interesting then.