Closed tsing90 closed 5 years ago
For my training video, a fixed crop_size is good enough. But of course you can adjust crop_size, since the face GAN is fully convolutional and therefore supports varying input sizes. You can roughly estimate the head size in each frame from the head-to-neck distance.
I just borrowed a simple image restoration network for face enhancement. It's quite straightforward. You can use any other image enhancement network that suits your needs. C'mon, it's just a 48-by-48 patch, how hard can this be?
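To make the head-to-neck idea concrete, here is a minimal sketch of a per-frame crop-size estimate. It is not code from this repo: the function names, the `scale` factor of 1.5, the minimum size, and the rounding to a multiple of 4 are all assumptions for illustration, and the head and neck keypoints are assumed to come from whatever pose estimator you already use.

```python
import math

def estimate_crop_size(head, neck, scale=1.5, min_size=16, multiple=4):
    """Estimate a square face-crop side length from (x, y) head and neck keypoints.

    scale, min_size and multiple are illustrative assumptions, not repo values.
    """
    # Head-to-neck distance in pixels serves as a rough proxy for head size.
    dist = math.hypot(head[0] - neck[0], head[1] - neck[1])
    size = max(min_size, scale * dist)
    # Round up to a multiple so strided conv layers divide the patch evenly.
    return int(math.ceil(size / multiple) * multiple)

def crop_box(head, size, frame_w, frame_h):
    """Return (left, top, right, bottom) of a size x size box centered on the
    head keypoint, shifted as needed so it stays inside the frame."""
    half = size // 2
    left = min(max(int(round(head[0])) - half, 0), max(frame_w - size, 0))
    top = min(max(int(round(head[1])) - half, 0), max(frame_h - size, 0))
    return left, top, left + size, top + size
```

With a head-to-neck distance of 32 pixels, `estimate_crop_size((256, 100), (256, 132))` gives 48, which matches the fixed crop_size discussed in this thread. The scale factor would need tuning on your own video.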
@Lotayou Thanks for your reply, that sounds really reasonable. For hand enhancement, do you think the same method will work? Due to fast movement, the hands are sometimes blurred, which affects the training result. Do you have any ideas for solving this problem? Thanks
@tsing90 I haven't had time to work on hand enhancement yet, but I think it's going to be harder than face enhancement. The reasons are threefold:
I guess it would be possible to enhance the hands of a certain person but I never tested it. Can you show me some of your results? Thanks
@Lotayou Thanks for your reply. I trained on a video of myself, so I would prefer not to share it here. For hand enhancement, I actually found that the key problem is different from the face, which only needs fine-tuning. Even though I have the hand keypoints (20 for each hand), they are not accurate (and are sometimes missing!), so the model cannot learn from them properly during training. I'm thinking about using some tricks to make the keypoints more meaningful.
I agree. Hands are very small objects, and hand pose estimation cannot be very robust or accurate. By the way, can I just sneak a peek at the hand enhancement results real quick? You don't need to expose your face:) Thx
I've attached a photo of my result here. Feel free to comment or ask if you would like to know more. [PS: I will delete this photo once you have looked at it :) ]
Thanks for your photo! Now I see where the real problem lies: for face enhancement, when you get a blurry result, it's obviously a fake. For hands, however, it's kind of hard to make the same judgement, since hands in the original video can be pretty messed up too. This is especially the case when training GANs, since the authenticity criterion no longer depends on per-frame quality. I think it may be better to focus on enforcing temporal consistency, maybe by introducing some RNN or C3D modules. It's also possible to use longer temporal segments, since hand regions are much smaller than the whole frame.
BTW, feel free to delete the picture anytime you want:)
Thanks for your comments. I am going to try 3D poses instead of 2D for this task, since I believe the model can learn more information from them.
Good luck! Keep me posted if you find anything interesting then.
Hi, glad to see your great work, it's really useful. I have two questions about the face enhancement part:
1. When doing the face crop, you set crop_size = 48 for 512-frame videos. But that may not be accurate, as the person (or their head) can appear either large or small depending on the distance to the camera. Is there a better way to do the crop? Thanks
2. Is there any official paper about face-gan? I didn't find any related reference in the 'Everybody Dance Now' paper. And is there any other implementation of face-gan on GitHub?
Many thanks