OpenTalker / StyleHEAT

[ECCV 2022] StyleHEAT: A framework for high-resolution editable talking face generation
MIT License
627 stars, 77 forks

Doubt on data/audio_dataset.py #30

Closed (yqwang closed this issue 1 year ago)

yqwang commented 1 year ago

I assume the AudioDataset class is used for Audio Driven Motion Generation. According to the paper, a proxy input image is used and paired with the target image, but I can't find anything in the lmdb that points to a proxy input image. Is the proxy input image generated during training? Also, is the source_align image in AudioDataset used in audio-driven training, and if not, what is it for? Looking forward to your reply. https://github.com/FeiiYin/StyleHEAT/blob/bad7f124a74028ee4f425428388bb1e350a5119e/data/audio_dataset.py#L184

FeiiYin commented 1 year ago

source_align is used for video reenactment training. Our framework is built on StyleGAN, and StyleGAN is trained on aligned image data, so our network input has to be aligned in the same way to get good inversion quality. The alignment process can be checked in ProGAN or other inversion methods. As for the proxy input, we implement it in the forward pass; we do not pre-process the image in the dataset.

FeiiYin commented 1 year ago

The code for aligning faces in video can be found here: https://github.com/FeiiYin/StyleHEAT/blob/main/utils/video_preprocess/align_face.py
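For readers who want a feel for what "aligned" means here, below is a minimal sketch of the eye/mouth crop recipe popularized by the FFHQ preprocessing script. It is an illustration only: whether align_face.py follows exactly these steps is an assumption, and the function name `alignment_quad` and its parameters are made up for this example. The inputs are assumed to be (x, y) landmark positions from any face landmark detector.

```python
import math

def alignment_quad(left_eye, right_eye, mouth_left, mouth_right):
    """Return (quad, side) for an FFHQ-style face crop.

    quad: four (x, y) corners ordered top-left, bottom-left,
          bottom-right, top-right (for an upright face).
    side: side length of the crop square in source pixels.
    Hypothetical helper; assumes the landmarks do not coincide.
    """
    eye_avg = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    eye_to_eye = (right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
    mouth_avg = ((mouth_left[0] + mouth_right[0]) / 2, (mouth_left[1] + mouth_right[1]) / 2)
    eye_to_mouth = (mouth_avg[0] - eye_avg[0], mouth_avg[1] - eye_avg[1])

    # Horizontal axis of the crop: the eye line combined with the
    # eye-to-mouth direction rotated by 90 degrees.
    xx = eye_to_eye[0] + eye_to_mouth[1]
    xy = eye_to_eye[1] - eye_to_mouth[0]
    norm = math.hypot(xx, xy)
    # Half-size of the crop, taken from the larger facial extent.
    scale = max(math.hypot(*eye_to_eye) * 2.0, math.hypot(*eye_to_mouth) * 1.8)
    xx, xy = xx / norm * scale, xy / norm * scale

    # Vertical axis: the horizontal axis rotated by 90 degrees.
    yx, yy = -xy, xx

    # Crop center sits slightly below the midpoint between the eyes.
    cx = eye_avg[0] + eye_to_mouth[0] * 0.1
    cy = eye_avg[1] + eye_to_mouth[1] * 0.1

    quad = [
        (cx - xx - yx, cy - xy - yy),
        (cx - xx + yx, cy - xy + yy),
        (cx + xx + yx, cy + xy + yy),
        (cx + xx - yx, cy + xy - yy),
    ]
    return quad, 2 * math.hypot(xx, xy)

# Upright face: eyes 20 px apart, mouth 25 px below the eye line.
quad, side = alignment_quad((40, 50), (60, 50), (42, 75), (58, 75))
```

The resulting quad can then be warped to a fixed-size square (for example with Pillow's `Image.transform` in `Image.QUAD` mode) so that every frame has the eyes and mouth in roughly the same place, which is what a StyleGAN inversion network expects.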