evonneng / learning2listen

Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)

Reconstructing generated output #3

Closed rohitkatlaa closed 1 year ago

rohitkatlaa commented 1 year ago

I would like to use the model from this paper for my experiments, but I am not sure how to reconstruct the output videos from the .pkl files. Could you guide me through the steps for reconstructing the videos? Thanks!

evonneng commented 1 year ago

Thank you for your interest in our paper. Unfortunately, we will not be able to release code to do video reconstruction for privacy reasons. However, the high-level pix2pix reconstruction is described in the supplementary here: https://arxiv.org/pdf/2204.08451.pdf
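
As a starting point, it can help to inspect what the saved .pkl files actually contain before attempting the rendering step. A minimal sketch; the file path and key names below are placeholders, not the actual saved format:

```python
import pickle

# Load one of the prediction files and print its structure.
# (The path and the assumption that it holds a dict of arrays are
# illustrative; check how the test script writes its outputs.)
with open("results/listener_prediction.pkl", "rb") as f:
    data = pickle.load(f)

for key, value in data.items():
    shape = getattr(value, "shape", None)  # arrays/tensors expose .shape
    print(key, type(value).__name__, shape)
```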

rohitkatlaa commented 1 year ago

@evonneng Thank you! But would it be possible to share the input-generation code, since that would not raise the same privacy concerns as generating new videos? It would be really helpful if you could provide it.

evonneng commented 1 year ago

Hi Rohit! I haven't planned to release the processing script, but I might do so in the future after cleaning it up. In the meantime, the process mostly combines out-of-the-box models and scripts from existing repos (a rough sketch of how they fit together follows the list):

  1. https://github.com/YadiraF/DECA - to extract the facial features
  2. https://github.com/joonson/syncnet_python - to determine who is speaking (left person or right person)
  3. https://github.com/andrewowens/multisensory - to separate the sound sources once the speaker is known from the step above
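
To make the data flow concrete, here is a rough sketch of how those three steps could be wired together for a single dyadic clip. All three helpers are hypothetical placeholders standing in for the respective repos' scripts, not real functions from those projects:

```python
def run_deca(video_path: str, crop: str):
    """Placeholder: run DECA per frame on one side of the split-screen
    video and return per-frame 3DMM facial features (expression + pose)."""
    raise NotImplementedError("wrap DECA's demo/encode scripts here")

def detect_active_speaker(video_path: str, audio_path: str) -> str:
    """Placeholder: run SyncNet on the left and right crops and return
    'left' or 'right' for whichever has higher AV-sync confidence."""
    raise NotImplementedError("wrap syncnet_python's pipeline here")

def separate_speaker_audio(video_path: str, audio_path: str):
    """Placeholder: run the multisensory on/off-screen separation and
    keep the active speaker's track."""
    raise NotImplementedError("wrap the multisensory separation here")

def preprocess_clip(video_path: str, audio_path: str):
    # 1. Facial features for both people in the frame.
    left_feats = run_deca(video_path, crop="left")
    right_feats = run_deca(video_path, crop="right")

    # 2. Determine who is speaking in this clip.
    speaker = detect_active_speaker(video_path, audio_path)

    # 3. Isolate the active speaker's audio track.
    speaker_audio = separate_speaker_audio(video_path, audio_path)

    # Speaker features + audio form the model input; the listener's
    # features are the prediction target.
    if speaker == "left":
        return (left_feats, speaker_audio), right_feats
    return (right_feats, speaker_audio), left_feats
```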

Hope that helps!