How to generate a synthetic video by an audio and an image?

dc3ea9f / vico_challenge_baseline

http://vico-challenge.github.io/


How to generate a synthetic video by an audio and an image? #14

Closed NoHateAnymore closed 2 years ago

NoHateAnymore commented 2 years ago

Hi, thanks for your work. I couldn't find how to generate a synthetic video from an audio clip and an image. How can I do that?

NoHateAnymore commented 2 years ago

Or, how do I use the test data?

dc3ea9f commented 2 years ago

You can use the inference code to get target 3DMMs and then use the PIRender to convert 3DMMs to video.

NoHateAnymore commented 2 years ago

> You can use the inference code to get target 3DMMs and then use the PIRender to convert 3DMMs to video.

Thank you for the response. I got the 3DMMs by running:

    python eval.py \
        --batch_size 4 \
        --output_path saved/baseline_speaker_E500 \
        --resume saved/baseline_speaker/checkpoints/Epoch_500.bin \
        --task speaker

But in the "prepare vox lmdb" step, I ran:

    python scripts/prepare_vox_lmdb.py \
        --path ../../data/listening_head/videos/ \
        --coeff_3dmm_path ../vico/saved/baseline_speaker_E500/recon_coeffs/ \
        --out ../vico/saved/baseline_speaker_E500/vox_lmdb/

Do I just need to change the path to the image path?

dc3ea9f commented 2 years ago

you can generate a fake video by duplicating image to len(mfcc), and use that video to create vox_lmdb for visualization. For more details, please refer to PIRender.
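A minimal sketch of the duplication step, not taken from this repo or from PIRender: it builds an `ffmpeg` command that repeats a single image for a fixed number of frames. The function names, the 30 fps default, and the file paths are hypothetical, and it assumes `num_frames` is the `len(mfcc)` you get from the feature extraction (one video frame per audio feature frame) and that `ffmpeg` with `libx264` is installed.

```python
import subprocess
from pathlib import Path


def build_still_video_cmd(image_path, out_path, num_frames, fps=30):
    """Build an ffmpeg command that repeats one image for `num_frames` frames."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [
        "ffmpeg", "-y",
        "-loop", "1",                  # keep re-reading the single input image
        "-framerate", str(fps),        # input frame rate (an assumption, not from the repo)
        "-i", str(image_path),
        "-frames:v", str(num_frames),  # stop after exactly num_frames video frames
        "-c:v", "libx264",             # needs even image width and height with yuv420p
        "-pix_fmt", "yuv420p",
        str(out_path),
    ]


def make_still_video(image_path, out_path, num_frames, fps=30):
    """Run ffmpeg (must be on PATH) to write the still-image video."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    cmd = build_still_video_cmd(image_path, out_path, num_frames, fps)
    subprocess.run(cmd, check=True)
```

For example, `make_still_video("portrait.png", "videos/portrait.mp4", num_frames=len(mfcc))` would write a video whose every frame is the portrait; that video directory could then be passed as `--path` to `scripts/prepare_vox_lmdb.py`.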

NoHateAnymore commented 2 years ago

> you can generate a fake video by duplicating image to len(mfcc), and use that video to create vox_lmdb for visualization. For more details, please refer to PIRender.

I got it. Thank you!

yogeshchandrasekharuni commented 1 year ago

Hey @Minghy, assuming you already worked on this issue, can you please provide a notebook/script with inference? Thank you!

yogeshchandrasekharuni commented 1 year ago

> you can generate a fake video by duplicating image to len(mfcc), and use that video to create vox_lmdb for visualization. For more details, please refer to PIRender.

Can you please elaborate on what you mean by duplicating an input image to len(mfcc)? If I have a portrait image of the person who is to speak the utterance, what should I do?
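One possible reading of the hint, offered as an assumption rather than the maintainer's confirmed method: the still video needs as many frames as there are audio feature frames, so the portrait is simply repeated that many times. If `len(mfcc)` is not at hand, the sketch below approximates the count from a WAV file's duration. The function name and the 30 fps default are hypothetical, and the value only matches `len(mfcc)` if the features were extracted at one frame per video frame; the count from the repo's own feature extraction should take precedence.

```python
import wave


def estimate_num_frames(wav_path, fps=30):
    """Estimate how many video frames cover a WAV file at `fps` frames per second."""
    with wave.open(str(wav_path), "rb") as wav:
        # Duration in seconds = number of audio samples / sample rate.
        duration_s = wav.getnframes() / wav.getframerate()
    # Always return at least one frame, even for very short clips.
    return max(1, round(duration_s * fps))
```

Under these assumptions, a 2-second clip at 30 fps calls for 60 copies of the portrait.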