dc3ea9f / vico_challenge_baseline

http://vico-challenge.github.io/

Submission Format #21

Open yangdaowu opened 1 year ago

yangdaowu commented 1 year ago

Hello, I don't know how to complete this submission step. How should I use the speaker video, first frame, and reference image to generate the test set videos?

image

I used the following code, but it only generates videos for the training set.

image

dc3ea9f commented 1 year ago

Issue #14 may help you run inference on the test set.

yangdaowu commented 1 year ago

I changed the path to the test set video path, but it still doesn't work.

yangdaowu commented 1 year ago

image

dc3ea9f commented 1 year ago

https://github.com/dc3ea9f/vico_challenge_baseline/issues/14#issuecomment-1282043625 You can generate a fake video by duplicating the image until it has len(mfcc) frames, then use that video to create vox_lmdb for visualization. For more details, please refer to PIRender.

In short: first generate a fake video, then make predictions and render them.

yangdaowu commented 1 year ago

Sorry to bother you again, but how do I generate the fake video?

dc3ea9f commented 1 year ago

Our videos are 30 fps, and there is a mapping between the audio length and its MFCC feature length. Get the MFCC feature length first, then duplicate the first frame that many times to generate the fake video.
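
As a rough illustration of the duplication step, here is a minimal sketch. It is not code from this repository. The function names `repeat_frame` and `write_fake_video`, the file paths, and the `mp4v` codec are hypothetical choices, and it assumes OpenCV (`cv2`) is installed and that `mfcc_len` is the value of `len(mfcc)` you already computed for the matching audio clip.

```python
def repeat_frame(frame, n_frames):
    """Return a list holding `frame` repeated `n_frames` times."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return [frame] * n_frames


def write_fake_video(first_frame_path, mfcc_len, out_path, fps=30):
    """Write a still-image video with `mfcc_len` frames at `fps` frames per second.

    Assumption: `mfcc_len` is len(mfcc) for the corresponding audio clip.
    """
    import cv2  # imported lazily so repeat_frame works without OpenCV

    frame = cv2.imread(first_frame_path)
    if frame is None:
        raise FileNotFoundError(f"could not read image: {first_frame_path}")

    height, width = frame.shape[:2]
    # "mp4v" is an assumed codec; pick whatever your pipeline expects.
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    try:
        for f in repeat_frame(frame, mfcc_len):
            writer.write(f)
    finally:
        writer.release()


if __name__ == "__main__":
    # Hypothetical paths and length, for illustration only.
    write_fake_video("first_frame.png", mfcc_len=300, out_path="fake.mp4")
```

The resulting file can then stand in for a real video when building vox_lmdb, as described in the linked comment on issue #14.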

yangdaowu commented 1 year ago

If possible, could you provide corresponding steps? Thank you.