facebookresearch / audio2photoreal

Code and dataset for photorealistic Codec Avatars driven from audio
License: Other

Training/inference time and test data #61

Closed: prinshul closed this issue 4 months ago

prinshul commented 5 months ago

What GPUs and how many of them are used for training/inference?

What is the total training and inference time?

Thanks

prinshul commented 5 months ago

Also, how exactly is testing/inference done? Is it on the same four participants? I was unable to find a test script in the repo.

evonneng commented 4 months ago

Hi! Sorry for the delay in response.

I used a single A100 for all of it. The components can be trained in parallel, but in general the training times are: face model = 1 day, body VQ model = 1 day, body diffusion model = 3 days. Of course, everything can also be parallelized across GPUs for a faster run time.
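As a rough back-of-the-envelope sketch (not from the repo), the quoted figures translate into wall-clock time as follows. It assumes the three components are independent, as the comment suggests, that each fits on its own A100, and it ignores any further multi-GPU speedup within a single component. The helper names are hypothetical.

```python
# Hypothetical helpers: compare wall-clock training time on one GPU
# (components trained back to back) vs. one GPU per component in parallel.
# Day counts are the approximate figures quoted in the comment above.
TRAIN_DAYS: dict[str, int] = {
    "face_model": 1,
    "body_vq_model": 1,
    "body_diffusion_model": 3,
}


def sequential_days(days: dict[str, int]) -> int:
    """Single GPU: components run one after another, so times add up."""
    return sum(days.values())


def parallel_days(days: dict[str, int]) -> int:
    """One GPU per component: wall clock is bounded by the slowest component."""
    return max(days.values())


print(sequential_days(TRAIN_DAYS))  # 5
print(parallel_days(TRAIN_DAYS))    # 3
```

So on a single A100 the full pipeline takes about 5 days, while training the components side by side on three GPUs would bring that down to roughly the 3 days needed by the body diffusion model.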

Inference time depends on the length of the sequences, but for instance, running all sequences in our test set (8 s) for 3 iterations took roughly 30 minutes to 1 hour.

Inference script can be found here: https://github.com/facebookresearch/audio2photoreal/blob/main/sample/generate.py