YudongGuo / AD-NeRF

This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

About applying AudioAttNet only after 300k steps #14

Closed jerry0317 closed 3 years ago

jerry0317 commented 3 years ago

Hi Yudong,

Thanks for releasing your code for the audio-driven NeRF! I love the idea of using NeRF to solve this problem, and your implementation is pretty decent and neat.

I have a question about your implementation. I observed that AudioAttNet is not applied during the first 300k steps (more specifically, the number of steps given by args.nosmo_iters, which defaults to 300k), and is only applied afterwards. That basically means it will never be used during HeadNeRF training (which defaults to at most 300k steps) and will only come into play during the combined TorsoNeRF training. May I kindly ask the reasoning behind this design? And is there a particular reason for choosing 300k steps as the point at which AudioAttNet is switched on?
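To make sure I'm reading the code right, here is a minimal, self-contained sketch of the gating I mean. The module names, feature sizes, and window size below are placeholders I made up for illustration, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the per-frame audio encoder and the temporal
# attention smoother; dimensions and names are illustrative only.
aud_net = nn.Linear(29 * 16, 64)   # per-frame DeepSpeech feature -> audio code
aud_att_net = nn.Linear(64, 1)     # scores one audio code within a window

def audio_feature(auds, frame_idx, global_step, nosmo_iters=300_000, win=8):
    """Return the audio conditioning code for one frame.

    Before `nosmo_iters` steps only the single-frame code is used; afterwards
    a window of neighboring codes is blended with softmax attention weights.
    """
    if global_step < nosmo_iters:
        return aud_net(auds[frame_idx])
    lo, hi = max(0, frame_idx - win // 2), frame_idx + win // 2
    codes = aud_net(auds[lo:hi])                         # (win, 64)
    weights = torch.softmax(aud_att_net(codes), dim=0)   # (win, 1)
    return (weights * codes).sum(dim=0)                  # smoothed audio code
```

Essentially, my question is why the `global_step >= nosmo_iters` branch never fires while HeadNeRF is the stage being trained.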

Thanks in advance!

YudongGuo commented 3 years ago

Hi Jerry,

Thanks for your kind words. It was my mistake to set N_iters no greater than nosmo_iters in HeadNeRF; thanks for pointing it out. Also note that in this implementation AudioNet and AudioAttNet are only trained during HeadNeRF, and they are kept fixed during TorsoNeRF training.
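If it helps, a rough sketch of what "kept fixed" means here, with placeholder module names and sizes rather than the actual code in this repo: the audio networks' gradients are disabled and only the torso NeRF's parameters are handed to the optimizer.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the audio encoder, the attention
# smoother, and the torso NeRF; names and sizes are illustrative only.
aud_net = nn.Linear(464, 64)
aud_att_net = nn.Linear(64, 1)
torso_nerf = nn.Linear(64 + 63, 4)

# Keep the audio modules fixed during the torso stage.
for module in (aud_net, aud_att_net):
    for p in module.parameters():
        p.requires_grad_(False)

# Only the torso NeRF parameters are optimized.
optimizer = torch.optim.Adam(torso_nerf.parameters(), lr=5e-4)
```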

jerry0317 commented 3 years ago

Hi Yudong,

Thanks for your quick reply. May I ask whether there is a particular reason that AudioAttNet is trained only after 300k (or args.nosmo_iters) steps?

YudongGuo commented 3 years ago

Hi Jerry,

There is no particular reason for this strategy. We just wanted the network to first learn to predict a reasonable neural radiance field for each frame, and then learn to smooth the predictions. I didn't try training AudioAttNet from the beginning, but I think it would produce similar results.

jerry0317 commented 3 years ago

Hi Yudong,

Got it - it's good to know that, thanks!