Closed: jerry0317 closed this issue 3 years ago
Hi Jerry,
Thanks for your kind words. It was my mistake to set N_iters no greater than nosmo_iters in HeadNeRF; thanks for pointing this out. Also note that in this implementation AudioNet and AudioAttNet are trained only during HeadNeRF training, and they are kept fixed during TorsoNeRF training.
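As a minimal sketch of what "kept fixed during TorsoNeRF training" means in practice, one can disable gradients on the audio modules before the second stage. The module definitions below are illustrative stand-ins, not the actual AD-NeRF code:

```python
# Hypothetical sketch: freezing the audio networks for a second training
# stage by disabling their gradients. The Linear layers here are
# stand-ins for the real AudioNet / AudioAttNet architectures.
import torch.nn as nn

audio_net = nn.Linear(29, 64)      # stand-in for AudioNet
audio_att_net = nn.Linear(64, 64)  # stand-in for AudioAttNet

# Freeze both modules so the TorsoNeRF stage leaves them untouched.
for module in (audio_net, audio_att_net):
    for p in module.parameters():
        p.requires_grad = False

# Only parameters that still require gradients would be handed to the
# TorsoNeRF optimizer; for the audio networks that is now none.
trainable = [p for m in (audio_net, audio_att_net)
             for p in m.parameters() if p.requires_grad]
print(len(trainable))  # 0
```

Passing only the still-trainable parameters to the optimizer (rather than relying on `requires_grad` alone) also avoids keeping optimizer state for the frozen modules.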
Hi Yudong,
Thanks for your quick reply. May I ask whether there is a particular reason that AudioAttNet is trained only after 300k steps (i.e., after args.nosmo_iters)?
Hi Jerry,
There is no particular reason for this strategy. We just want the network to first learn to predict a reasonable neural radiance field for each frame, and then learn to smooth the predictions. I didn't try training the AudioAttNet from the beginning, but I think it would produce similar results.
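The two-phase schedule described above can be sketched as a simple gate on the global step: before nosmo_iters the per-frame audio feature is used directly, and only afterwards is an attention module applied over a window of features. All names, shapes, and the attention architecture below are illustrative assumptions, not the actual AD-NeRF code:

```python
# Hypothetical sketch of the nosmo_iters gate discussed above.
import torch
import torch.nn as nn

nosmo_iters = 300000  # args.nosmo_iters default
win_size = 8          # assumed smoothing window

# Stand-in for AudioAttNet: produces one weight per frame in the window.
att_net = nn.Sequential(nn.Linear(win_size, win_size), nn.Softmax(dim=0))

def audio_feature(global_step, feats):
    """feats: (win_size, dim) window of per-frame audio features."""
    if global_step < nosmo_iters:
        # Phase 1: use the center frame's feature directly, no smoothing.
        return feats[win_size // 2]
    # Phase 2: attention weights over the window, then a weighted sum.
    weights = att_net(feats.mean(dim=1))          # (win_size,)
    return (weights.unsqueeze(1) * feats).sum(0)  # (dim,)

feats = torch.randn(win_size, 64)
early = audio_feature(0, feats)            # raw per-frame feature
late = audio_feature(nosmo_iters, feats)   # smoothed feature
print(early.shape, late.shape)  # both torch.Size([64])
```

Either branch yields a feature of the same dimension, so the rest of the NeRF conditioning pipeline is unchanged when the gate flips at step nosmo_iters.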
Hi Yudong,
Got it - it's good to know that, thanks!
Hi Yudong,
Thanks for releasing your code for the audio-driven NeRF! I love the idea of using NeRF to solve this problem, and your implementation is pretty decent and neat.
I have a question about your implementation. I observed that the AudioAttNet is not applied in the first 300k steps (more specifically, the value given by args.nosmo_iters, which defaults to 300k) and is only applied afterwards. That basically means it will not be used in HeadNeRF training (which defaults to at most 300k steps), only in the combined TorsoNeRF training. May I kindly ask the reason behind this implementation choice? And/or is there a reason to choose 300k as the specific point at which to apply AudioAttNet? Thanks in advance!