gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
679 stars 67 forks

Questions about the latent code #6

Closed wikiwen closed 2 years ago

wikiwen commented 3 years ago

Hello, gafniguy, thank you for your brilliant paper and wonderful work.

I'm a little curious about the latent code. The paper says the latent code can compensate for errors in the facial expression and pose estimation and make the image sharper. I have three questions:

  1. How is the latent code obtained? Sorry, I didn't find the description in the paper. Does it come from a trained network, such as LeNet or ResNet?
  2. Why does the latent code make the reconstruction sharper? Is it because the code carries some information, such as edges?
  3. Why is a fixed latent code used for the test set? Will it cause a problem if the first frame of the training set is very different from the test set?

GaryGky commented 2 years ago

After reading the training code, I think the latent code is just an artificial tensor that is initialized with torch.zeros() but has gradients enabled, so that the network can learn something through the latent code. In my view, that's just a trick?
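
For illustration, here is a minimal sketch of such a tensor, assuming PyTorch. The sizes and variable names are hypothetical and not taken from the repository.

```python
import torch

num_frames, code_dim = 4, 32  # hypothetical sizes, not the repo's values

# One learnable row per training frame: zero-initialized, but tracked by autograd.
latent_codes = torch.zeros(num_frames, code_dim, requires_grad=True)

# A loss that depends on one frame's code sends gradient into that row only.
frame_idx = 2
target = torch.ones(code_dim)  # dummy target, just to produce a non-zero gradient
loss = ((latent_codes[frame_idx] - target) ** 2).sum()
loss.backward()

# Which rows received a non-zero gradient?
grad_rows_touched = (latent_codes.grad.abs().sum(dim=1) > 0).tolist()
print(grad_rows_touched)  # [False, False, True, False]
```

Only the row belonging to the frame that was rendered gets updated, which is why each frame ends up with its own code.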

gafniguy commented 2 years ago

Sorry for the late response, missed this one.

  1. @GaryGky is correct. You just backprop from the loss function into a 'memory' vector that is assigned to each training frame.
  2. Because with this extra per-frame memory we give the network, it can overfit during training to fix errors/non-rigidities that are not explained by the expression vector (which is too low-dimensional).
  3. To choose one plausible prediction of the geometry and just go with it. Changing the code at test time would add jitter. And yeah, ideally you'd want the latent code of a somewhat neutral frame... Honestly, I didn't play around with it too much.
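
A rough sketch of points 1 and 3, assuming PyTorch. The toy MLP, the synthetic data, and all names and sizes are hypothetical placeholders, not the repository's code. It only shows per-frame codes being optimized jointly with the network, and one fixed training code being reused at test time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_frames, expr_dim, code_dim, out_dim = 8, 4, 16, 3  # hypothetical sizes

# Toy stand-in for the radiance-field MLP: [expression, latent code] -> color.
model = nn.Sequential(
    nn.Linear(expr_dim + code_dim, 64),
    nn.ReLU(),
    nn.Linear(64, out_dim),
)

# Per-frame 'memory' vectors, zero-initialized and optimized with the weights.
latent_codes = torch.zeros(num_frames, code_dim, requires_grad=True)
optimizer = torch.optim.Adam(list(model.parameters()) + [latent_codes], lr=1e-2)

# Synthetic data: random targets that the expression vectors alone do not explain.
expressions = torch.randn(num_frames, expr_dim)
targets = torch.rand(num_frames, out_dim)


def predict(expr, code):
    """Condition the network on both the expression and the latent code."""
    return model(torch.cat([expr, code], dim=-1))


initial_loss = None
for step in range(300):
    optimizer.zero_grad()
    loss = ((predict(expressions, latent_codes) - targets) ** 2).mean()
    loss.backward()  # gradients flow into the weights and into latent_codes
    optimizer.step()
    if initial_loss is None:
        initial_loss = loss.item()
final_loss = loss.item()

# Test time: pick one training frame's code (frame 0 here) and keep it fixed
# for every unseen frame, so the prediction does not jitter between frames.
with torch.no_grad():
    fixed_code = latent_codes[0].detach()
    test_expressions = torch.randn(5, expr_dim)
    test_pred = predict(test_expressions, fixed_code.expand(5, -1))

print(initial_loss, final_loss, tuple(test_pred.shape))
```

Frame 0 is an arbitrary choice here. As noted in point 3, a code from a reasonably neutral frame would probably be the safer pick.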