gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

Question about background #4

Closed. xk-huang closed this issue 3 years ago.

xk-huang commented 3 years ago

Hi, thanks for your amazing work! Recently I've been trying to re-implement NerFACE, but I'm struggling with a shaking background problem. 😢

  1. As the paper mentions: "The last sample on the ray r is assumed to lie on the background with a fixed color". Does this mean you only replace the RGB output of the last sample on each ray while leaving the density output untouched?
  2. In the original NeRF paper, points along each ray are sampled randomly within N evenly-spaced bins. Does NerFACE use the same point-sampling strategy, or a different one?
  3. The dave-dvp dataset contains multiple background images rather than the single static background the paper describes. Also, any two background images differ by small compression errors, as do the training images and the backgrounds (even in the still background area!). I'm not sure whether these errors will affect the optimization of the radiance field.

Hoping to get some suggestions! Thanks in advance!

gafniguy commented 3 years ago

Hi, thanks for your good questions.

  1. Precisely. When shooting a ray through a pixel, I take the RGB value of the background image at that pixel location and overwrite the predicted RGB of the furthest point on the ray with those values. The density remains the predicted output.

Basically, at the end of the `predict_and_render_radiance` function I do:


    if background_prior is not None:
        # Overwrite the predicted RGB of the last (furthest) sample on each
        # ray with the background pixel color; the density is left untouched.
        radiance_field[:, -1, :3] = background_prior

and then in the `volume_render_radiance_field` function I do:

    if background_prior is not None:
        # Apply the sigmoid only to the predicted samples; the last sample
        # already holds the raw background RGB and must not be squashed again.
        rgb = torch.sigmoid(radiance_field[:, :-1, :3])
        rgb = torch.cat((rgb, radiance_field[:, -1, :3].unsqueeze(1)), dim=1)
    else:
        rgb = torch.sigmoid(radiance_field[..., :3])

Also, if you get dotted artifacts on the background, you can add

    sigma_a[:, -1] += 1e-6

right after applying the ReLU to the (densities + noise). It helped in my case.
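
In context, a minimal self-contained sketch of that trick, assuming standard NeRF shapes (only `sigma_a` and `radiance_field` match the names above; the rest is illustrative):

    import torch
    import torch.nn.functional as F

    # Assumed shape: radiance_field is [num_rays, num_samples, 4],
    # with RGB in the first three channels and density in the last.
    num_rays, num_samples = 1024, 64
    radiance_field = torch.randn(num_rays, num_samples, 4)
    noise = torch.randn(num_rays, num_samples) * 0.1  # hypothetical noise_std

    # Standard NeRF density activation: ReLU on (densities + noise).
    sigma_a = F.relu(radiance_field[..., 3] + noise)
    # Give the last (background) sample a tiny positive density so it
    # always contributes to compositing; suppresses dotted artifacts.
    sigma_a[:, -1] += 1e-6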

  2. I use the same sampling strategy along the rays as NeRF (coarse and fine networks; the coarse samples are almost evenly spaced, and the importance-sampling weights for the fine network are computed from the coarse densities). The only difference is a small trick in choosing which pixels to shoot rays through. In NeRF you can't afford to shoot rays through every pixel at every iteration, so you randomly choose a subset. What I do is sample pixels such that 90% of them fall inside the 2D bounding box of the head and only 10% in the background (see the sketch after this list).

  3. Oh, I meant to include just one. There are many because I recorded a video of the background rather than a single image, but I only used a single frame. Specifically, I used frame #50 (an arbitrary choice; I avoided the first frames because I thought the camera moved a bit when I pressed the 'record' button, since I didn't have a remote control).
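
As a concrete illustration of the pixel-sampling split from point 2, here is a minimal sketch assuming a head bounding box in image coordinates; `sample_ray_pixels` and the `bbox` convention are illustrative names, not the repo's actual API:

    import torch

    def sample_ray_pixels(H, W, bbox, n_rays=2048, head_frac=0.9):
        """Sample pixel coordinates with ~90% inside the head bounding box.

        bbox = (top, left, bottom, right) in pixel coordinates. This is an
        illustrative helper, not the repo's actual sampling code.
        """
        n_head = int(n_rays * head_frac)
        top, left, bottom, right = bbox
        # ~90% of rays go through pixels inside the 2D head bounding box.
        ys_head = torch.randint(top, bottom, (n_head,))
        xs_head = torch.randint(left, right, (n_head,))
        # The rest go anywhere in the image (mostly background); a stricter
        # version would reject pixels that fall inside the bounding box.
        ys_bg = torch.randint(0, H, (n_rays - n_head,))
        xs_bg = torch.randint(0, W, (n_rays - n_head,))
        return torch.cat((ys_head, ys_bg)), torch.cat((xs_head, xs_bg))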

P.S. My code is based on this implementation of NeRF. If you use it, check out its issues section for some easy bug fixes.

xk-huang commented 3 years ago

Thanks for your timely and informative reply! It helps a ton!