alextrevithick / GRF

🔥 General Radiance Field (ICCV, 2021)

A question about the section 3.5 #7

Closed sstzal closed 3 years ago

sstzal commented 3 years ago

In Figure 6 in Section 3.5, the input of the MLP is the 3D point feature and the viewpoint (x, y, z) (corresponding to the 3D position in the classical NeRF). I wonder whether the 2D viewing direction is also needed as an input to the MLP?

sstzal commented 3 years ago

The 3D point features do not seem to contain the direction information.

alextrevithick commented 3 years ago

Thanks for your attention to the details of the paper!

In Section 3.2, we note that we stack (duplicate) the corresponding viewpoint, i.e., the xyz location of the camera, onto each pixel of the image. Thus, the extracted U-Net features are aware of the camera location of each input view before being aggregated with attention.
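For concreteness, a minimal sketch of this stacking step might look like the following (PyTorch-style; `stack_viewpoint` and the exact tensor layout are illustrative assumptions, not the repo's actual code):

```python
import torch

def stack_viewpoint(images, cam_xyz):
    """Concatenate each view's camera center to every pixel of its image.

    images:  (V, 3, H, W) batch of input views
    cam_xyz: (V, 3) camera centers (xyz) for each view
    returns: (V, 6, H, W) images with the camera position broadcast per pixel,
             which would then be fed to the U-Net feature extractor
    """
    V, _, H, W = images.shape
    # Broadcast each 3-vector camera center over the spatial dimensions
    xyz_maps = cam_xyz.view(V, 3, 1, 1).expand(V, 3, H, W)
    return torch.cat([images, xyz_maps], dim=1)
```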

Does this answer your question?

sstzal commented 3 years ago

Thanks for your quick reply. I have another question: where is the camera rotation used?

alextrevithick commented 3 years ago

I think I misunderstood your initial question. We pass the view direction, as in the original NeRF, along with the extracted point features to ensure view-dependent effects in the more complex datasets are taken into account. We find this still performs better despite the 3D features being aware of the camera viewpoint. In addition, you can optionally convert the rotation to a quaternion and add it to the U-Net input, as with the camera centers.
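As a rough illustration (not the actual implementation), assembling the MLP input from the aggregated point features and the viewing direction could look like this; `mlp_inputs`, `positional_encoding`, and the frequency count are hypothetical:

```python
import torch

def positional_encoding(x, num_freqs=4):
    """NeRF-style sinusoidal encoding (sketch; the frequency count is a guess)."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs                      # (N, 3, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return torch.cat([x, enc.flatten(start_dim=-2)], dim=-1)

def mlp_inputs(point_features, view_dirs):
    """Concatenate attention-aggregated 3D point features with the encoded
    viewing direction before the final NeRF-style MLP.

    point_features: (N, F) per-point features aggregated over the input views
    view_dirs:      (N, 3) viewing directions for the sampled points
    """
    view_dirs = view_dirs / view_dirs.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    return torch.cat([point_features, positional_encoding(view_dirs)], dim=-1)
```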

sstzal commented 3 years ago

Thanks for your kind response.