Closed: sstzal closed this issue 3 years ago
The 3D point features do not seem to contain the view direction information.
Thanks for your attention to the details of the paper!
In Section 3.2, we note that we stack (duplicate) the corresponding viewpoint, i.e., the xyz location of the camera, onto each pixel of the image. Thus, the extracted UNet features are aware of the camera location of each input view before being aggregated with attention.
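A minimal sketch of what "stacking the viewpoint onto each pixel" can look like, assuming a PyTorch-style tensor layout; the function and variable names here are illustrative and not taken from the repo:

```python
import torch

def add_viewpoint_channels(images, cam_centers):
    """Stack each view's camera center (x, y, z) onto every pixel.

    images:      (V, 3, H, W) batch of input views
    cam_centers: (V, 3) xyz location of each view's camera
    returns:     (V, 6, H, W) images with 3 extra constant channels
    """
    V, _, H, W = images.shape
    # Broadcast each 3-vector to a constant (3, H, W) map for its view.
    xyz_maps = cam_centers.view(V, 3, 1, 1).expand(V, 3, H, W)
    return torch.cat([images, xyz_maps], dim=1)

# Illustrative usage: the feature extractor then takes 6 input channels,
# so its per-pixel features carry the camera location of that view.
# feats = unet(add_viewpoint_channels(images, cam_centers))  # `unet` is a placeholder here
```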
Does this answer your question?
Thanks for your quick reply. I have another question: where is the camera rotation used?
I think I misunderstood your initial question. We pass the view direction, as in the original NeRF, along with the extracted point features to ensure that view-dependent effects in the more complex datasets are taken into account. We find this still performs better even though the 3D features are already aware of the camera viewpoint. In addition, you can optionally convert the rotation to a quaternion and feed it to the UNet in the same way as the camera centers.
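A minimal sketch of feeding the view direction to the MLP alongside the point features, assuming a standard NeRF-style frequency encoding; the function names and the number of frequencies are illustrative assumptions, not the repo's actual API:

```python
import torch

def positional_encoding(x, num_freqs=4):
    """NeRF-style sin/cos frequency encoding of the (unit) view direction."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs                     # (..., 3, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                  # (..., 3 * 2 * num_freqs)

def build_mlp_input(point_feats, view_dirs):
    """Concatenate per-point features with the encoded view direction.

    point_feats: (N, F) aggregated 3D point features
    view_dirs:   (N, 3) unit view directions (a 2-DoF quantity stored as xyz)
    """
    return torch.cat([point_feats, positional_encoding(view_dirs)], dim=-1)
```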
Thanks for your kind response.
In Figure 6 in Section 3.5, the input to the MLP is the 3D point feature and the viewpoint (x, y, z) (corresponding to the 3D position in classical NeRF). I wonder whether the 2D view direction is also needed as input to the MLP?