facebookresearch / localrf

An algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video.
MIT License
974 stars 60 forks

Questions about camera poses #53

Closed curryandklay closed 3 months ago

curryandklay commented 7 months ago

Thank you for your excellent work! I tested my own dataset with your project and got optimized poses. When I visualized the camera poses in EVO, I found that the length of the pose trajectory was clearly scaled, so I have a few questions:

  1. In the real case, the camera moved close to 300 m. How do I restore the localrf-generated poses to the real-world scale (as in visual SLAM)?

  2. I am still somewhat confused about the coordinate system of the camera poses. I visualized the poses from transform.json in evo and found that the trajectory orientation is not consistent with the trajectory obtained from visual SLAM. Are the poses in transform.json generated by localrf in OpenCV format or in NeRF format? I would like to compare transform.json with the poses obtained by the SLAM algorithm.

Any reply from you will be appreciated!

ameuleman commented 7 months ago

Hi

  1. Poses are up to scale (which is typical for monocular RGB pose estimation).
  2. It should be in NeRF format. To align poses that have different orientations and scales, you could use Procrustes analysis (see the sketch below).
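
For reference, a minimal sketch touching on both points; it assumes transform.json stores (4, 4) camera-to-world matrices in the NeRF/OpenGL convention and that `est_centers` and `slam_centers` are hypothetical (N, 3) arrays of corresponding camera positions. The Umeyama closed form is one standard way to compute the similarity (scale + rotation + translation) Procrustes alignment:

```python
import numpy as np

def nerf_to_opencv(c2w_nerf: np.ndarray) -> np.ndarray:
    """Convert a camera-to-world matrix from the NeRF/OpenGL camera convention
    (x right, y up, z backward) to OpenCV (x right, y down, z forward)."""
    c2w = c2w_nerf.copy()
    c2w[:3, 1] *= -1  # flip camera y axis
    c2w[:3, 2] *= -1  # flip camera z axis
    return c2w

def umeyama_alignment(est, ref):
    """Return s, R, t such that ref_i ≈ s * R @ est_i + t for corresponding points."""
    est, ref = np.asarray(est, float), np.asarray(ref, float)
    mu_est, mu_ref = est.mean(axis=0), ref.mean(axis=0)
    x, y = est - mu_est, ref - mu_ref
    cov = y.T @ x / est.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1  # guard against reflections
    R = U @ S @ Vt
    var_est = (x ** 2).sum() / est.shape[0]
    s = np.trace(np.diag(D) @ S) / var_est
    t = mu_ref - s * R @ mu_est
    return s, R, t

# Usage: align localrf camera centers to the metric SLAM centers; the trajectory
# length of the aligned poses should then be close to the real ~300 m.
# s, R, t = umeyama_alignment(est_centers, slam_centers)
# aligned_centers = (s * (R @ est_centers.T)).T + t
```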
curryandklay commented 7 months ago

Response faster than light! Thank you so much, I will check out Procrustes analysis for that! I sincerely appreciate this project, excellent work!

curryandklay commented 7 months ago

Thank you for your previous reply. I have found a way to restore the scale, but I have now run into some new issues: 1. I tested with a custom dataset and found that the rendered image shows pronounced blurring close to the camera, but not in areas far from the camera.

[image: gt2]

[image: color-rectified-left-046881]

2. My dataset images were acquired by a rover along its forward trajectory. Because of the sun angle, the rover's shadow appeared in the images, so the RAFT-estimated optical flow was incorrect in the shadow and its surrounding area, which directly led to artifacts and blurring around the shadow in the final rendered image. The video composited by smooth_spline also shows a depth step in that area.

[image: 图片2]

Sincerely, do you have any suggestions regarding the above situation? Any reply will be appreciated!

ameuleman commented 7 months ago

Thank you for sharing,

  1. We might be able to obtain sharper results simply by tweaking parameters. I would suggest increasing the camera translation learning rate, e.g. adding --lr_t_init 1e-3 (or more). You could also try a larger --N_voxel_final, but this will slow down the optimization and require more memory.
  2. Handling transient shadows might be less straightforward. There are methods such as RobustNeRF that work on this. One could also attempt to mask out the shadow before training (see the sketch after this list). The method should load masks in masks/ automatically. These masks should have the same names as the input images (with a .png extension) or be named masks/all.png if the mask remains the same for all images.
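
A minimal sketch of generating such masks, assuming the input frames live in an images/ folder and that you have some way of locating the shadow pixels (a hypothetical, commented-out step here). Which of 0/255 means "excluded from supervision" should be double-checked against the repo's mask loader:

```python
import os
import numpy as np
from PIL import Image

frames_dir = "images"  # assumed location of the input frames
masks_dir = "masks"
os.makedirs(masks_dir, exist_ok=True)

for name in sorted(os.listdir(frames_dir)):
    img = np.array(Image.open(os.path.join(frames_dir, name)))
    mask = np.full(img.shape[:2], 255, dtype=np.uint8)  # keep everything by default
    # Hypothetical step: mark the rover shadow (e.g. from a manual polygon or a
    # simple intensity threshold) and zero it out so it is ignored during training.
    # mask[shadow_rows, shadow_cols] = 0
    out_name = os.path.splitext(name)[0] + ".png"  # same name as the input image
    Image.fromarray(mask).save(os.path.join(masks_dir, out_name))
```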
curryandklay commented 7 months ago

Thank you for another quick reply. I will refer to your suggestion for more detailed testing. Looking forward to better results!

curryandklay commented 7 months ago

Btw, I have another query that I forgot to mention: the depth_cmp from the TensorBoard panel looks like the following, where there is a distinctly anomalous spiky, cone-shaped depth estimate in front of the shaded area in the depth map. Is this also an outlier in the model's estimate caused by the transient shadows?

[image: depth_cmp3-2]

[image: color-rectified-left-047071]

ameuleman commented 6 months ago

I think so

curryandklay commented 6 months ago

Thank you so much! The mask has had a good effect, but there is still very strong blurring in the near-camera area after adjusting the parameters; I'm trying to investigate further.

curryandklay commented 6 months ago

Hi, ameuleman!

I found two tricks in your project that interest me:

  1. The normalized depth standard-deviation range is [0, 5] in the visualize_depth function. I understand that it is a hyperparameter, but I would like to know how you selected it: was it chosen empirically or for a specific dataset?
  2. You defined MLPRender_Fea_late_view in tensorBase. Could you please explain which scenarios MLPRender_Fea_late_view and MLPRender_PE are each suited to? I found that I get better PSNR with MLPRender_PE in some scenarios.

Best regards.

ameuleman commented 5 months ago

Hi,

  1. This is only used for visualization. Use whatever you believe is clearer (a rough sketch of this kind of clamped depth visualization follows below).
  2. MLPRender_Fea_late_view can avoid overfitting the view-dependent component to the input views and helps extrapolation when the scene is not captured from varied directions.
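
This is not the repo's exact visualize_depth, just a minimal sketch of the general idea, assuming the [0, 5] range acts as a clamp of the depth around its mean in units of standard deviation before colormapping; since it only affects the rendered preview image, the exact value is a cosmetic choice:

```python
import numpy as np
import matplotlib.pyplot as plt

def colorize_depth(depth: np.ndarray, num_std: float = 5.0) -> np.ndarray:
    """Map a depth image to RGB, clipping outliers beyond num_std deviations
    from the mean so a few extreme values do not wash out the colormap."""
    mu, sigma = depth.mean(), depth.std() + 1e-8
    lo, hi = mu - num_std * sigma, mu + num_std * sigma
    norm = np.clip((depth - lo) / (hi - lo), 0.0, 1.0)
    return (plt.get_cmap("turbo")(norm)[..., :3] * 255).astype(np.uint8)
```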
curryandklay commented 5 months ago

I got it. Thank you so much!