ActiveVisionLab / nope-nerf

(CVPR 2023) NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
https://nope-nerf.active.vision/
MIT License

Adapting with KITTI-360 #18

Open js0n-lai opened 1 year ago

js0n-lai commented 1 year ago

Hi,

Thanks for your work so far! I am interested in adapting NoPe-NeRF to the KITTI-360 dataset, in particular using LiDAR data as an alternative source of depth supervision and for establishing sparse correspondences in challenging scenes. My YAML file is currently as below; the data consists of ~100 posed images from the left camera.

dataloading:
  customized_focal: true
  customized_poses: true
  load_colmap_poses: false
  path: data/2013_05_28_drive_0000_sync
  random_ref: 1
  resize_factor: 1
  scene:
  - still_5
depth:
  type: None
distortion:
  learn_distortion: true
extract_images:
  eval_depth: true
  resolution:
  - 376
  - 1408
pose:
  init_R_only: false
  init_pose: false
  learn_R: true
  learn_focal: false
  learn_pose: true
  learn_t: true
  update_focal: true
training:
  auto_scheduler: true
  match_method: dense
  out_dir: out/kitti360/2013_05_28_drive_0000_sync/still_5
  use_gt_depth: false
  with_ssim: false

I had some questions:

  1. When loading custom poses, what coordinate system is expected? The dataset uses (forward, left, up) for the camera-to-world transform. Furthermore, are these poses expected to be recentered and spherified as COLMAP poses are when loaded?
  2. When init_pose is set to true, an `invalid mask` message is printed throughout training, and the losses fluctuate significantly. What could be causing this? I suspect either insufficient overlap between the training images (the colmap GUI and img2poses from LLFF both failed to generate poses for most images) or that the poses I supplied are in the wrong coordinate system.
  3. When init_pose is set to false, what initial pose is used (assuming load_colmap_poses is false)?
  4. How can pose refinement be disabled (e.g. to train with GT poses only)? I ran into errors when setting learn_pose to false, as pose_param_net is set to None in train.py.
bianwenjing commented 1 year ago

Hello, thank you for your interest in our work. Below are my answers to your questions:

  1. The coordinate system we use is the OpenGL system. Before loading the ground truth poses, we recenter and spherify them.

  2. If the initial poses you provided are accurate, the issue might be caused by incorrect coordinate systems, incorrect depth scale, or sparse views. To gain more insight, you can check the reprojected images located under /rendering.

  3. When init_pose is set to False, the poses are initialised with identity matrices.

  4. To disable pose refinement, you should set cfg['pose']['learn_R'] and cfg['pose']['learn_t'] to False, but leave cfg['pose']['learn_pose'] as True (note that learn_pose is a redundant parameter). I've included an example of a config file for NeRF training with fixed poses.
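Regarding the coordinate system in answer 1: a minimal sketch of the axis remap involved, assuming the questioner's (forward, left, up) camera-to-world convention and OpenGL's (right, up, backward) convention. The helper name and constant are hypothetical and not part of this repository.

```python
import numpy as np

# Hypothetical helper (not from this repository): remap a camera-to-world
# pose from a (forward, left, up) camera frame to OpenGL's
# (right, up, backward) frame. Each column expresses one OpenGL basis
# vector in the (forward, left, up) basis:
#   right = -left, up = up, backward = -forward
FLU_TO_GL = np.array([
    [0.0, 0.0, -1.0],
    [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])

def flu_c2w_to_opengl(c2w_flu):
    """Rotate the camera axes of a 4x4 camera-to-world pose into the
    OpenGL convention; the camera position (translation) is unchanged."""
    c2w_gl = np.asarray(c2w_flu, dtype=np.float64).copy()
    c2w_gl[:3, :3] = c2w_gl[:3, :3] @ FLU_TO_GL
    return c2w_gl
```

The remap only rotates the camera axes; recentering and spherifying, if applied, happen afterwards on the converted poses.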

If you have any further questions or need additional information, please feel free to ask.

js0n-lai commented 1 year ago

Thank you for the informative response. Could you further clarify the following:

  1. As it currently stands in dataloading/dataset.py, recentering and spherifying occur when poses are loaded with load_colmap_poses via poses_bounds.npy, but not when they are loaded with customized_poses via gt_poses.npz. Is that intentional?
  2. I suspect the invalid mask is caused by incorrect depth scale, as the KITTI GT positions are much larger in magnitude than the poses in the Tanks and Temples dataset (e.g. Ballroom). For example, scaling the GT translations down by a factor of 1000 eliminated the issue, but the synthesised views are no better than with init_pose unset. What would be a more principled way to correct the depth scale?
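One common, more principled alternative to an ad-hoc division by 1000 is to recentre the camera positions at their centroid and rescale so the farthest camera lies at a fixed radius, dividing any metric LiDAR depths by the same factor so depth and poses stay consistent. A minimal sketch under those assumptions (the function name is hypothetical, not from this repository):

```python
import numpy as np

def normalise_pose_scale(c2ws, target_radius=1.0):
    """Hypothetical sketch (not from this repository): recentre the
    camera positions at their centroid and rescale so the farthest
    camera lies at target_radius. Returns the normalised poses and the
    scale that was divided out; metric depths used for supervision must
    be divided by the same scale to remain consistent with the poses."""
    c2ws = np.asarray(c2ws, dtype=np.float64).copy()
    centres = c2ws[:, :3, 3]
    centroid = centres.mean(axis=0)
    scale = np.linalg.norm(centres - centroid, axis=1).max() / target_radius
    c2ws[:, :3, 3] = (centres - centroid) / scale
    return c2ws, scale
```

The returned scale should be kept so that rendered depths and poses can be mapped back to metric units after training.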