chenhsuanlin / bundle-adjusting-NeRF

BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)

How to train for the case of a real life object and blender-like camera setting #19

Closed: dedoogong closed this issue 2 years ago

dedoogong commented 2 years ago

hello, thanks for sharing~! I'm trying to train BARF with my custom dataset. I don't know the camera intrinsics. The camera is placed at 3 elevations (0, 30, 60 degrees) on a sphere, the object sits on a turntable, and I took a picture every 15 degrees, so I have 72 pics ((pitch 0, yaw 15, 30, 45, ..., 360), (pitch 30, yaw 15, 30, 45, ..., 360), and (pitch 60, yaw 15, 30, 45, ..., 360)). So even though the camera is fixed per pitch, the setup is similar to a blender-style camera movement. BARF supports blender and llff data, but I failed to produce applicable camera pose information (.json or npz) in the form the blender or llff dataloader.py needs, so I tried the iphone configuration instead.

It seems to train to some degree, but it's still quite far from success after 200,000 steps.

I also tried to manually initialize the camera poses spherically, but that gives worse results.
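For reference, this is roughly how I generated those spherical poses and wrote them out as a blender-style transforms.json (a rough numpy sketch; the OpenGL camera-to-world convention, the radius, and the camera_angle_x value are all guesses on my part, so any of them may be part of the problem):

```python
import json
import numpy as np

def pose_spherical(pitch_deg, yaw_deg, radius):
    """Camera-to-world matrix for a camera on a sphere, looking at the origin."""
    pitch, yaw = np.radians(pitch_deg), np.radians(yaw_deg)
    # camera center on the sphere (world up is +z here)
    center = radius * np.array([np.cos(pitch) * np.sin(yaw),
                                -np.cos(pitch) * np.cos(yaw),
                                np.sin(pitch)])
    # OpenGL convention: camera +z points away from the object, so -z looks at it
    forward = center / np.linalg.norm(center)
    right = np.cross(np.array([0.0, 0.0, 1.0]), forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)
    c2w = np.eye(4)
    c2w[:3, :3] = np.stack([right, up, forward], axis=1)
    c2w[:3, 3] = center
    return c2w

# 3 pitch rings x 24 yaw steps = my 72 views
frames = []
for i, (pitch, yaw) in enumerate((p, y) for p in (0, 30, 60)
                                 for y in range(15, 361, 15)):
    frames.append({"file_path": f"./train/r_{i}",
                   "transform_matrix": pose_spherical(pitch, yaw, radius=4.0).tolist()})

# camera_angle_x = 2 * atan(W / (2 * focal)); the focal here is a rough guess
meta = {"camera_angle_x": 0.69, "frames": frames}
with open("transforms_train.json", "w") as f:
    json.dump(meta, f, indent=2)
```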

Please give me some hints on how to solve this.

thanks^^.

chenhsuanlin commented 2 years ago

Hi @dedoogong, from your description I'm guessing there could be several issues:

  1. The most critical is probably the pose initialization. BARF is still a local registration method, which means that the pose initializations have to be close enough to the underlying ground-truth poses. Since your multi-view data is object-centric and captured 360° spherically, I don't expect BARF to be able to make the cameras automagically "wrap" around the object from the same (all-identity) pose. Since you already know your capture configuration of the 72 viewpoints, it would be more realistic to initialize from the spherical angles you described. I would suggest using such poses to train a NeRF first to make sure it could at least get you some reasonable results, and then switch to BARF to see if it improves.
  2. It sounds like you're turning the table and capturing the object every 15°. If it's true, then the background would not be in correspondence, and BARF would have a hard time using the photometric cues for pose optimization. This wouldn't work even for the original NeRF (i.e. even when ground-truth poses are given).
  3. Using camera intrinsics different from your actual sensor could have an impact, but it probably isn't the major issue. You could consider also optimizing the intrinsic parameters (e.g. focal lengths), as in NeRF-- or Self-calibrating NeRF; see the sketch after this list for a rough idea.
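As a rough idea of point 3, here's a minimal PyTorch sketch (not code from this repo; all names are made up) of exposing the focal length as a learnable parameter, in the spirit of NeRF--:

```python
import torch

class LearnableIntrinsics(torch.nn.Module):
    """Minimal sketch: focal length as a learnable parameter (names hypothetical)."""

    def __init__(self, H, W, focal_init):
        super().__init__()
        self.H, self.W = H, W
        # optimize in log space so the estimated focal length stays positive
        self.log_f = torch.nn.Parameter(torch.log(torch.tensor(float(focal_init))))

    def K(self):
        # rebuild the 3x3 intrinsics each step so gradients flow into log_f
        f = self.log_f.exp()
        zero, one = torch.zeros(()), torch.ones(())
        return torch.stack([
            torch.stack([f, zero, one * (self.W / 2)]),
            torch.stack([zero, f, one * (self.H / 2)]),
            torch.stack([zero, zero, one]),
        ])

# include these parameters in the same optimizer as the pose parameters, e.g.
# torch.optim.Adam(list(model.parameters()) + list(intr.parameters()), lr=...)
intr = LearnableIntrinsics(H=800, W=800, focal_init=1000.0)
```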

Typically if you don't see signs of BARF converging in 20k steps, then it probably won't in the end either. Hope these help!

dedoogong commented 2 years ago

Hi @chenhsuanlin! Thanks so much for your kind, thoughtful reply! I tried using the camera intrinsics (focal length) estimated by COLMAP, but it still failed. Maybe, as you pointed out, it's just too hard for BARF to optimize from the identical initial pose to all-around spherical poses from scratch. I agree with your points (1, 3), and I will try to find better initial poses manually, even though it will require a lot of trial and error. Thank you!