NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more
https://nvlabs.github.io/instant-ngp

Record 3D Scene not converging #1080

Open keli95566 opened 1 year ago

keli95566 commented 1 year ago

Hi there! I noticed that we can now read the camera poses from Record3D and skip running COLMAP. However, after following the new data preparation tip, I tried a few Record3D scenes, but they did not converge to a sensible scene like the ones reconstructed with COLMAP. (I also tried rotating the images.)

Any further tips on how we could run Record3D scenes correctly?

[image]

yenchenlin commented 1 year ago

Hi, it is true that Record3D and the underlying ARKit may fail to estimate correct camera poses under certain circumstances (e.g., reflective objects). Have you visualized the camera poses for these scenes, and do they make sense?
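
(In case it helps anyone reading along, here is a minimal sketch for that sanity check; it is not part of instant-ngp and assumes an instant-ngp-style transforms.json with a 4x4 transform_matrix per frame.)

```python
import json
import numpy as np
import matplotlib.pyplot as plt

# Load an instant-ngp style transforms.json (path is an assumption).
with open("transforms.json") as f:
    frames = json.load(f)["frames"]

# Each frame stores a 4x4 camera-to-world matrix; its last column holds the
# camera position, so plotting those positions reveals gross tracking failures
# (jumps, drift, collapsed trajectories) at a glance.
origins = np.array([np.array(fr["transform_matrix"])[:3, 3] for fr in frames])

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(origins[:, 0], origins[:, 1], origins[:, 2], s=5)
ax.set_title("Camera origins from transforms.json")
plt.show()
```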

keli95566 commented 1 year ago

Thank you very much for getting back to me! I reduced the volume size to 2 and was able to find some parts of the recorded scene in the volume. Besides the issue with specular surfaces, it seems that the center of the volume box is placed at the first camera pose rather than at an estimate of the center of the recorded scene. I will try to record with the first camera pose facing the object of interest and see if I get the same results. :)

[image]

yenchenlin commented 1 year ago

For the camera pose issue, can you try circling around the object/scene of interest so that all your cameras face toward it? The script currently sets the intersection of all images' center rays as the origin.
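
(For readers wondering what that means in practice, below is a rough illustration of the idea, not the actual conversion script: the origin can be taken as the least-squares point closest to every camera's central ray.)

```python
import numpy as np

def central_ray_intersection(origins, directions):
    """Least-squares point closest to a set of rays.

    origins:    (N, 3) camera centers
    directions: (N, 3) central-ray directions, assumed unit length
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        # Projector onto the plane orthogonal to this ray's direction.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    # Solvable as long as the rays are not all parallel.
    return np.linalg.solve(A, b)

# Shifting every camera position by the returned point puts the region the
# cameras are looking at near the center of the volume.
```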

keli95566 commented 1 year ago

The camera pose issue is solved by your method. Thank you for the help!

@yenchenlin

I also re-calculated the camera poses with COLMAP and did a side-by-side comparison of the two rendering results. Renders with poses from COLMAP have higher quality than renders with ARKit pose estimation. I have not tested textureless scenes yet, but you are right: it seems that for some textured scenes there is a trade-off to make here.

[image: colmap_vs_record3d]

I wonder if using a camera sensor with its own inside-out tracking would yield better pose estimation than ARKit?

yenchenlin commented 1 year ago

Thanks for trying it out! This is aligned with my observation, but actually we can get "the best of both worlds" here by initializing COLMAP with ARKit/Record3D's camera poses. In my experience, this can prevent COLMAP from failing completely while improving the quality of ARKit's poses.
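
(Until the script is shared, here is a rough sketch of how such an initialization is commonly done with COLMAP's own tools; the paths and the prior-model details are my assumptions, not necessarily what the script mentioned below does.)

```python
import os
import subprocess

db, images = "colmap.db", "images/"
prior, refined = "prior_model/", "refined_model/"
os.makedirs(refined, exist_ok=True)

# 1. Extract features and match them as usual.
subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", images], check=True)
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)

# 2. prior_model/ holds a text model with the ARKit/Record3D poses already
#    filled in (cameras.txt and images.txt in COLMAP's convention, plus an
#    empty points3D.txt); triangulate 3D points with those poses held fixed.
subprocess.run(["colmap", "point_triangulator",
                "--database_path", db, "--image_path", images,
                "--input_path", prior, "--output_path", refined], check=True)

# 3. Bundle-adjust to refine the poses starting from the ARKit initialization.
subprocess.run(["colmap", "bundle_adjuster",
                "--input_path", refined, "--output_path", refined], check=True)
```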

I have a script for that and would love to try it on this scene. Do you mind sharing the data of this capture?

keli95566 commented 1 year ago

That is a really good idea! Here is the link to the data: https://drive.google.com/drive/folders/1GcFt4-bmpi-zHC5VIkNODfqtnY-bS_6E?usp=sharing Would you mind making a PR if it works? I am very curious to see how long it takes to go from the coarse poses to the refined poses with COLMAP. Thank you very much!

jc211 commented 1 year ago

If you're using an iOS device to capture this dataset, bear in mind that the intrinsics change with every frame due to optical image stabilization. The transforms.json allows you to override the intrinsics on a per-frame basis. Using the depth from the LiDAR makes a huge difference to the reconstruction.
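
(To make the per-frame override concrete, here is a minimal sketch; the field names follow the convention of the global intrinsics in transforms.json, and the per-frame values themselves are a hypothetical list you would get from your capture app.)

```python
import json

with open("transforms.json") as f:
    transforms = json.load(f)

# Hypothetical per-frame intrinsics, one (fl_x, fl_y, cx, cy) tuple per frame
# in frame order, e.g. exported alongside the capture.
per_frame_intrinsics = [(1432.1, 1432.1, 959.5, 719.5)] * len(transforms["frames"])

for frame, (fl_x, fl_y, cx, cy) in zip(transforms["frames"], per_frame_intrinsics):
    # Per-frame keys take precedence over the single global set at the top.
    frame.update({"fl_x": fl_x, "fl_y": fl_y, "cx": cx, "cy": cy})

with open("transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)
```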

yenchenlin commented 1 year ago

@keli95566 Sorry, I haven't had time to put together a PR. It's still on my to-do list; I hope this doesn't block your progress. @jc211 Do you find that using different intrinsics for each image helps?

lexvandersluijs commented 1 year ago

Hi @yenchenlin I also have a couple of captures where it's clear that there are 'ghosts' in the NeRF due to drift in the VIO pose estimation. I was thinking about developing something that would combine the quality of COLMAP with the metric scale and uprightness of the Record3D poses. I have actually developed something like this before, but it would not be a trivial task to port it over to these other data formats and coordinate-system conventions. So if you are willing to share your script, that would be fantastic. Aside from that, let me know if you are interested in my (outdoor) datasets that exhibit this phenomenon; I am happy to share them for testing purposes.

Spark001 commented 1 year ago

Hi @yenchenlin I have an observation: I got better results when I removed the normalize_transforms procedure. My scene is an inward-facing, surrounding capture, and the camera positions differ a lot with and without normalization. The camera poses are shown with the unit box below: without normalization / with normalization.

[image] [image]

After training for ~20k steps with extrinsics optimization, the result without normalization was better than the one with normalization, especially regarding the 'ghost' issue.

[image] [image]
[image] [image]

@yenchenlin What was your reasoning when designing this normalize_transforms part?
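
(For anyone following along, this sketches what a normalization step of this kind typically does: recenter the camera positions and rescale them to a target extent. It is an illustration of the idea, not the actual normalize_transforms code.)

```python
import numpy as np

def normalize_transforms(transforms, target_radius=1.0):
    """Recenter camera positions on their mean and rescale to target_radius."""
    poses = [np.array(fr["transform_matrix"]) for fr in transforms["frames"]]
    centers = np.stack([p[:3, 3] for p in poses])

    offset = centers.mean(axis=0)                            # center of the camera cloud
    scale = target_radius / np.abs(centers - offset).max()   # fit into the target extent

    for fr, p in zip(transforms["frames"], poses):
        p[:3, 3] = (p[:3, 3] - offset) * scale
        fr["transform_matrix"] = p.tolist()
    return transforms
```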

@jc211 I tried using different intrinsics for each image, but I got an error when loading the data. Did you make any changes to the loading code?

manasi9610 commented 1 year ago

Hi @yenchenlin, would you mind sharing the script for initialising COLMAP with ARKit/Record3D's camera poses here? Thanks!

yenchenlin commented 1 year ago

Hi all, please run the following steps:

  1. Download the following notebook and dataset:
  2. Unzip the dataset:

       unzip ba_example.zip

  3. Run the notebook bundle_adjustment_colmap.ipynb when your folder looks like the following:

       ├── bundle_adjustment_colmap.ipynb
       └── ba_example

Then you should be able to run instant-ngp by treating ba_example as a dataset!