NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

Poor results on Barn #44

Open MulinYu opened 1 year ago

MulinYu commented 1 year ago

Dear authors,

Thanks for releasing the code. I ran the code with the given T&T config on Barn (processed following the guide), but the result is bad and far from what is shown in the paper. Could you please give me some advice or share the config that you used?

Thanks a lot. Best, Mulin

MulinYu commented 1 year ago

Here are the results (images: normal, barn).

mli0603 commented 1 year ago

Hi @MulinYu

Thank you for your interest in the project!

We recently found a bug in T&T preprocessing. It was fixed in https://github.com/NVlabs/neuralangelo/commit/62483febb00c9ad4b8a17884abf9191acc6cafa9. Can you check if this fixes your issue?

DerKleineLi commented 1 year ago

Hi @mli0603

I don't think the fix works, as bounding_box only affects the aabb_range in transforms.json, which is not used during training.

With the latest commit I tried to train the Meetingroom and got a similar result as above (image; from left to right: gt, render, normal, depth). This is the evaluation result at iteration 80k.

I followed #14 to set up the hyperparameters using grad_accum_iter, which should match the experiment in the paper (see the note after the config for how I read these numbers):

_parent_: projects/neuralangelo/configs/base.yaml

max_iter: 4000000

wandb_scalar_iter: 800
wandb_image_iter: 80000
validation_iter: 40000

model:
    object:
        sdf:
            mlp:
                inside_out: True   # True for Meetingroom.
            encoding:
                coarse2fine:
                    init_active_level: 8
                    step: 40000
    appear_embed:
        enabled: True
        dim: 8

data:
    type: projects.neuralangelo.data
    root: .../data/TNT/Meetingroom
    num_images: 371  # The number of training images.
    train:
        image_size: [835,1500]
        batch_size: 16
        subset:
    val:
        image_size: [300,540]
        batch_size: 16
        subset: 16
        max_viz_samples: 16

trainer:
    grad_accum_iter: 8

optim:
    sched:
        warm_up_end: 40000
        two_steps: [2400000,3200000]
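
For what it's worth, here is how I read the relationship between these numbers and gradient accumulation (my own arithmetic, not an official statement; I am assuming the trainer steps the optimizer once every grad_accum_iter iterations, which is how I understand #14):

    # Sketch only: under the gradient-accumulation assumption above, dividing the
    # iteration-based values in this config by grad_accum_iter recovers what looks
    # like the un-accumulated schedule they were scaled from.
    grad_accum_iter = 8
    batch_size = 16
    effective_batch = batch_size * grad_accum_iter        # 128 images' worth of rays per optimizer step
    assert 4_000_000 / grad_accum_iter == 500_000         # max_iter
    assert 40_000 / grad_accum_iter == 5_000              # warm_up_end, validation_iter, coarse2fine step
    assert [x / grad_accum_iter for x in (2_400_000, 3_200_000)] == [300_000, 400_000]  # two_steps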

Do you have any intuition on how the result should look at this iteration, and what could be wrong with my experiment?

DerKleineLi commented 1 year ago

It seems that the bounding sphere has the wrong size in my case (see image). I will try to fix that and update with the result. Thanks for the hint in the latest README :)
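
In case it helps anyone debugging the same symptom, here is the kind of quick check I would run (a minimal sketch of my own, assuming transforms.json uses the instant-ngp-style "frames" list with camera-to-world "transform_matrix" entries plus the "sphere_center"/"sphere_radius" keys shown later in this thread; for an indoor, inside-out scene like Meetingroom the camera centers should sit well inside the bounding sphere):

    # Sketch only: check how the camera centers relate to the bounding sphere.
    import json
    import numpy as np

    with open(".../data/TNT/Meetingroom/transforms.json") as f:  # fill in your path
        meta = json.load(f)

    center = np.array(meta["sphere_center"])
    radius = meta["sphere_radius"]
    cams = np.array([np.array(fr["transform_matrix"])[:3, 3] for fr in meta["frames"]])
    dist = np.linalg.norm(cams - center, axis=1)
    # A radius much larger than the farthest camera (or the extent of the scene)
    # suggests the sphere was inflated, e.g. by noisy sparse points.
    print(f"max camera distance: {dist.max():.3f}, sphere radius: {radius:.3f}")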

mli0603 commented 1 year ago

Hi @DerKleineLi

Good news! Glad you have found the problem.

However, I wonder if you used preprocess_tnt.sh to generate the bounding regions. I would expect the bounding region to work fairly well for T&T, otherwise it is a bug and I need to fix it.

DerKleineLi commented 1 year ago

Thanks for the comment @mli0603!

For Meetingroom I manually downloaded the images and the COLMAP reconstruction from the official release of TNT, placed the files as described in DATA_PROCESSING.md, and ran

python projects/neuralangelo/scripts/convert_tnt_to_json.py --tnt_path .../data/TNT

For the visualization I ran projects/neuralangelo/scripts/visualize_colmap.ipynb with

colmap_path = ".../data/TNT/Meetingroom"
json_fname = f"{colmap_path}/transforms.json"

Hope it helps!

mli0603 commented 1 year ago

Hi @DerKleineLi

If you look at the notebook, there is a read_scale=0.25 which changes the visualization. If you set read_scale=1.0, then it will use the default bounding radius.

There is no bug in terms of pre-processing and I think your issue is somewhere else.

Willyzw commented 1 year ago

Hi, I can confirm the observed degraded results.

So far I have trained the Neuralangelo model on the Barn and Courthouse scenes. Training logs on Wandb can be accessed here: https://api.wandb.ai/links/willyzw/whf9tz8o

I mostly followed the default configuration, with a minor change to the batch size: 1 for Courthouse and 2 for Barn, as I thought this might have some impact. Any insights would be helpful!

MulinYu commented 1 year ago

> Hi @MulinYu
>
> Thank you for your interest in the project!
>
> We recently found a bug in T&T preprocessing. It was fixed in 62483fe. Can you check if this fixes your issue?

Hello,

I've used the updated script to produce a new JSON file for Barn and subsequently retrained with a batch size of 16. However, the outcome is still unsatisfactory after 70,000 iterations. Could you please share your JSON files for TNT? If I continue to obtain bad results using your JSON files, it would indicate that the preprocessing stage is not the source of the issue. (images: new-barn-normal, new-barn)

Best, Mulin

mli0603 commented 1 year ago

Thanks for this info. We are looking into this issue. We will update once we pin down the error.

hugoycj commented 1 year ago

@DerKleineLi @mli0603 Some updates on the Meeting Room training results at 60K steps (attached: RGB render and normal visualization).

I used BlenderNeuralangelo to manually regenerate the bounding box and sphere. I also changed the ray count from 512 to 2048 to accelerate training.

DerKleineLi commented 1 year ago

@hugoycj Amazing result! Could you share more details about your pipeline? Did you use the official colmap reconstruction of TNT? How large is the bounding box? Have you modified other hyperparameters? Thank you!

hugoycj commented 1 year ago

Yes. I followed the same steps as the provided convert_tnt_to_json.py script, then computed the bounding box using Blender. One interesting thing I found for the Meetingroom scene is that the sparse point cloud is very noisy, which makes the auto-generated bounding box around 2-3 times larger than the real scene; I guess this may make it difficult to converge.
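
Since the problem described above is that a handful of noisy sparse points inflate the auto-generated bounds, one simple alternative to the manual Blender step is to trim outliers before taking the bounds. This is only an illustrative sketch of that idea (the point loading is left to the reader and robust_bounds is a hypothetical helper, not part of the repo's preprocessing):

    # Sketch only: derive a tighter bounding box / sphere by ignoring outlier
    # sparse points (keep e.g. the 1st-99th percentile per axis).
    import numpy as np

    def robust_bounds(points, lo_pct=1.0, hi_pct=99.0):
        """points: (N, 3) array of COLMAP sparse points, loaded however you prefer."""
        lo = np.percentile(points, lo_pct, axis=0)
        hi = np.percentile(points, hi_pct, axis=0)
        center = (lo + hi) / 2.0
        radius = float(np.linalg.norm(hi - center))  # sphere enclosing the trimmed box
        aabb_range = [[float(a), float(b)] for a, b in zip(lo, hi)]
        return aabb_range, center.tolist(), radius

    # The returned values could then be written back into transforms.json
    # ("aabb_range", "sphere_center", "sphere_radius") and inspected with
    # projects/neuralangelo/scripts/visualize_colmap.ipynb.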

Here are the updated params in transforms.json under the datasets/tanks_and_temples/Meetingroom directory. I think you can replace these params directly in your JSON file and visualize it again.

"aabb_scale": 16.0,
"aabb_range": [
    [-7.6214215713955715, 6.522051862701991],
    [-3.6237776073426597, 4.891977370864412],
    [-11.059166537834281, 9.311011724588386]
],
"sphere_center": [-0.5496848543467903, 0.6340998817608761, -0.8740774066229475],
"sphere_radius": 11.449168401589803

hugoycj commented 1 year ago

I have also tried the colmap scripts on my own data. The results are still far from good, maybe due to the limited training time, but we can observe that some finely detailed regions are being reconstructed. (image)

hugoycj commented 1 year ago

One simple suggestion for training on custom data: try to capture video in ultrawide mode. This could make the pose estimation more accurate and may be beneficial for Neuralangelo training.

hugoycj commented 1 year ago

@mli0603

(image: PSNR curve)

I am curious about the dramatic drop in PSNR between 20k and 30k iterations when training on Meetingroom. Do you have any idea why this happens?

DerKleineLi commented 1 year ago

@mli0603 @hugoycj Thank you for your help. I think I found the issue: the transforms.json generated by convert_tnt_to_json.py points the file paths at the raw images rather than the undistorted ones. Instead of "file_path": "images/xxx.jpg", it should be "file_path": "dense/images/xxx.jpg". With this fixed I am getting good results at iteration 15k with batch size 16, grad_accum_iter=1 on 1 GPU (images attached).
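
For anyone hitting the same problem before the fix lands, the workaround can be scripted; here is a minimal sketch (my own snippet, not part of the repo, assuming the instant-ngp-style "frames"/"file_path" layout of transforms.json):

    # Sketch only: rewrite file_path entries to point at the undistorted images.
    import json

    path = ".../data/TNT/Barn/transforms.json"  # fill in your scene
    with open(path) as f:
        meta = json.load(f)
    for frame in meta["frames"]:
        if not frame["file_path"].startswith("dense/"):
            frame["file_path"] = "dense/" + frame["file_path"]  # images/xxx.jpg -> dense/images/xxx.jpg
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)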

chenhsuanlin commented 1 year ago

@DerKleineLi thanks for this! It's indeed a bug on our side. We will fix it in the coming days.

MulinYu commented 1 year ago

> @mli0603 @hugoycj Thank you for your help. I think I found the issue: the transforms.json generated by convert_tnt_to_json.py points the file paths at the raw images rather than the undistorted ones. Instead of "file_path": "images/xxx.jpg", it should be "file_path": "dense/images/xxx.jpg". With this fixed I am getting good results at iteration 15k with batch size 16, grad_accum_iter=1 on 1 GPU.

Dear @DerKleineLi

You are right, the generated JSON file makes the dataloader read the images before undistortion. I also get better results on Barn after changing the image path to 'dense/images'.

Thanks a lot. Best, Mulin

lxxue commented 1 year ago

> I have also tried the colmap scripts on my own data. The results are still far from good, maybe due to the limited training time, but we can observe that some finely detailed regions are being reconstructed. (image)

Hi @hugoycj,

May I ask whether you got good results on your own data? I also tried a sequence casually captured with an iPhone, but the results seem to be worse than what I can get from MonoSDF.

chenhsuanlin commented 1 year ago

Update: we have updated the data structure of the COLMAP artifacts in the latest main branch to avoid confusion. We are now storing the raw images in images_raw and the undistorted images in images (the final images for Neuralangelo). The dense folder has also been removed. Please see the new data preparation for details on the new structure.
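
After regenerating the data, it may also be worth verifying that the loader will actually find the intended images; a small self-check along these lines could catch path mix-ups like the one above (a minimal sketch of my own, assuming file_path entries are relative to the scene directory):

    # Sketch only: verify every file_path in transforms.json exists on disk.
    import json, os

    root = ".../data/TNT/Meetingroom"  # scene directory containing transforms.json
    with open(os.path.join(root, "transforms.json")) as f:
        meta = json.load(f)
    missing = [fr["file_path"] for fr in meta["frames"]
               if not os.path.isfile(os.path.join(root, fr["file_path"]))]
    print(f"{len(missing)} of {len(meta['frames'])} referenced images are missing")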

mli0603 commented 1 year ago

The tanks and temples preprocessing script has been updated (see https://github.com/NVlabs/neuralangelo/commit/54172deb11660fd6c64d727e600f0201b002347a) to fix the above bug and reflect the latest changes of data preprocessing.

DerKleineLi commented 1 year ago

Hi all, I still need some help with the TNT dataset. Here is the result I get using the latest commit (image attached). It is much blurrier than the paper's result. If anyone has achieved a better result, I would appreciate it if you could share your settings and results. Thanks in advance!

@hugoycj I wonder if you have trained the Meetingroom to the end; how was the result? I also followed your pipeline to some extent, and here is what I got at iteration 60k (image attached). Here is the config:

_parent_: projects/neuralangelo/configs/base.yaml

model:
    object:
        sdf:
            mlp:
                inside_out: True   # True for Meetingroom.
            encoding:
                coarse2fine:
                    init_active_level: 8
    appear_embed:
        enabled: True
        dim: 8
    render:
        rand_rays: 2048

data:
    type: projects.neuralangelo.data
    root: .../data/TNT/Meetingroom
    num_images: 371  # The number of training images.
    train:
        image_size: [835,1500]
        batch_size: 1
        subset:
    val:
        image_size: [300,540]
        batch_size: 1
        subset: 1
        max_viz_samples: 16

The result is very different from yours. Could you check whether this config matches yours? If so, the problem could lie in the data preprocessing; in that case, could you share the complete transforms.json file? The global parameters alone are not enough, because as you change the bbox, the transform of each frame also changes.

Another question regarding the Blender plugin: there are limits when changing the size of the bbox; the attached image shows the smallest possible bbox the plugin allows. I wonder if this is the same in your case, or have you edited the script to allow smaller bboxes? (image)