NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

Known issue: textureless mesh extraction & bad intermediate checkpoints #83

Closed by chenhsuanlin 1 year ago

chenhsuanlin commented 1 year ago

+@mli0603. This is related to multiple reported issues (#62, #75, and potentially others).

There seems to be an issue with the .state_dict() method of torch.nn.Module classes, which could be a PyTorch bug. Specifically, with some probability the extracted state dict does not match (a subset of) the module parameters, leaving the saved checkpoints partially corrupted. When this happens in the final layers of the neural SDF/RGB networks, it can result in bad geometry (#75) or a uniformly gray object color (sigmoid(0)=0.5) (#62).
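To illustrate why a corrupted final RGB layer could produce a flat gray, here is a minimal pure-Python sketch. It assumes the corrupted layer ends up with all-zero weights and bias, which the sigmoid(0)=0.5 remark suggests but the issue does not state outright. The function final_rgb_layer and the layer sizes are hypothetical and are not taken from the Neuralangelo code.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def final_rgb_layer(features, weight, bias):
    """Linear layer followed by a sigmoid, one output per color channel."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weight, bias)
    ]


# Hypothetical corrupted layer: 3 output channels, 4 input features, all zeros.
zero_weight = [[0.0] * 4 for _ in range(3)]
zero_bias = [0.0] * 3

# Whatever the input features are, a zeroed layer yields the same color.
print(final_rgb_layer([0.3, -1.2, 2.0, 0.7], zero_weight, zero_bias))  # [0.5, 0.5, 0.5]
print(final_rgb_layer([5.0, 5.0, -5.0, 0.0], zero_weight, zero_bias))  # [0.5, 0.5, 0.5]
```

Under that assumption every pixel gets RGB (0.5, 0.5, 0.5) regardless of position or view direction, which would match the textureless gray meshes reported in #62.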

This seems to be reproducible with the following command, using the pre-processed toy Lego example:

torchrun --nproc_per_node=1 train.py \
    --logdir=logs/debug/lego --show_pbar \
    --config=projects/neuralangelo/configs/custom/lego.yaml \
    --data.root=datasets/lego_ds2 \
    --max_iter=20000 --checkpoint.save_iter=1000 \
    --model.object.sdf.encoding.coarse2fine.step=200 \
    --model.object.sdf.encoding.hashgrid.dict_size=19 \
    --optim.sched.warm_up_end=200 \
    --optim.sched.two_steps=[12000,16000]

At iteration 2000, the checkpointed parameter module.neural_sdf.mlp.linear_sdf.weight is corrupted.
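One way to check whether a given entry, such as module.neural_sdf.mlp.linear_sdf.weight, is affected is to compare the output of .state_dict() against the live parameters right before saving. The sketch below is only a rough diagnostic: find_state_dict_mismatches is a hypothetical helper (not part of the Neuralangelo codebase or of any fix to it), and the toy module is a stand-in for the real model.

```python
import torch
import torch.nn as nn


def find_state_dict_mismatches(module: nn.Module) -> list:
    """Return names of state-dict entries that differ from the live tensors."""
    # Live tensors, keyed by the same names that state_dict() uses.
    live = dict(module.named_parameters())
    live.update(dict(module.named_buffers()))
    mismatched = []
    for name, saved in module.state_dict().items():
        current = live.get(name)
        if current is None:
            continue  # entry with no matching parameter or buffer
        saved_cpu = saved.detach().cpu()
        current_cpu = current.detach().cpu()
        if saved_cpu.shape != current_cpu.shape or not torch.equal(saved_cpu, current_cpu):
            mismatched.append(name)
    return mismatched


# Toy stand-in for the last layers of an SDF network; not the real model.
toy = nn.Sequential(nn.Linear(4, 8), nn.Softplus(), nn.Linear(8, 1))

# A healthy module should report no mismatches.
print(find_state_dict_mismatches(toy))
```

Calling this on the model at each checkpoint.save_iter and logging any returned names would show whether the corruption is already present in the state dict or only appears later, in the file written to disk.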

mli0603 commented 1 year ago

This bug is potentially related to the PyTorch version. We have pushed a commit (https://github.com/NVlabs/neuralangelo/commit/c91af8d5098c858df8e8dfa35fba8666d314782b) to fix the issue. Please let us know if you still run into the same problem.