extract mesh failed, 'list' object has no attribute 'vertices'

NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)

https://research.nvidia.com/labs/dir/neuralangelo/


extract mesh failed, 'list' object has no attribute 'vertices' #87

Closed Islaster closed 1 year ago

Islaster commented 1 year ago

SOFTWARE:

Windows 10
ubuntu-wsl
Python Anaconda

HARDWARE:

GeForce 4070 ti
208 ram
SUPERMICRO MOTHERBOARD

mli0603 commented 1 year ago

Hi @Islaster

It looks like no meshes are extracted.

Are you on the latest main branch?
If so, can you provide your config and checkpoint so I can test it on my end?
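The error text is consistent with a merge step receiving an empty list of mesh blocks and handing that list back unchanged. The sketch below reproduces that failure mode in isolation. `Mesh`, `merge_blocks` and `merge_blocks_checked` are hypothetical stand-ins written for illustration; they are not taken from the neuralangelo code base.

```python
from dataclasses import dataclass, field


@dataclass
class Mesh:
    """Hypothetical stand-in for a mesh object (not the real mesh class)."""
    vertices: list = field(default_factory=list)


def merge_blocks(blocks):
    """Hypothetical merge step that returns its input unchanged when it is empty."""
    if not blocks:
        return blocks  # an empty list, not a Mesh
    merged = Mesh()
    for block in blocks:
        merged.vertices.extend(block.vertices)
    return merged


def merge_blocks_checked(blocks):
    """Same merge, but fail early with a readable message."""
    if not blocks:
        raise ValueError("no mesh blocks were extracted; check the SDF values")
    return merge_blocks(blocks)


# Accessing .vertices on the empty-list result raises the error from the title.
try:
    merge_blocks([]).vertices
except AttributeError as err:
    message = str(err)
    print(message)
```

A guard like `merge_blocks_checked` would not fix the underlying problem, but it would point at the missing surface instead of at a list attribute.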

Islaster commented 1 year ago

I just did a pull. What did you mean by config and checkpoint?

Islaster commented 1 year ago

After the pull I re-ran the extraction and got the same error.

Islaster commented 1 year ago

CHECKPOINT=logs/example_group/example_name/epoch_00000_iteration_000000010_checkpoint.pt
CONFIG=logs/example_group/example_name/config.yaml

Islaster commented 1 year ago

The issue you pointed me to is about training. I'm having problems with the mesh extraction.

mli0603 commented 1 year ago

Hi @Islaster

Yes you found the right files! Could you share the files so I can download and test them?
The issue is related to loading the checkpoint, which affects mesh extraction as well.

Islaster commented 1 year ago

it's not letting me copy it into the comment so I'm going to send google drive links

Islaster commented 1 year ago

config file: https://drive.google.com/file/d/1-CJBpAIHi-4xlW-SJ1KvkQOaCE5c34H2/view?usp=sharing
checkpoint: https://drive.google.com/file/d/1PReVRmim17iAkiXysjTzUPjJiKoOqvp3/view?usp=sharing

mli0603 commented 1 year ago

Thanks @Islaster

I can confirm that this is a bug, potentially related to #83: your saved checkpoint is corrupted, and all the predicted SDF values are NaN. Therefore, no meshes are found.
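Why NaN SDF values produce no mesh can be seen in a one-dimensional analogue of isosurface extraction: a surface is placed wherever the SDF changes sign between neighbouring samples, and every comparison involving NaN is false. The helper `count_zero_crossings` below is a hypothetical illustration, not the repository's extraction code.

```python
import math


def count_zero_crossings(sdf_values):
    """Count sign changes between neighbouring samples of a 1D SDF."""
    crossings = 0
    for a, b in zip(sdf_values, sdf_values[1:]):
        # A surface lies between two samples when their signs differ.
        # Any comparison involving NaN is False, so NaN samples never count.
        if (a < 0 <= b) or (b < 0 <= a):
            crossings += 1
    return crossings


healthy = [1.0, 0.5, -0.2, -0.6, 0.3]   # crosses zero twice
corrupted = [math.nan] * 5              # what a broken checkpoint would predict

healthy_crossings = count_zero_crossings(healthy)
corrupted_crossings = count_zero_crossings(corrupted)
print(healthy_crossings, corrupted_crossings)
```

With no sign changes anywhere in the grid, there is nothing to triangulate, which matches the empty result seen here.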

Islaster commented 1 year ago

So how do I fix it?

mli0603 commented 1 year ago

Can you provide more information on the following?

1. I see in your config file that your save_iter is 20000. But your checkpoint is saved at iteration 10. How did you save this checkpoint?
2. Can you provide the set of commands you used for training?
3. Can you provide your pytorch version?
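The mismatch between save_iter and the checkpoint's iteration can be checked from the file name alone. The sketch below assumes the naming pattern seen in this thread (`epoch_<n>_iteration_<n>_checkpoint.pt`); `parse_checkpoint_name` and `saved_on_schedule` are hypothetical helpers, and whether the trainer also writes a checkpoint when training ends is not something this check can tell.

```python
import re

# Naming pattern taken from the checkpoint path posted in this thread.
CHECKPOINT_PATTERN = re.compile(r"epoch_(\d+)_iteration_(\d+)_checkpoint\.pt$")


def parse_checkpoint_name(path):
    """Return (epoch, iteration) parsed from a checkpoint file name."""
    match = CHECKPOINT_PATTERN.search(path)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {path}")
    return int(match.group(1)), int(match.group(2))


def saved_on_schedule(iteration, save_iter):
    """True when the iteration is a positive multiple of save_iter."""
    return iteration > 0 and iteration % save_iter == 0


path = "logs/example_group/example_name/epoch_00000_iteration_000000010_checkpoint.pt"
epoch, iteration = parse_checkpoint_name(path)
on_schedule = saved_on_schedule(iteration, save_iter=20000)
print(epoch, iteration, on_schedule)
```

An iteration of 10 is not a multiple of 20000, so this checkpoint was not written by the periodic save.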

Islaster commented 1 year ago

EXPERIMENT=template
GROUP=example_group
NAME=example_name
CONFIG=projects/neuralangelo/configs/custom/${EXPERIMENT}.yaml
GPUS=1
torchrun --nproc_per_node=${GPUS} train.py \
    --logdir=logs/${GROUP}/${NAME} \
    --config=${CONFIG} \
    --show_pbar

Islaster commented 1 year ago

PyTorch version 2.0.0, and I changed max_iter in base.yaml.

Islaster commented 1 year ago

I can send you my whole git clone if that would help clear things up

mli0603 commented 1 year ago

Hi @Islaster

I was able to run your entire pipeline with torch version 1.12.1. Here is the mesh extracted at iteration 10 using the Lego dataset.

Can you try again with my torch version?
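A quick way to confirm which version an environment is actually running is to compare the version string against 1.12.1, the version reported as working here. This is a minimal sketch; `parse_version` and `matches_known_good` are hypothetical helpers, and build suffixes such as `+cu117` are ignored on the assumption that they do not matter for this comparison.

```python
import re

KNOWN_GOOD = (1, 12, 1)  # the torch version reported as working in this thread


def parse_version(version):
    """Turn a string such as '2.0.0+cu117' into the tuple (2, 0, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version}")
    return tuple(int(group) for group in match.groups())


def matches_known_good(version):
    """True when major, minor and patch all equal the known-good version."""
    return parse_version(version) == KNOWN_GOOD


reported = parse_version("2.0.0")
print(reported, matches_known_good("2.0.0"), matches_known_good("1.12.1+cu113"))
```

With PyTorch installed, passing `torch.__version__` to `matches_known_good` performs the same check on the live environment.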

Islaster commented 1 year ago

will do

Islaster commented 1 year ago

what version of anaconda do you use?

mli0603 commented 1 year ago

I am using conda 4.10.1. We have also pushed an update to the code that could potentially fix the issue. Can you pull the latest version and try it out on your end?

Islaster commented 1 year ago

I'm getting conflict errors when installing PyTorch 1.12.1. Could it have something to do with the Anaconda environment files for neuralangelo?

lie12huo commented 1 year ago

I had the same problem. Is it because of too few training iterations? I changed the configuration items save_epoch, max_epoch and max_iter in config_base.yaml to 3. Are there any minimum requirements for these values?

mli0603 commented 1 year ago

Hi @lie12huo

No, it is not because of too few training iterations. It is likely that the saved checkpoint has all zeros for the model weights. Can you check?
Are you running the latest code? If not, we recently pushed a commit that fixes this issue, which is potentially related to the torch version.
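One way to check a checkpoint for all-zero or NaN weights is sketched below. It assumes the file was written with `torch.save` and that the weights sit under a `"model"` key, falling back to treating the whole file as a state dict. That layout is an assumption, not something confirmed in this thread, and all function names are hypothetical. The helpers also accept plain lists so the logic can be tried without PyTorch.

```python
import math


def summarize_values(values):
    """Classify a flat sequence of numbers as 'nan', 'all_zero' or 'ok'."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        return "nan"
    if values and all(v == 0 for v in values):
        return "all_zero"
    return "ok"


def summarize_state_dict(state_dict):
    """Map each tensor-like entry name to its classification."""
    report = {}
    for name, tensor in state_dict.items():
        if hasattr(tensor, "flatten"):
            # Tensors: flatten to a Python list (slow for very large tensors).
            flat = tensor.flatten().tolist()
        elif isinstance(tensor, (list, tuple)):
            flat = list(tensor)
        else:
            continue  # skip non-tensor entries such as counters or nested dicts
        report[name] = summarize_values(flat)
    return report


def inspect_checkpoint(path):
    """Load a checkpoint with PyTorch and summarize its weights."""
    import torch  # imported lazily so the helpers above work without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    # Assumption: weights live under "model"; otherwise use the file as-is.
    state_dict = checkpoint.get("model", checkpoint)
    return summarize_state_dict(state_dict)


# Plain-list example standing in for real tensors.
report = summarize_state_dict({
    "layer.weight": [0.0, 0.0, 0.0],
    "layer.bias": [float("nan"), 1.0],
    "head.weight": [0.1, -0.2],
})
print(report)
```

Calling `inspect_checkpoint` on the path stored in `CHECKPOINT` and looking for entries marked `all_zero` or `nan` would show whether the saved weights are usable.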

lie12huo commented 1 year ago

@mli0603 I used the latest code for training and model extraction, but the data preprocessing did not use the latest version of the code. Does this affect the result? How do I check if there is a problem with the saved checkpoint? By the way, I am currently using the test data toy_example.mov.

lie12huo commented 1 year ago

@mli0603 I reprocessed the data for training, generated new checkpoints, and exported the mesh, but the result was poor and there was no texture. Is this because 50 epochs are too few? Or is it related to my adjustment of parameters dict_size: 21, dim: 4?

mli0603 commented 1 year ago

Hi @lie12huo

It's likely undertrained. You can also take a look at the colab for how to obtain good results.