NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

extract mesh failed #99

Closed qq297110281 closed 1 year ago

qq297110281 commented 1 year ago

I got this error when I tried to extract a mesh from the checkpoint epoch_00836_iteration_000220000_checkpoint.pt:

torchrun --nproc_per_node=1 projects/neuralangelo/scripts/extract_mesh.py --config=logs/lighter/config.yaml --checkpoint=logs/lighter/epoch_00836_iteration_000220000_checkpoint.pt --output_file=lighter_tx.ply --resolution=1080 --block_res=128 --textured

Running mesh extraction with 1 GPUs.
Setup trainer.
Using random seed 0
model parameter count: 366,702,732
Initialize model weights using type: none, gain: None
Using random seed 0
Allow TensorFloat32 operations on supported devices
Traceback (most recent call last):
  File "projects/neuralangelo/scripts/extract_mesh.py", line 105, in <module>
    main()
  File "projects/neuralangelo/scripts/extract_mesh.py", line 64, in main
    trainer.checkpointer.load(args.checkpoint, load_opt=False, load_sch=False)
  File "/home/nlt/neuralangelo/imaginaire/trainers/base.py", line 626, in load
    state_dict = torch.load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 250721) of binary: /home/nlt/anaconda3/envs/neuralangelo/bin/python
Traceback (most recent call last):
  File "/home/nlt/anaconda3/envs/neuralangelo/bin/torchrun", line 10, in <module>
    sys.exit(main())
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/nlt/anaconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

projects/neuralangelo/scripts/extract_mesh.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-01_19:55:12
  host      : iZj6cew1z55iqfsasz6corZ
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 250721)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

mli0603 commented 1 year ago

The error seems to come from RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory. Did you provide the correct checkpoint path for mesh extraction?
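
One way to tell a wrong path apart from a damaged file is to inspect the checkpoint before handing it to torch.load. The sketch below is not part of Neuralangelo: diagnose_checkpoint is a hypothetical helper, and it assumes the checkpoint was written in PyTorch's zip-based format, which the traceback suggests because torch.load tries to open it as a zip archive. It uses only the Python standard library.

```python
import os
import zipfile


def diagnose_checkpoint(path):
    """Return a short status string for a zip-format checkpoint file."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    # A zip archive keeps its central directory at the end of the file, so a
    # write that was cut short usually leaves the archive unreadable.
    if not zipfile.is_zipfile(path):
        return "not a readable zip archive"
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # CRC-checks every member; None if all pass
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return "ok"
```

Calling diagnose_checkpoint("logs/lighter/epoch_00836_iteration_000220000_checkpoint.pt") from the repository root would report "missing" for a wrong path and "not a readable zip archive" for a file whose central directory cannot be found.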

qq297110281 commented 1 year ago

I think I have figured it out: the checkpoint is probably corrupted because my disk had no free space. The path is correct.
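
A possible guard against this kind of truncated checkpoint is to check free space first and to write through a temporary file, so that a failed write never replaces an existing good file. This is only a sketch under stated assumptions: has_free_space and save_atomically are hypothetical helpers, not Neuralangelo or PyTorch APIs, and the approach assumes the failed write raises an exception.

```python
import os
import shutil


def has_free_space(directory, required_bytes):
    """True if the filesystem holding `directory` has at least `required_bytes` free."""
    return shutil.disk_usage(directory).free >= required_bytes


def save_atomically(write_fn, path):
    """Write via a temporary file so a failed write never replaces `path`.

    `write_fn(tmp_path)` is any callable that writes the file, for example
    `lambda p: torch.save(state_dict, p)` (assumed usage, not Neuralangelo code).
    """
    tmp_path = path + ".tmp"
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        # Remove the partial file and leave any previous `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The temporary file lives next to the target so that os.replace stays on one filesystem; the free-space check is advisory only, since other processes can still fill the disk during the write.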

mli0603 commented 1 year ago

Ah I see. I’ll close this for now. Feel free to reopen it if you think there is something else going on.