NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

How to train and extract a mesh on a single GPU #69

nanhui69 closed 1 year ago

nanhui69 commented 1 year ago

How do I do that?

iam-machine commented 1 year ago

Short question, long answer 😅

nanhui69 commented 1 year ago

Could you give an example?

chenhsuanlin commented 1 year ago

Please see the README for instructions. Thanks!

nanhui69 commented 1 year ago

> Please see the README for instructions. Thanks!

When I run the following, I get an error:

EXPERIMENT=VID_20230821_144722
NAME=youyic_obj
GROUP=VID_20230821_obj
obj_point=epoch_07142_iteration_000300000_checkpoint
CHECKPOINT=logs/${GROUP}/${NAME}/${obj_point}.pt
OUTPUT_MESH=${obj_point}.ply
CONFIG=projects/neuralangelo/configs/custom/${EXPERIMENT}.yaml
RESOLUTION=2048
BLOCK_RES=128
CUDA_VISIBLE_DEVICES=1 python projects/neuralangelo/scripts/extract_mesh.py \
    --config=${CONFIG} \
    --logdir=logs/${GROUP}/${NAME} \
    --checkpoint=${CHECKPOINT} \
    --output_file=${OUTPUT_MESH} \
    --resolution=${RESOLUTION} \
    --block_res=${BLOCK_RES}

=====>

Traceback (most recent call last):
  File "projects/neuralangelo/scripts/extract_mesh.py", line 99, in <module>
    main()
  File "projects/neuralangelo/scripts/extract_mesh.py", line 56, in main
    init_dist(cfg.local_rank, rank=-1, world_size=-1)
  File "/home/neuralangelo/imaginaire/utils/distributed.py", line 32, in init_dist
    dist.init_process_group(backend=backend, init_method='env://', **kwargs)
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 754, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 236, in _env_rendezvous_handler
    rank = int(_get_env_or_raise("RANK"))
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 221, in _get_env_or_raise
    raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

nanhui69 commented 1 year ago

So how do I run on a single GPU without putting "torchrun --nproc_per_node=${GPUS}" in front of extract_mesh.py?

chenhsuanlin commented 1 year ago

You can add --single_gpu. However, torchrun is the recommended way, and you can just set GPUS=1 with that.
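For reference, a minimal sketch of the torchrun route with GPUS=1, reusing the variables and flags from the command posted earlier in this thread. With one process per node, torchrun exports RANK, WORLD_SIZE and the related variables that the env:// rendezvous in the traceback above was looking for. The RUN guard is a hypothetical addition for this sketch and is not part of the repository: by default the script only prints the command, and it launches only when RUN=1 is set.

```shell
# Values copied from the command earlier in this thread; adjust to your run.
EXPERIMENT=VID_20230821_144722
NAME=youyic_obj
GROUP=VID_20230821_obj
obj_point=epoch_07142_iteration_000300000_checkpoint
CHECKPOINT=logs/${GROUP}/${NAME}/${obj_point}.pt
OUTPUT_MESH=${obj_point}.ply
CONFIG=projects/neuralangelo/configs/custom/${EXPERIMENT}.yaml
RESOLUTION=2048
BLOCK_RES=128
GPUS=1  # a single worker process on this node

# Build the launch command as positional parameters (POSIX sh compatible).
set -- torchrun --nproc_per_node="${GPUS}" \
    projects/neuralangelo/scripts/extract_mesh.py \
    --config="${CONFIG}" \
    --logdir="logs/${GROUP}/${NAME}" \
    --checkpoint="${CHECKPOINT}" \
    --output_file="${OUTPUT_MESH}" \
    --resolution="${RESOLUTION}" \
    --block_res="${BLOCK_RES}"

CMD_LINE="$*"
echo "${CMD_LINE}"

# RUN is a hypothetical guard for this sketch: print only by default,
# launch the command when RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

To pick a specific GPU, as in the original command, prefix the invocation with CUDA_VISIBLE_DEVICES set to the desired device index.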

nanhui69 commented 1 year ago

> You can add --single_gpu. However, torchrun is the recommended way, and you can just set GPUS=1 with that.

When I change it to:

EXPERIMENT=VID_20230821_144722
NAME=youyic_obj
GROUP=VID_20230821_obj
obj_point=epoch_07142_iteration_000300000_checkpoint
CHECKPOINT=logs/${GROUP}/${NAME}/${obj_point}.pt
OUTPUT_MESH=${obj_point}.ply
CONFIG=projects/neuralangelo/configs/custom/${EXPERIMENT}.yaml
RESOLUTION=2048
BLOCK_RES=128
CUDA_VISIBLE_DEVICES=1 python projects/neuralangelo/scripts/extract_mesh.py \
    --config=${CONFIG} \
    --logdir=logs/${GROUP}/${NAME} \
    --checkpoint=${CHECKPOINT} \
    --output_file=${OUTPUT_MESH} \
    --single_gpu \
    --resolution=${RESOLUTION} \
    --block_res=${BLOCK_RES}

this error happens:

/work/conda/envs/mmyolo/lib/python3.8/site-packages/tinycudann-1.7-py3.8-linux-x86_64.egg/tinycudann/modules.py:53: UserWarning: tinycudann was built for lower compute capability (86) than the system's (89). Performance may be suboptimal.
  warnings.warn(f"tinycudann was built for lower compute capability ({cc}) than the system's ({system_compute_capability}). Performance may be suboptimal.")
model parameter count: 366,707,148
Initialize model weights using type: none, gain: None
Using random seed 0
Allow TensorFloat32 operations on supported devices
Loading checkpoint (local): logs/VID_20230821_obj/youyic_obj/epoch_07142_iteration_000300000_checkpoint.pt
- Loading the model...
Done with loading the checkpoint.
Extracting surface at resolution 2048 2048 2048
Traceback (most recent call last):
  File "projects/neuralangelo/scripts/extract_mesh.py", line 99, in <module>
    main()
  File "projects/neuralangelo/scripts/extract_mesh.py", line 83, in main
    mesh = extract_mesh(sdf_func=lambda x: -trainer.model_module.neural_sdf.sdf(x),
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/neuralangelo/projects/neuralangelo/utils/mesh.py", line 38, in extract_mesh
    dist.all_gather_object(mesh_blocks_gather, mesh_blocks)
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1871, in all_gather_object
    current_device = _get_pg_device(group)
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_pg_device
    if _check_for_nccl_backend(group):
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1807, in _check_for_nccl_backend
    pg = group or _get_default_group()
  File "/work/conda/envs/mmyolo/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 584, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

nanhui69 commented 1 year ago

How can I fix this? @chenhsuanlin

chenhsuanlin commented 1 year ago

This should have already been fixed in the latest main. Can you pull and try again? Otherwise, please use torchrun as I suggested.
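A minimal sketch of the "pull and try again" route, for anyone following along. It assumes the checkout tracks a remote named origin with a main branch, and that --single_gpu works for mesh extraction once the fix mentioned above has been pulled. The paths are the ones from the earlier command, and RUN is a hypothetical guard added for this sketch so that nothing executes unless RUN=1 is set.

```shell
# Paths taken from the command earlier in this thread; adjust to your run.
LOGDIR=logs/VID_20230821_obj/youyic_obj
CKPT_NAME=epoch_07142_iteration_000300000_checkpoint
CONFIG=projects/neuralangelo/configs/custom/VID_20230821_144722.yaml

# Step 1: update the checkout. Remote "origin" and branch "main" are
# assumptions; use whatever your clone actually tracks.
UPDATE_CMD="git pull origin main"

# Step 2: retry the extraction without torchrun, using --single_gpu.
set -- python projects/neuralangelo/scripts/extract_mesh.py \
    --config="${CONFIG}" \
    --logdir="${LOGDIR}" \
    --checkpoint="${LOGDIR}/${CKPT_NAME}.pt" \
    --output_file="${CKPT_NAME}.ply" \
    --single_gpu \
    --resolution=2048 \
    --block_res=128

RETRY_CMD="$*"
echo "${UPDATE_CMD}"
echo "${RETRY_CMD}"

# RUN is a hypothetical guard for this sketch: print only by default,
# execute both steps when RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
    ${UPDATE_CMD} && "$@"
fi
```

If the process-group error still appears after updating, launching through torchrun with one process per node remains the recommended path.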

nanhui69 commented 1 year ago

> This should have already been fixed in the latest main. Can you pull and try again? Otherwise, please use torchrun as I suggested.

How can I visualize the .ply file?