NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

OSError: Could not find compatible tinycudann extension for compute capability 61. #132

Open samghafari opened 11 months ago

samghafari commented 11 months ago

Hi All,

I am having an issue with training and was hoping someone could shed some light on it. My setup: Windows 11, WSL2, Docker, GTX 1080, CUDA 11.8.

I get the following error when training:

root@c19fd782183c:/workspace/neuralangelo# nvidia-smi
Sun Oct 1 23:15:18 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.112                Driver Version: 537.42       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                Persistence-M  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf          Pwr:Usage/Cap  |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080       On   | 00000000:01:00.0  On |                  N/A |
|  0%   38C  P0            44W / 198W     |   794MiB / 8192MiB   |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                      N/A |
|    0   N/A  N/A        20      G   /Xwayland                                      N/A |
|    0   N/A  N/A        22      G   /Xwayland                                      N/A |
|    0   N/A  N/A        23      G   /Xwayland                                      N/A |
+---------------------------------------------------------------------------------------+
root@c19fd782183c:/workspace/neuralangelo# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
root@c19fd782183c:/workspace/neuralangelo# EXPERIMENT=lego_example
root@c19fd782183c:/workspace/neuralangelo# GROUP=example_group
root@c19fd782183c:/workspace/neuralangelo# NAME=lego
root@c19fd782183c:/workspace/neuralangelo# CONFIG=projects/neuralangelo/configs/custom/lego.yaml
root@c19fd782183c:/workspace/neuralangelo# GPUS=1
root@c19fd782183c:/workspace/neuralangelo# torchrun --nproc_per_node=${GPUS} train.py \

--logdir=logs/${GROUP}/${NAME} \
--config=${CONFIG} \
--show_pbar

(Setting affinity with NVML failed, skipping...)
Training with 1 GPUs.
Using random seed 0
Make folder logs/example_group/lego

• checkpoint:
  • save_epoch: 9999999999
  • save_iter: 20000
  • save_latest_iter: 9999999999
  • save_period: 9999999999
  • strict_resume: True
• cudnn:
  • benchmark: True
  • deterministic: False
• data:
  • name: dummy
  • num_images: None
  • num_workers: 4
  • preload: True
  • readjust:
    • center: [0.0, 0.0, 0.0]
    • scale: 1.0
  • root: datasets/lego_ds2
  • train:
    • batch_size: 2
    • image_size: [800, 800]
    • subset: None
  • type: projects.neuralangelo.data
  • use_multi_epoch_loader: True
  • val:
    • batch_size: 2
    • image_size: [300, 300]
    • max_viz_samples: 16
    • subset: 4
• image_save_iter: 9999999999
• inference_args:
• local_rank: 0
• logdir: logs/example_group/lego
• logging_iter: 9999999999999
• max_epoch: 9999999999
• max_iter: 500000
• metrics_epoch: None
• metrics_iter: None
• model:
  • appear_embed:
    • dim: 8
    • enabled: False
  • background:
    • enabled: True
    • encoding:
      • levels: 10
      • type: fourier
    • encoding_view:
      • levels: 3
      • type: spherical
    • mlp:
      • activ: relu
      • activ_density: softplus
      • activ_density_params:
      • activ_params:
      • hidden_dim: 256
      • hidden_dim_rgb: 128
      • num_layers: 8
      • num_layers_rgb: 2
      • skip: [4]
      • skip_rgb: []
    • view_dep: True
    • white: False
  • object:
    • rgb:
      • encoding_view:
        • levels: 3
        • type: spherical
      • mlp:
        • activ: relu_
        • activ_params:
        • hidden_dim: 256
        • num_layers: 4
        • skip: []
        • weight_norm: True
      • mode: idr
    • s_var:
      • anneal_end: 0.1
      • init_val: 3.0
    • sdf:
      • encoding:
        • coarse2fine:
          • enabled: True
          • init_active_level: 4
          • step: 5000
        • hashgrid:
          • dict_size: 22
          • dim: 8
          • max_logres: 11
          • min_logres: 5
          • range: [-2, 2]
        • levels: 16
        • type: hashgrid
      • gradient:
        • mode: numerical
        • taps: 4
      • mlp:
        • activ: softplus
        • activ_params:
          • beta: 100
        • geometric_init: True
        • hidden_dim: 256
        • inside_out: False
        • num_layers: 1
        • out_bias: 0.5
        • skip: []
        • weight_norm: True
  • render:
    • num_sample_hierarchy: 4
    • num_samples:
      • background: 32
      • coarse: 64
      • fine: 16
    • rand_rays: 512
    • stratified: True
  • type: projects.neuralangelo.model
• nvtx_profile: False
• optim:
  • fused_opt: False
  • params:
    • lr: 0.001
    • weight_decay: 0.01
  • sched:
    • gamma: 10.0
    • iteration_mode: True
    • step_size: 9999999999
    • two_steps: [300000, 400000]
    • type: two_steps_with_warmup
    • warm_up_end: 5000
  • type: AdamW
• pretrained_weight: None
• source_filename: projects/neuralangelo/configs/custom/lego.yaml
• speed_benchmark: False
• test_data:
  • name: dummy
  • num_workers: 0
  • test:
    • batch_size: 1
    • is_lmdb: False
    • roots: None
  • type: imaginaire.datasets.images
• timeout_period: 9999999
• trainer:
  • amp_config:
    • backoff_factor: 0.5
    • enabled: False
    • growth_factor: 2.0
    • growth_interval: 2000
    • init_scale: 65536.0
  • ddp_config:
    • find_unused_parameters: False
    • static_graph: True
  • depth_vis_scale: 0.5
  • ema_config:
    • beta: 0.9999
    • enabled: False
    • load_ema_checkpoint: False
    • start_iteration: 0
  • grad_accum_iter: 1
  • image_to_tensorboard: False
  • init:
    • gain: None
    • type: none
  • loss_weight:
    • curvature: 0.0005
    • eikonal: 0.1
    • render: 1.0
  • type: projects.neuralangelo.trainer
• validation_iter: 5000
• wandb_image_iter: 10000
• wandb_scalar_iter: 100

cudnn benchmark: True
cudnn deterministic: False
Setup trainer.
Using random seed 0
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    main()
  File "train.py", line 79, in main
    trainer = get_trainer(cfg, is_inference=False, seed=args.seed)
  File "/workspace/neuralangelo/imaginaire/trainers/utils/get_trainer.py", line 32, in get_trainer
    trainer = trainer_lib.Trainer(cfg, is_inference=is_inference, seed=seed)
  File "/workspace/neuralangelo/projects/neuralangelo/trainer.py", line 26, in __init__
    super().__init__(cfg, is_inference=is_inference, seed=seed)
  File "/workspace/neuralangelo/projects/nerf/trainers/base.py", line 28, in __init__
    super().__init__(cfg, is_inference=is_inference, seed=seed)
  File "/workspace/neuralangelo/imaginaire/trainers/base.py", line 50, in __init__
    self.model = self.setup_model(cfg, seed=seed)
  File "/workspace/neuralangelo/imaginaire/trainers/base.py", line 116, in setup_model
    lib_model = importlib.import_module(cfg.model.type)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/workspace/neuralangelo/projects/neuralangelo/model.py", line 21, in <module>
    from projects.neuralangelo.utils.modules import NeuralSDF, NeuralRGB, BackgroundNeRF
  File "/workspace/neuralangelo/projects/neuralangelo/utils/modules.py", line 16, in <module>
    import tinycudann as tcnn
  File "/usr/local/lib/python3.8/dist-packages/tinycudann/__init__.py", line 9, in <module>
    from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network, Encoding
  File "/usr/local/lib/python3.8/dist-packages/tinycudann/modules.py", line 59, in <module>
    raise EnvironmentError(f"Could not find compatible tinycudann extension for compute capability {system_compute_capability}.")
OSError: Could not find compatible tinycudann extension for compute capability 61.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 450) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.0a0+fe05266', 'console_scripts', 'torchrun')())
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    train.py FAILED

    Failures:

    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2023-10-01_23:16:11
      host      : c19fd782183c
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 450)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
    root@c19fd782183c:/workspace/neuralangelo#
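For reference, the compute capability that tinycudann is complaining about can be confirmed from inside the container with a minimal sketch like the following (assuming PyTorch can see the GPU):

```python
# Minimal sketch: confirm the GPU's compute capability inside the container.
import torch

major, minor = torch.cuda.get_device_capability(0)  # (6, 1) on a GTX 1080
print(torch.cuda.get_device_name(0))
print(f"Compute capability: {major}{minor}")  # matches the "61" in the tinycudann error
```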
praveen5733 commented 11 months ago

+1

Did you manage to find any solution?

Magnetar99 commented 9 months ago

+1

djetshu commented 4 months ago

It appears that this issue is specific to tiny-cuda-nn. A similar problem was previously resolved, as discussed in this issue thread: https://github.com/NVlabs/tiny-cuda-nn/issues/341#issuecomment-1651814335

For GPUs with compute capability 6.1, such as the GTX 1080, you should configure the environment variables as follows before rebuilding tiny-cuda-nn:

export CUDA_ARCHITECTURES="61"
export CMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES}
export TCNN_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES}
export TORCH_CUDA_ARCH_LIST="6.1"
export FORCE_CUDA="1"

pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
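
After reinstalling, you can retry the import that failed above. Here is a minimal sanity-check sketch (the HashGrid settings are illustrative placeholders, not Neuralangelo's actual values) to confirm the rebuilt extension loads on a compute capability 6.1 GPU:

```python
# Sanity check after rebuilding tiny-cuda-nn from source.
# The encoding settings below are illustrative placeholders only.
import torch
import tinycudann as tcnn  # raises OSError again if the build did not target CC 6.1

encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",
        "n_levels": 8,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 1.5,
    },
)
x = torch.rand(128, 3, device="cuda")
print(encoding(x).shape)  # expect torch.Size([128, 16]) if the extension loaded correctly
```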

This configuration has been tested and works on a GTX 1070 GPU (Compute Capability 6.1).