NVlabs / neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
https://research.nvidia.com/labs/dir/neuralangelo/

OSError: Could not find compatible tinycudann extension for compute capability 61. #132

Open samghafari opened 11 months ago

samghafari commented 11 months ago

Hi All,

I am having an issue with training and was hoping someone could shed some light on it. My setup: Windows 11, WSL2, Docker, GTX 1080, CUDA 11.8.

I get the following error when training:

root@c19fd782183c:/workspace/neuralangelo# nvidia-smi
Sun Oct 1 23:15:18 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.112                Driver Version: 537.42       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                Persistence-M  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf          Pwr:Usage/Cap  |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080       On   | 00000000:01:00.0  On |                  N/A |
|  0%   38C  P0            44W / 198W     |   794MiB / 8192MiB   |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                      N/A |
|    0   N/A  N/A        20      G   /Xwayland                                      N/A |
|    0   N/A  N/A        22      G   /Xwayland                                      N/A |
|    0   N/A  N/A        23      G   /Xwayland                                      N/A |
+---------------------------------------------------------------------------------------+
root@c19fd782183c:/workspace/neuralangelo# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
root@c19fd782183c:/workspace/neuralangelo# EXPERIMENT=lego_example
root@c19fd782183c:/workspace/neuralangelo# GROUP=example_group
root@c19fd782183c:/workspace/neuralangelo# NAME=lego
root@c19fd782183c:/workspace/neuralangelo# CONFIG=projects/neuralangelo/configs/custom/lego.yaml
root@c19fd782183c:/workspace/neuralangelo# GPUS=1
root@c19fd782183c:/workspace/neuralangelo# torchrun --nproc_per_node=${GPUS} train.py \

--logdir=logs/${GROUP}/${NAME} \
--config=${CONFIG} \
--show_pbar

(Setting affinity with NVML failed, skipping...)
Training with 1 GPUs.
Using random seed 0
Make folder logs/example_group/lego

• checkpoint:
  • save_epoch: 9999999999
  • save_iter: 20000
  • save_latest_iter: 9999999999
  • save_period: 9999999999
  • strict_resume: True
• cudnn:
  • benchmark: True
  • deterministic: False
• data:
  • name: dummy
  • num_images: None
  • num_workers: 4
  • preload: True
  • readjust:
    • center: [0.0, 0.0, 0.0]
    • scale: 1.0
  • root: datasets/lego_ds2
  • train:
    • batch_size: 2
    • image_size: [800, 800]
    • subset: None
  • type: projects.neuralangelo.data
  • use_multi_epoch_loader: True
  • val:
    • batch_size: 2
    • image_size: [300, 300]
    • max_viz_samples: 16
    • subset: 4
• image_save_iter: 9999999999
• inference_args:
• local_rank: 0
• logdir: logs/example_group/lego
• logging_iter: 9999999999999
• max_epoch: 9999999999
• max_iter: 500000
• metrics_epoch: None
• metrics_iter: None
• model:
  • appear_embed:
    • dim: 8
    • enabled: False
  • background:
    • enabled: True
    • encoding:
      • levels: 10
      • type: fourier
    • encoding_view:
      • levels: 3
      • type: spherical
    • mlp:
      • activ: relu
      • activ_density: softplus
      • activ_density_params:
      • activ_params:
      • hidden_dim: 256
      • hidden_dim_rgb: 128
      • num_layers: 8
      • num_layers_rgb: 2
      • skip: [4]
      • skip_rgb: []
    • view_dep: True
    • white: False
  • object:
    • rgb:
      • encoding_view:
        • levels: 3
        • type: spherical
      • mlp:
        • activ: relu_
        • activ_params:
        • hidden_dim: 256
        • num_layers: 4
        • skip: []
        • weight_norm: True
      • mode: idr
    • s_var:
      • anneal_end: 0.1
      • init_val: 3.0
    • sdf:
      • encoding:
        • coarse2fine:
          • enabled: True
          • init_active_level: 4
          • step: 5000
        • hashgrid:
          • dict_size: 22
          • dim: 8
          • max_logres: 11
          • min_logres: 5
          • range: [-2, 2]
        • levels: 16
        • type: hashgrid
      • gradient:
        • mode: numerical
        • taps: 4
      • mlp:
        • activ: softplus
        • activ_params:
          • beta: 100
        • geometric_init: True
        • hidden_dim: 256
        • inside_out: False
        • num_layers: 1
        • out_bias: 0.5
        • skip: []
        • weight_norm: True
  • render:
    • num_sample_hierarchy: 4
    • num_samples:
      • background: 32
      • coarse: 64
      • fine: 16
    • rand_rays: 512
    • stratified: True
  • type: projects.neuralangelo.model
• nvtx_profile: False
• optim:
  • fused_opt: False
  • params:
    • lr: 0.001
    • weight_decay: 0.01
  • sched:
    • gamma: 10.0
    • iteration_mode: True
    • step_size: 9999999999
    • two_steps: [300000, 400000]
    • type: two_steps_with_warmup
    • warm_up_end: 5000
  • type: AdamW
• pretrained_weight: None
• source_filename: projects/neuralangelo/configs/custom/lego.yaml
• speed_benchmark: False
• test_data:
  • name: dummy
  • num_workers: 0
  • test:
    • batch_size: 1
    • is_lmdb: False
    • roots: None
  • type: imaginaire.datasets.images
• timeout_period: 9999999
• trainer:
  • amp_config:
    • backoff_factor: 0.5
    • enabled: False
    • growth_factor: 2.0
    • growth_interval: 2000
    • init_scale: 65536.0
  • ddp_config:
    • find_unused_parameters: False
    • static_graph: True
  • depth_vis_scale: 0.5
  • ema_config:
    • beta: 0.9999
    • enabled: False
    • load_ema_checkpoint: False
    • start_iteration: 0
  • grad_accum_iter: 1
  • image_to_tensorboard: False
  • init:
    • gain: None
    • type: none
  • loss_weight:
    • curvature: 0.0005
    • eikonal: 0.1
    • render: 1.0
  • type: projects.neuralangelo.trainer
• validation_iter: 5000
• wandb_image_iter: 10000
• wandb_scalar_iter: 100

cudnn benchmark: True
cudnn deterministic: False
Setup trainer.
Using random seed 0
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    main()
  File "train.py", line 79, in main
    trainer = get_trainer(cfg, is_inference=False, seed=args.seed)
  File "/workspace/neuralangelo/imaginaire/trainers/utils/get_trainer.py", line 32, in get_trainer
    trainer = trainer_lib.Trainer(cfg, is_inference=is_inference, seed=seed)
  File "/workspace/neuralangelo/projects/neuralangelo/trainer.py", line 26, in __init__
    super().__init__(cfg, is_inference=is_inference, seed=seed)
  File "/workspace/neuralangelo/projects/nerf/trainers/base.py", line 28, in __init__
    super().__init__(cfg, is_inference=is_inference, seed=seed)
  File "/workspace/neuralangelo/imaginaire/trainers/base.py", line 50, in __init__
    self.model = self.setup_model(cfg, seed=seed)
  File "/workspace/neuralangelo/imaginaire/trainers/base.py", line 116, in setup_model
    lib_model = importlib.import_module(cfg.model.type)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/workspace/neuralangelo/projects/neuralangelo/model.py", line 21, in <module>
    from projects.neuralangelo.utils.modules import NeuralSDF, NeuralRGB, BackgroundNeRF
  File "/workspace/neuralangelo/projects/neuralangelo/utils/modules.py", line 16, in <module>
    import tinycudann as tcnn
  File "/usr/local/lib/python3.8/dist-packages/tinycudann/__init__.py", line 9, in <module>
    from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network, Encoding
  File "/usr/local/lib/python3.8/dist-packages/tinycudann/modules.py", line 59, in <module>
    raise EnvironmentError(f"Could not find compatible tinycudann extension for compute capability {system_compute_capability}.")
OSError: Could not find compatible tinycudann extension for compute capability 61.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 450) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.0a0+fe05266', 'console_scripts', 'torchrun')())
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    train.py FAILED

    Failures:

    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2023-10-01_23:16:11
      host      : c19fd782183c
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 450)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
    root@c19fd782183c:/workspace/neuralangelo#
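For reference, the compute capability that tinycudann is complaining about can be confirmed from inside the container with a minimal sketch like the following (assuming PyTorch can see the GPU):

```python
# Minimal sketch: confirm the GPU's compute capability inside the container.
import torch

major, minor = torch.cuda.get_device_capability(0)  # (6, 1) on a GTX 1080
print(torch.cuda.get_device_name(0))
print(f"Compute capability: {major}{minor}")  # matches the "61" in the tinycudann error
```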
praveen5733 commented 11 months ago

+1

Did you manage to find any solution?

Magnetar99 commented 9 months ago

+1

djetshu commented 4 months ago

It appears that this issue is specific to tiny-cuda-nn. A similar problem was previously resolved, as discussed in this issue thread: https://github.com/NVlabs/tiny-cuda-nn/issues/341#issuecomment-1651814335

For GPUs with compute capability 6.1, such as the GTX 1080, you should configure the environment variables as follows before rebuilding tiny-cuda-nn:

export CUDA_ARCHITECTURES="61"
export CMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES}
export TCNN_CUDA_ARCHITECTURES=${CUDA_ARCHITECTURES}
export TORCH_CUDA_ARCH_LIST="6.1"
export FORCE_CUDA="1"

pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
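
After reinstalling, you can retry the import that failed above. Here is a minimal sanity-check sketch (the HashGrid settings are illustrative placeholders, not Neuralangelo's actual values) to confirm the rebuilt extension loads on a compute capability 6.1 GPU:

```python
# Sanity check after rebuilding tiny-cuda-nn from source.
# The encoding settings below are illustrative placeholders only.
import torch
import tinycudann as tcnn  # raises OSError again if the build did not target CC 6.1

encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",
        "n_levels": 8,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 1.5,
    },
)
x = torch.rand(128, 3, device="cuda")
print(encoding(x).shape)  # expect torch.Size([128, 16]) if the extension loaded correctly
```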

This configuration has been tested and works on a GTX 1070 GPU (Compute Capability 6.1).