gwyllo opened this issue 1 month ago
I think you have to change GPU_ID (the third line of the bash script) to 0, since you only have one GPU.
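If the script passes that index to the Python stages via CUDA_VISIBLE_DEVICES (an assumption about how run_train_infer.sh uses GPU_ID, not something shown in this thread), a minimal sketch of how a non-existent index produces exactly this error:

```python
import os

# Assumption: run_train_infer.sh exposes GPU_ID to the Python stages via
# CUDA_VISIBLE_DEVICES. On a single-GPU machine only index 0 exists, so
# selecting index 1 (or higher) hides every device from PyTorch.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # hypothetical wrong GPU_ID

import torch

print(torch.cuda.is_available())  # False: no visible CUDA devices
# Any subsequent .to("cuda") then raises
# "RuntimeError: No CUDA GPUs are available".
```

With GPU_ID set to 0 (i.e. CUDA_VISIBLE_DEVICES="0"), the same check should print True, matching the behaviour of the standalone torchCheck.py run below.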
I can successfully install all dependencies following the instructions in the repo. I have also tried the same using the Docker image.
I get the following error when trying to run the run_train_infer.sh script, using either the from-scratch install or the Docker image:
```
(instantsplat) root@C.13193616:/InstantSplat$ bash scripts/run_train_infer.sh
========= santorini: Dust3r_coarse_geometric_initialization =========
... loading model from submodules/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
instantiating : AsymmetricCroCo3DStereo(enc_depth=24, dec_depth=12, enc_embed_dim=1024, dec_embed_dim=768, enc_num_heads=16, dec_num_heads=12, pos_embed='RoPE100', patch_embed_cls='PatchEmbedDust3R', img_size=(512, 512), head_type='dpt', output_mode='pts3d', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), landscape_only=False)
<All keys matched successfully>
Traceback (most recent call last):
  File "/InstantSplat/./coarse_init_infer.py", line 53, in <module>
    model = AsymmetricCroCo3DStereo.from_pretrained(model_path).to(device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
           ^^^^^
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

========= santorini: Train: jointly optimize pose =========
Optimizing ./output/infer/sora/santorini/3_views_1000Iter_1xPoseLR/
Output folder: ./output/infer/sora/santorini/3_views_1000Iter_1xPoseLR/
Traceback (most recent call last):
  File "/InstantSplat/./train_joint.py", line 279, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args)
  File "/InstantSplat/./train_joint.py", line 60, in training
    scene = Scene(dataset, gaussians, opt=args, shuffle=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/InstantSplat/scene/__init__.py", line 49, in __init__
    assert False, "Could not recognize scene type!"
           ^^^^^
AssertionError: Could not recognize scene type!

========= santorini: Render interpolated pose & output video =========
Looking for config file in ./output/infer/sora/santorini/3_views_1000Iter_1xPoseLR/cfg_args
Config file found: ./output/infer/sora/santorini/3_views_1000Iter_1xPoseLR/cfg_args
Rendering ./output/infer/sora/santorini/3_views_1000Iter_1xPoseLR/
Traceback (most recent call last):
  File "/InstantSplat/./render_by_interp.py", line 143, in <module>
    render_sets(
  File "/InstantSplat/./render_by_interp.py", line 98, in render_sets
    save_interpolate_pose(dataset.model_path, iteration, args.n_views)
  File "/InstantSplat/./render_by_interp.py", line 33, in save_interpolate_pose
    org_pose = np.load(model_path + f"pose/pose_{iter}.npy")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/instantsplat/lib/python3.11/site-packages/numpy/lib/_npyio_impl.py", line 455, in load
    fid = stack.enter_context(open(os.fspath(file), "rb"))
          ^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: './output/infer/sora/santorini/3_views_1000Iter_1xPoseLR/pose/pose_1000.npy'
```
nvcc --version suggests CUDA is installed correctly:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```
and a simple script to check whether CUDA is accessible to PyTorch also works as expected within this conda environment:
```python
import torch

print(torch.cuda.is_available())
print(torch.version.cuda)
print(torch.cuda.device_count())
```
```
(instantsplat) root@C.13193616:/$ python torchCheck.py
True
12.1
1
```
Any ideas on the underlying cause of this issue?
Hi, have you solved this issue? I have run into the same problem.