Incorrect device ID issues.

Gwondori commented 1 year ago

Hi. I'm asking for help because I'm currently having trouble running the command in 'demo'.

Here is my environment:

OS: WSL2 Ubuntu (Windows 11)
CUDA version (nvidia-smi):

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+

I installed CUDA and cuDNN, and checked that PyTorch can use CUDA:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> quit()

Command:

python run.py --resume_from g_p3d_car_pretrained --inv_export_demo_sample --gpus 4 --batch_size 16

Error Message:

/home/euler/Works/algorithms/nerf-from-image/models/stylegan.py:555: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert (bs % ng == 0,
Autodetected p3d_car dataset
Namespace(gpus=4, dataset='p3d_car', xid='', resolution=128, batch_size=16, run_inversion=True, resume_from='g_p3d_car_pretrained', root_path='.', data_path='datasets', iterations=300000, lr_g=0.0025, lr_d=0.002, dual_discriminator=False, dual_discriminator_l1=False, dual_discriminator_mse=False, r1=5.0, tv=0.5, entropy=0.05, eikonal=0.1, supervise_alpha=False, conditional_pose=True, augment_p=0, augment_ada=False, ada_target=0.6, path_length_regularization=False, perturb_poses=0, clip_gradient_norm=100.0, fine_sampling=True, attention_values=10, use_sdf=True, use_encoder=False, use_viewdir=False, use_class=False, latent_dim=512, disable_stylegan_noise=True, inv_use_testset=False, inv_use_imagenet_testset=False, inv_use_separate=False, inv_loss='vgg', inv_gain_z=5, inv_steps=None, inv_no_split=False, inv_no_optimize_pose=False, inv_train_coord_only=False, inv_encoder_only=False, inv_export_demo_sample=True, inv_manual_input_path=None, coord_resume_from=None)
Experiment name g_p3d_car_res128_bs16_d512_lrg_0.0025_lrd_0.002_r1_5.0_entropy_0.05_tv_0.5_fine_sdf_eik0.1_attn10_noalpha_pose_nonoise
Saving checkpoints to ./gan_checkpoints/g_p3d_car_res128_bs16_d512_lrg_0.0025_lrd_0.002_r1_5.0_entropy_0.05_tv_0.5_fine_sdf_eik0.1_attn10_noalpha_pose_nonoise
Saving tensorboard logs to ./gan_logs/g_p3d_car_res128_bs16_d512_lrg_0.0025_lrd_0.002_r1_5.0_entropy_0.05_tv_0.5_fine_sdf_eik0.1_attn10_noalpha_pose_nonoise
Saving inversion reports to ./reports
Attempting to load latest checkpoint...
Resuming from manual checkpoint ./gan_checkpoints/g_p3d_car_pretrained/checkpoint_latest.pth
Checkpoint iteration: 300000
Loading data...
100%|██████████████████████████████████████████████████████████| 296/296 [00:35<00:00,  8.31it/s]
100%|██████████████████████████████████████████████████████████| 148/148 [00:12<00:00, 12.21it/s]
100%|██████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  9.42it/s]
Loaded test split with shape torch.Size([219, 128, 128, 4])
Loaded train split with shape torch.Size([9444, 128, 128, 4])
Loaded train_eval split with shape torch.Size([4722, 128, 128, 4])
Initializing Inception network, tensorflow weights...
Evaluating training FID on 4722 images
Evaluating test set on 219 images
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/lpips/weights/v0.1/vgg.pth
Traceback (most recent call last):
  File "/home/euler/Works/algorithms/nerf-from-image/run.py", line 636, in <module>
    parallel_model = nn.DataParallel(
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 145, in __init__
    _check_balance(self.device_ids)
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 25, in _check_balance
    dev_props = _get_devices_properties(device_ids)
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/_utils.py", line 678, in _get_devices_properties
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/_utils.py", line 678, in <listcomp>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/_utils.py", line 659, in _get_device_attr
    return get_member(torch.cuda)
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/_utils.py", line 678, in <lambda>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/home/euler/.local/share/virtualenvs/nerf-from-image-BXRQer9q/lib/python3.10/site-packages/torch/cuda/__init__.py", line 398, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

dariopavllo commented 1 year ago

Hi, this looks like a setup issue to me! How many GPUs are there in your machine? Looks like you specified 4 GPUs, but you might have fewer.
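
A quick way to check is to ask PyTorch how many CUDA devices it can see. This is only a minimal sketch and not part of the repository: `visible_gpu_count` is a hypothetical helper name, and it returns 0 when PyTorch is not installed. The value passed to `--gpus` should not exceed the number it reports.

```python
def visible_gpu_count():
    """Return how many CUDA devices PyTorch can see (0 if none, or if PyTorch is missing)."""
    try:
        import torch  # imported lazily so the helper also works where PyTorch is absent
    except ImportError:
        return 0
    # device_count() returns 0 when no CUDA device is usable.
    return torch.cuda.device_count()


if __name__ == "__main__":
    print("Visible CUDA devices:", visible_gpu_count())
```

Note that `torch.cuda.is_available()` returning `True` only says that at least one device is usable; it says nothing about how many there are.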

Gwondori commented 1 year ago

Yes, I have only one GPU. I passed 1 to the --gpus option and it worked.
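
For anyone hitting the same "Invalid device id" assertion, one possible guard is to clamp the requested GPU count to what is actually present before building the device list. This is a sketch under assumptions, not the code in run.py: `resolve_device_ids` is a hypothetical helper, and how the repository actually constructs `nn.DataParallel` is not shown in the traceback above.

```python
def resolve_device_ids(requested, available):
    """Return the CUDA device ids to use, never exceeding the devices present.

    requested: the value given on the command line (e.g. --gpus 4)
    available: the number of visible devices (e.g. torch.cuda.device_count())
    """
    if requested < 1:
        raise ValueError("requested GPU count must be at least 1")
    if available < 1:
        return []  # no CUDA device visible; the caller has to fall back to CPU
    return list(range(min(requested, available)))


# Hypothetical usage (model and args are placeholders, not names from run.py):
#   device_ids = resolve_device_ids(args.gpus, torch.cuda.device_count())
#   parallel_model = torch.nn.DataParallel(model, device_ids=device_ids)

if __name__ == "__main__":
    # With 4 GPUs requested on a single-GPU machine, only device 0 is used.
    print(resolve_device_ids(4, 1))
```

Clamping silently changes how the batch is split across devices, so raising an error instead of clamping may be the better choice when reproducibility matters.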

google-research / nerf-from-image

Incorrect device ID issues. #9