RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

ekiwi111 commented 2 years ago

RTX 3090
CUDA 11.8
Ubuntu 22.04.1 LTS

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 44%   55C    P0   105W / 350W |   3234MiB / 24576MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2358      G   /usr/lib/xorg/Xorg               2143MiB |
|    0   N/A  N/A      2714      G   /usr/bin/gnome-shell              239MiB |
|    0   N/A  N/A      4569      G   ...187800556795193677,131072      643MiB |
|    0   N/A  N/A      4892      G   ...AAAAAAAAA= --shared-files      108MiB |
+-----------------------------------------------------------------------------+

>>> torch.version.cuda
'11.7'
>>> torch.__version__
'1.13.0.dev20221006'

Input:

$ python main.py --text "a hamburger" --workspace trial -O

Output:

Namespace(text='a hamburger', O=True, O2=False, test=False, save_mesh=False, workspace='trial', guidance='stable-diffusion', seed=0, iters=15000, lr=0.001, ckpt='latest', cuda_ray=True, max_steps=1024, num_steps=256, upsample_steps=0, update_extra_interval=16, max_ray_batch=4096, albedo_iters=15000, bg_radius=1.4, density_thresh=10, fp16=True, backbone='grid', w=128, h=128, bound=1, dt_gamma=0, min_near=0.1, radius_range=[1.0, 1.5], fovy_range=[40, 70], dir_text=True, angle_overhead=30, angle_front=30, lambda_entropy=0.0001, lambda_orient=0.01, gui=False, W=800, H=800, radius=3, fovy=60, light_theta=60, light_phi=0, max_spp=1)
NeRFNetwork(
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(6119864, 2) gridtype=tiled align_corners=False
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
    )
  )
  (encoder_bg): FreqEncoder: input_dim=2 degree=6 output_dim=26
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=26, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=3, bias=True)
    )
  )
)
[INFO] successfully loaded hugging face user token!
[INFO] loading stable diffusion...
[INFO] loaded stable diffusion!
[INFO] Trainer: ngp | 2022-10-07_16-34-36 | cuda | fp16 | trial
[INFO] #parameters: 12248183
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
==> Start Training trial Epoch 1, lr=0.010000 ...
  0% 0/100 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/genesst/Developer/stable-dreamfusion/main.py", line 144, in <module>
    trainer.train(train_loader, valid_loader, max_epoch)
  File "/home/genesst/Developer/stable-dreamfusion/nerf/utils.py", line 453, in train
    self.train_one_epoch(train_loader)
  File "/home/genesst/Developer/stable-dreamfusion/nerf/utils.py", line 673, in train_one_epoch
    pred_rgbs, pred_ws, loss = self.train_step(data)
  File "/home/genesst/Developer/stable-dreamfusion/nerf/utils.py", line 355, in train_step
    loss_guidance = self.guidance.train_step(text_z, pred_rgb)
  File "/home/genesst/Developer/stable-dreamfusion/nerf/sd.py", line 97, in train_step
    w = (1 - self.scheduler.alphas_cumprod[t]).to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
  0% 0/100 [00:00<?, ?it/s]

How can I fix this?

ashawkey commented 2 years ago

@genesst Hi, what is your torch version? I guess this can be fixed by upgrading PyTorch. I'll make it more compatible with lower versions later.

ekiwi111 commented 2 years ago

>>> torch.__version__
'1.13.0.dev20221006'

Also added it to the original post

ashawkey commented 2 years ago

@genesst Oh, I haven't tested 1.13 yet... Could you try w = (1 - self.scheduler.alphas_cumprod[t.cpu()]).to(self.device)?

ekiwi111 commented 2 years ago

That worked!
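
For anyone landing here with the same error, below is a minimal sketch of what the fix does. It assumes the situation the error message describes: the lookup table (here a stand-in for scheduler.alphas_cumprod) lives on the CPU while the timestep index t was created on the training device. The helper name lookup_on_table_device, the beta schedule, and the tensor shapes are illustrative only and are not taken from the repository.

```python
import torch


def lookup_on_table_device(table: torch.Tensor, idx: torch.Tensor, out_device) -> torch.Tensor:
    """Index `table` with `idx`, then move the result to `out_device`.

    Hypothetical helper, not part of stable-dreamfusion.
    """
    # The index must be on the CPU or on the same device as the indexed
    # tensor, so move it to wherever `table` lives before indexing.
    values = table[idx.to(table.device)]
    return values.to(out_device)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for scheduler.alphas_cumprod: a 1-D table kept on the CPU.
# The linear beta schedule here is an assumption made for illustration.
alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, 1000), dim=0)

# Stand-in for the sampled timestep, created on the training device.
t = torch.randint(20, 980, (1,), dtype=torch.long, device=device)

# Same result as (1 - alphas_cumprod[t.cpu()]).to(device) when the table is on the CPU.
w = lookup_on_table_device(1 - alphas_cumprod, t, device)
```

Moving the index to the table's device (rather than hard-coding .cpu()) keeps the lookup working if the table is later moved to the GPU as well.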

ashawkey / stable-dreamfusion

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #11