bebeal / mipnerf-pytorch

A re-implementation of mip-NeRF in PyTorch

About multi-GPU training #1

Open SimonCK666 opened 2 years ago

SimonCK666 commented 2 years ago

Could you update the code for multi-GPU training?

I've tried changing it for multi-GPU training, but ran into some bugs. I changed train.py as follows:

import torch
import torch.nn as nn

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0: [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(config.device)
model.train()

But when I trained the model, I got this error:

Traceback (most recent call last):
  File "train.py", line 118, in <module>
    train_model(config)
  File "train.py", line 76, in train_model
    loss_val, psnr = loss_func(comp_rgb, pixels, rays.lossmult.to(config.device))
  File "/home/hyang/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hyang/Desktop/mipnerf-pytorch/loss.py", line 13, in forward
    mse = (mask * ((rgb - target[..., :3]) ** 2)).sum() / mask.sum()
RuntimeError: The size of tensor a (1024) must match the size of tensor b (2048) at non-singleton dimension 0
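
The 2048 vs. 1024 mismatch is exactly a two-GPU split: nn.DataParallel scatters the batch along dim 0, runs one replica per GPU, and gathers the outputs on the primary device, so the loss here appears to be comparing a per-GPU chunk of comp_rgb against the full-batch pixels. One common workaround is sketched below under assumptions: the ModelWithLoss wrapper is hypothetical, self.model(rays) elides the real forward's extra arguments (e.g. randomized, white_bkgd) and its multi-level output, and the masked MSE mirrors the expression in loss.py. The idea is to compute the loss inside the wrapped module, so every replica sees shape-matched chunks.

import torch
import torch.nn as nn

class ModelWithLoss(nn.Module):
    # Hypothetical wrapper: with the loss inside forward(), DataParallel
    # scatters rays, pixels, and lossmult together, so every replica
    # compares its own chunk of outputs against its own chunk of targets.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, rays, pixels, lossmult):
        comp_rgb = self.model(rays)  # per-replica chunk, e.g. [1024, 3]
        mse = (lossmult * ((comp_rgb - pixels[..., :3]) ** 2)).sum() / lossmult.sum()
        return mse

# wrapped = nn.DataParallel(ModelWithLoss(model)).to(config.device)
# loss_val = wrapped(rays, pixels, rays.lossmult).mean()

DataParallel gathers the per-replica scalar losses into a vector (it warns that it unsqueezes scalars before gathering), so the final .mean() averages the loss across GPUs.
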
massyzs commented 1 year ago

Hi, did you fix this? If so, could you share your solution? Mine also complained that the data was not on the same device (cuda:0 and cuda:1).
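
Two general PyTorch points may help here; neither is specific to this repo. First, a cuda:0 vs. cuda:1 error usually means forward() creates or moves a tensor with a hard-coded device such as .to(config.device): DataParallel replicas run on different GPUs, so anything pinned to cuda:0 inside the model clashes on the other replicas. Second, the commonly recommended route is DistributedDataParallel with one process per GPU, which sidesteps scatter/gather entirely. A minimal, self-contained DDP sketch follows; the linear model, tensor shapes, and learning rate are placeholders, not this repo's training loop.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch one process per GPU, e.g.:
    #   torchrun --nproc_per_node=2 train_ddp.py
    dist.init_process_group(backend="nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 3).cuda(rank)   # placeholder for the mip-NeRF model
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

    # Each rank loads its own shard of rays/pixels (a DistributedSampler
    # in a real DataLoader); random tensors stand in here.
    x = torch.randn(1024, 8, device="cuda")
    target = torch.randn(1024, 3, device="cuda")

    loss = ((model(x) - target) ** 2).mean()
    loss.backward()                      # DDP all-reduces gradients here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()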