I've tried changing it for multi-GPU training, but ran into a bug.
I changed `train.py` as follows:
```python
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
model.to(config.device)
model.train()
```
But when I trained the model, I got this error:
```
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    train_model(config)
  File "train.py", line 76, in train_model
    loss_val, psnr = loss_func(comp_rgb, pixels, rays.lossmult.to(config.device))
  File "/home/hyang/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hyang/Desktop/mipnerf-pytorch/loss.py", line 13, in forward
    mse = (mask * ((rgb - target[..., :3]) ** 2)).sum() / mask.sum()
RuntimeError: The size of tensor a (1024) must match the size of tensor b (2048) at non-singleton dimension 0
```
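For context, my understanding is that `nn.DataParallel` splits tensor inputs along dim 0 across the replicas and concatenates the tensor outputs back on the default device, so the output batch size should match the input batch size. A minimal sketch of that expected behavior (`ToyModel` is a made-up example, not code from this repo):

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def forward(self, x):
        # Each replica sees a slice of the batch along dim 0,
        # e.g. [1024, 3] per GPU when the full batch is [2048, 3] on 2 GPUs.
        return x * 2

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(ToyModel()).to("cuda")
    x = torch.randn(2048, 3, device="cuda")
    out = model(x)
    # Tensor outputs are gathered (re-concatenated along dim 0),
    # so the full batch size should come back:
    assert out.shape[0] == x.shape[0]
```

Since 1024 is exactly 2048 / 2 GPUs, it looks like `comp_rgb` only contains one replica's chunk, so maybe the gather step doesn't handle whatever structure the model's forward returns here, but I'm not sure.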
Could you update the code for multi-GPU training?