I am not able to run your code with multi-GPU training using nn.DataParallel; it fails with the error shown in the log below. The code runs fine when I make this change -
model = torch.nn.DataParallel(model.cuda()) ----> model = model.cuda()
Have you tried using the code with DataParallel enabled?
Log snippets -
output = net(feat, coord, offset, batch, neighbor_idx)
File "/home/prem/anaconda3/envs/alpha/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/prem/anaconda3/envs/alpha/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/prem/anaconda3/envs/alpha/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/prem/anaconda3/envs/alpha/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/prem/anaconda3/envs/alpha/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.