rushi-the-neural-arch closed this issue 2 years ago
Hi @rushi-the-neural-arch. I was away for a bit, hence the lack of response. I'll look into this over the next few days. Hopefully it's a simple issue!
Looks like they are using torch.load to load something without any regard to devices (see here)... You'll most likely have to handle all the placement manually :-/
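For reference, a minimal sketch of what that manual placement could look like. The LossNet class, the checkpoint path, and the reliance on a LOCAL_RANK environment variable (set by torchrun, or by torch.distributed.launch with --use_env) are placeholders for illustration, not mdf's or Stoke's actual API:

```python
import os
import torch
import torch.nn as nn

# Placeholder stand-in for the pre-trained loss network (not mdf's real class)
class LossNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.features(x)

# Each DDP process picks its own device (assumes LOCAL_RANK is in the environment)
local_rank = int(os.environ.get("LOCAL_RANK", 0))
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

# map_location remaps every tensor in the checkpoint onto `device`,
# instead of the GPU it happened to be saved from
state = torch.load("loss_net_checkpoint.pth", map_location=device)

loss_net = LossNet()
loss_net.load_state_dict(state)
loss_net.to(device)      # parameters and buffers now live on this rank's GPU
loss_net.eval()          # the loss network is frozen, inference only
for p in loss_net.parameters():
    p.requires_grad_(False)
```

The key point is map_location: it pins every saved tensor to the calling process's own device rather than the device it was serialized from.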
Yeah, I was afraid of that; manual handling is a pain! No worries, I am closing this issue for now, but let me know if there's a different way to handle this. Thanks!
Describe the bug
This is not exactly a bug, but I need some context on how to implement this. I am trying to implement a novel loss function, ref - https://github.com/gfxdisp/mdf. The gist of it is to use a pre-trained neural architecture as the loss for low-level vision tasks like image denoising, SR, etc. So, we would be using the discriminator (a CNN) itself as the loss function here (the CNN takes the model's prediction as input and returns some metrics). The issue is that I couldn't implement it in a way that is compatible with Stoke, which leads to the standard error:
Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! .....
Can you please suggest a way to mitigate this, or how to handle this task efficiently?
To Reproduce
Steps to reproduce the behavior:
Ran the config as:
env CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 Stoke-DDP.py --projectName "Stoke-4K-2X-DDP" --batchSize 18 --nEpochs 2 --lr 1e-3 --weight_decay 1e-4 --grad_clip 0.1
Error produced:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__cudnn_convolution)
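For illustration, a minimal sketch of the manual placement that avoids this error inside the training step, assuming the loss network has already been loaded onto `device` as in the snippet in the comment above (`model`, `inputs`, and `target` are the usual training-loop variables, not Stoke's actual API):

```python
import torch.nn.functional as F

# pred is produced on this rank's GPU; keep target and loss_net on the same device
pred = model(inputs.to(device))
target = target.to(device)
loss_net.to(pred.device)   # no-op if already there, but guarantees matching devices

# Feature-space loss: every tensor and every weight of loss_net now sits on one
# device, which avoids "Expected all tensors to be on the same device"
loss = F.l1_loss(loss_net(pred), loss_net(target))
```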
Environment:
Thanks!