RuntimeError: expected device cuda:1 but got device cuda:0

littlespray commented 4 years ago

Hi everyone! I am a beginner in PyTorch and computer vision. I just cloned the source code (I did not install the correlation module; I assumed it is not necessary for training, is that right?) and started training with the command:

CUDA_VISIBLE_DEVICES=1,2 python main.py \
--maxdisp 256 --fac 1 \
--database /ssd2/ \
--logname chairs-0 \
--savemodel /ssd1/models/vcn/ \
--epochs 1000 --stage chairs --ngpus 2

and I got the following traceback:

File "main.py", line 258, in train
      vis['AEPE'] = realEPE(output[0].detach(), flowl0.permute(0,3,1,2).detach(),mask,sparse=False)
File "/home/VCN/utils/multiscaleloss.py", line 86, in realEPE
      return EPE(upsampled_output, target,mask, sparse, mean=True)
File "/home/VCN/utils/multiscaleloss.py", line 12, in EPE
      EPE_map = torch.norm(target_flow-input_flow,2,1)

Thanks very much for any help.

gengshan-y commented 4 years ago

It seems the ground-truth flow ("flowl0") and the predicted flow ("output[0]") are placed on different GPUs. I'm not sure why this happens.
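For context: with `CUDA_VISIBLE_DEVICES=1,2`, PyTorch only sees two GPUs and renumbers them as `cuda:0` and `cuda:1`, so the devices in the error message are logical indices, not the physical IDs from the command. One way to confirm the diagnosis is to print the device of every tensor passed to `realEPE` right before the failing line. The helpers below (`device_report` and `same_device` are hypothetical names, not part of VCN) are a minimal sketch; they also run on CPU tensors.

```python
import torch


def device_report(**tensors: torch.Tensor) -> dict:
    """Map each named tensor to the device it lives on, e.g. {'flowl0': 'cuda:0'}."""
    return {name: str(t.device) for name, t in tensors.items()}


def same_device(**tensors: torch.Tensor) -> bool:
    """True if all tensors share one device, which ops like `a - b` require."""
    return len(set(device_report(**tensors).values())) <= 1


# Hypothetical debugging line inside train(), just before the realEPE call
# (assumes `mask` is a tensor):
#   print(device_report(output0=output[0], flowl0=flowl0, mask=mask))
```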

You could add

output[0] = output[0].detach().cpu()
flowl0 = flowl0.detach().cpu()

before

vis['AEPE'] = realEPE(output[0].detach(), flowl0.permute(0,3,1,2).detach(),mask,sparse=False)

to move the tensors to the CPU first. Let me know if the error persists.
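An alternative to moving everything to the CPU is to move the prediction onto the ground truth's device before subtracting. If `EPE` also indexes the error map with `mask` (an assumption; check `utils/multiscaleloss.py`), the mask must be on that same device too. The function below is a hypothetical, simplified stand-in for VCN's `EPE`: it skips the upsampling that `realEPE` does first and ignores the `sparse` flag, and it only illustrates the device alignment.

```python
import torch


def epe_on_target_device(input_flow, target_flow, mask=None):
    """Mean end-point error after aligning devices (hypothetical sketch, not VCN's code).

    input_flow, target_flow: (N, 2, H, W) flow tensors.
    mask: optional bool tensor of shape (N, H, W) marking valid pixels.
    """
    device = target_flow.device
    # Move the prediction to wherever the ground truth lives, so the
    # subtraction below never mixes tensors from different GPUs.
    input_flow = input_flow.to(device)
    # Per-pixel L2 norm over the channel dimension (dim=1) -> shape (N, H, W).
    epe_map = torch.norm(target_flow - input_flow, 2, 1)
    if mask is not None:
        # The mask must live on the same device as the map it indexes.
        epe_map = epe_map[mask.to(device)]
    return epe_map.mean()


# Usage on CPU tensors: pixel 0 has error (3, 4) -> 5, pixel 1 has error 0.
target = torch.tensor([[[[3.0, 0.0]], [[4.0, 0.0]]]])  # shape (1, 2, 1, 2)
pred = torch.zeros_like(target)
print(epe_on_target_device(pred, target))  # tensor(2.5000)
```

Keeping the computation on the GPU avoids a device-to-host copy on every iteration, while the CPU approach above is simpler to reason about.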

littlespray commented 4 years ago

Thank you so much! I don't know why, but after I reran the original command several times, it started working on its own. Thank you all the same for your solution.

gengshan-y / VCN, issue #12