Use predefined train-test split. Transfer...
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run.py", line 80, in <module>
    transfer(config, generator, kp_detector, opt.checkpoint, log_dir, dataset)
  File "/home/kushagra/monkey-net/transfer.py", line 112, in transfer
    out = transfer_one(generator, kp_detector, source_image, driving_video, transfer_params)
  File "/home/kushagra/monkey-net/transfer.py", line 68, in transfer_one
    kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
  File "/home/kushagra/monkey-net/transfer.py", line 68, in <listcomp>
    kp_driving = cat_dict([kp_detector(driving_video[:, :, i:(i + 1)]) for i in range(d)], dim=1)
  File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/home/kushagra/monkey-net/sync_batchnorm/replicate.py", line 65, in replicate
    modules = super(DataParallelWithCallback, self).replicate(module, device_ids)
  File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate
    return replicate(module, device_ids)
  File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast.apply(devices, *params)
  File "/home/kushagra/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
    outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
  File "/home/kushagra/.local/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
I understand that I need to put all the input tensors on device 0, but I am not sure exactly how to do that. I tried some of the approaches from https://discuss.pytorch.org/t/how-to-solve-the-problem-of-runtimeerror-all-tensors-must-be-on-devices-0/15198/5, but they did not work.
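For reference, what I tried looks roughly like this. The tensor names and shapes mirror what I believe transfer.py expects (batch, channels, frames, height, width); that layout is my assumption, and I fall back to CPU here so the snippet runs without a GPU:

```python
import torch

# devices[0] for DataParallel is cuda:0; CPU fallback is only so this
# sketch runs anywhere (my assumption about the expected setup).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Dummy inputs standing in for the real source image and driving video;
# the (B, C, T, H, W) shapes are assumed, not taken from transfer.py.
source_image = torch.randn(1, 3, 1, 64, 64).to(device)
driving_video = torch.randn(1, 3, 8, 64, 64).to(device)

# Both inputs now live on the same device as devices[0].
print(source_image.device, driving_video.device)
```

Moving the inputs like this still produced the same RuntimeError for me.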
I also moved all the models to device 1 (e.g. generator.to(opt.device_ids[1])), hoping that this would free up space on device 0 for the tensors (otherwise I get a CUDA out-of-memory error).
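If I understand the DataParallel contract correctly, this may be the problem: the wrapped module and its inputs have to start on device_ids[0], because replication broadcasts from there. A minimal sketch of what I believe the expected setup is (a toy nn.Linear stands in for the real models, and the snippet falls back to CPU when two GPUs are not available):

```python
import torch
import torch.nn as nn

# Toy module standing in for generator / kp_detector (an assumption).
model = nn.Linear(4, 2)

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    device_ids = [0, 1]
    # DataParallel replicates from device_ids[0], so the module's
    # parameters must live there -- not on device 1.
    model = model.to(f"cuda:{device_ids[0]}")
    model = nn.DataParallel(model, device_ids=device_ids)
    x = torch.randn(8, 4, device=f"cuda:{device_ids[0]}")
else:
    # CPU fallback so the sketch runs anywhere.
    x = torch.randn(8, 4)

out = model(x)
print(out.shape)
```

So moving everything to device 1 to save memory presumably conflicts with what DataParallel expects, which would explain the "all tensors must be on devices[0]" error.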
I am running the model on 2x RTX 2080 GPUs with CUDA 10.