facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

Using quick_simclr_2node.yaml raises: RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 2 (while checking arguments for cudnn_convolution) #476

Open CauchyFood opened 2 years ago

CauchyFood commented 2 years ago

Instructions To Reproduce the 🐛 Bug:

I am only using the quick_simclr_2node config and get the following error:

  File "/mnt/cache/user/miniconda/envs/ivssl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cache/user/miniconda/envs/ivssl/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/mnt/cache/user/miniconda/envs/ivssl/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 2 (while checking arguments for cudnn_convolution)

How can I resolve this issue?
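For context, this error generally means the input batch lives on a different CUDA device than the model's weights, which in distributed training usually points to a process not being pinned to its intended GPU. A minimal sketch of the usual device-alignment pattern (this is an illustration, not VISSL code; it runs on CPU so the shapes and model here are made up):

```python
import torch
import torch.nn as nn

# Toy convolution standing in for the model trunk (hypothetical sizes).
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# The device the model's weights actually live on.
weight_device = next(model.parameters()).device

# Moving the input onto the weights' device is what avoids the
# "input ... same device as ... weight" cudnn_convolution error.
batch = torch.randn(2, 3, 16, 16).to(weight_device)
out = model(batch)
print(tuple(out.shape))  # (2, 8, 16, 16)
```

In a correctly configured multi-GPU run, each process would call `torch.cuda.set_device(local_rank)` (or equivalent) so its model and data land on the same device.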

FYI, this is the command I ran:

python3 tools/run_distributed_engines.py \
    hydra.verbose=true \
    config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
    config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
    config.DATA.TRAIN.DATA_PATHS=["/mnt/cache/liwei1/data/imagenet-1k/train"] \
    config=test/integration_test/quick_simclr_2node \
    config.DISTRIBUTED.RUN_ID=my_ip:port \
    config.CHECKPOINT.DIR="./checkpoints" \
    config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=10 \
    config.DATA.NUM_DATALOADER_WORKERS=1 \
    config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true
iseessel commented 2 years ago

@CauchyFood Can you please confirm that you are running this command on 2 nodes, each with 8 GPUs per node (16 GPUs in total)? This command will fail otherwise.

If you are indeed running on 2 nodes with 8 GPUs per node, could you please post the full log.txt along with your system information:

wget -nc -q https://github.com/facebookresearch/vissl/raw/main/vissl/utils/collect_env.py && python collect_env.py
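As a quick sanity check for the GPU count (not part of the original reply, just a suggestion), each node can report how many CUDA devices it actually sees:

```python
import torch

# The 2-node config expects 8 GPUs visible per node (16 total).
# On a CPU-only machine this prints 0.
print(torch.cuda.device_count())
```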