Cadene / vqa.pytorch

Visual Question Answering in Pytorch

Extracting image features from Visual Genome fails with multi-gpu #5

ahmedmagdiosman closed this issue 7 years ago

ahmedmagdiosman commented 7 years ago

Hello,

I'm trying to use multiple GPUs to speed up feature extraction. Running the following command gives me the error below:

CUDA_VISIBLE_DEVICES=0,2 python extract.py --dataset vgenome --dir_data data/vgenome --data_split train --mode att

Warning: shape_att=(108249, 2048, 14, 14)
Traceback (most recent call last):
  File "extract.py", line 157, in <module>
    main()
  File "extract.py", line 87, in main
    extract(data_loader, model, path_file, args.mode)
  File "extract.py", line 121, in extract
    output_att = model(input_var)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 46, in parallel_apply
    raise output
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 25, in _worker
    output = module(*input, **kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aosman/vqa/vqa.pytorch/vqa/models/convnets.py", line 62, in <lambda>
    model.forward = lambda x: forward_resnet(convnet, x)
  File "/home/aosman/vqa/vqa.pytorch/vqa/models/convnets.py", line 26, in forward_resnet
    x = self.conv1(x)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/functional.py", line 40, in conv2d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs

Setting CUDA_VISIBLE_DEVICES to a single GPU works fine. I've looked into the code, and it seems you had multi-GPU extraction in mind. Do you have any idea why this happens?

Cadene commented 7 years ago

To be fair, I've never extracted features with multiple GPUs ^^

I don't know how to fix this error right now. If you find a fix, don't hesitate to send a Pull Request. Thanks :)

ahmedmagdiosman commented 7 years ago

Thanks for letting me know :)

I'm using a single GPU for now, but I'll definitely look into it, as I don't think it's a hard fix.
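
A possible explanation, going only by the traceback: convnets.py replaces the model's forward with a lambda that closes over `convnet`. If `convnet` is the original, un-replicated model, every DataParallel replica could end up running the original's weights, which sit on the first GPU, against inputs that were scattered to another GPU. This is a guess and has not been checked against the repository. The sketch below is a torch-free toy that only illustrates the mechanism. `Net`, `replicate`, and `run_on_replica` are hypothetical names, and the shallow copy is a simplified stand-in for replication, not PyTorch's actual code.

```python
import copy


class Net:
    """Toy stand-in for a network; `device` mimics where its weights live."""

    def __init__(self, device):
        self.device = device

    def forward(self, input_device):
        # Mimics the device check behind "tensors are on different GPUs".
        if input_device != self.device:
            raise RuntimeError("tensors are on different GPUs")
        return "ok on " + self.device


def replicate(net, device):
    """Toy replication: shallow-copy the object, then 'move' the copy."""
    replica = copy.copy(net)  # copies instance attributes, including a patched forward
    replica.device = device
    return replica


def run_on_replica(net, device):
    """Replicate `net` to `device` and feed the replica an input from that device."""
    try:
        return replicate(net, device).forward(device)
    except RuntimeError as err:
        return "error: " + str(err)


# Case 1: forward is a normal method, so each replica uses its own state.
plain = Net("gpu0")
plain_result = run_on_replica(plain, "gpu1")

# Case 2: forward is overwritten on the instance with a lambda that closes
# over the original object, similar in shape to the line in the traceback.
patched = Net("gpu0")
original = patched
patched.forward = lambda x: Net.forward(original, x)
patched_result = run_on_replica(patched, "gpu1")

print(plain_result)    # ok on gpu1
print(patched_result)  # error: tensors are on different GPUs
```

If this is indeed the cause, one direction to try would be moving the `forward_resnet` logic into the `forward` method of a small `nn.Module` subclass that wraps the convnet, so that each replica works with its own copy of the weights.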