run without CUDA - Githubissues

varunnrao commented 6 years ago

Is there a way to convert the preprocess-images.py to a version that doesnt require CUDA?

Cyanogenoid commented 6 years ago

Simply removing the two calls to .cuda in preprocess-images.py should work.

varunnrao commented 6 years ago

That does not work. We did try that. There is an issue with this part of the code which expects CUDA.

torch.utils.data.DataLoader(
        dataset,
        batch_size=config.preprocess_batch_size,
        num_workers=config.data_workers,
        shuffle=False,
        pin_memory=True,
    )

We get an error saying no NVIDIA device found. So, we tried settingpin_memory=False. However this did not work as well.

out = net(imgs) failed since there mismatch in image sizes.

We would like to replicate your results. Is it possible for you to commit 2 new working codes of preprocess-image.py andtrain.py?

varunnrao commented 6 years ago

with pin_memory=True and after removing .cuda, this was the error log


Traceback (most recent call last):
  File "preprocess-images.py", line 73, in <module>
    main()
  File "preprocess-images.py", line 62, in main
    for ids, imgs in loader:
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 62, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 117, in pin_memory_batch
    return batch.pin_memory()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 82, in pin_memory
    return type(self)().set_(storage.pin_memory()).view_as(self)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/storage.py", line 83, in pin_memory
    allocator = torch.cuda._host_allocator()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 220, in _host_allocator
    _lazy_init()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
    _check_driver()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

varunnrao commented 6 years ago

with pin_memory=False, this was the error log

Traceback (most recent call last):
  File "preprocess-images.py", line 73, in <module>
    main()
  File "preprocess-images.py", line 64, in main
    out = net(imgs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "preprocess-images.py", line 25, in forward
    self.model(x)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/resnet.py", line 151, in forward
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 53, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 553, in linear
    return torch.addmm(bias, input, weight.t())
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 924, in addmm
    return cls._blas(Addmm, args, False)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 920, in _blas
    return cls.apply(*(tensors + (alpha, beta, inplace)))
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/blas.py", line 26, in forward
    matrix1, matrix2, out=output)
RuntimeError: size mismatch, m1: [64 x 8192], m2: [2048 x 1000] at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/TH/generic/THTensorMath.c:1293

varunnrao commented 6 years ago

please do note that we have imported the following model for resnet since your command on line 12 did not work import torchvision.models.resnet as caffe_resnet

Cyanogenoid commented 6 years ago

The torchvision net is not quite a drop-in replacement. Get the git submodule for the caffe resnet fixed and try the pin_memory=False version. Either way, I don't recommend running this with a CPU-only -- it will take ages to train.

varunnrao commented 6 years ago

Okay. Thanks

On 5 January 2018 at 18:49, Yan Zhang notifications@github.com wrote:

The torchvision net is not quite a drop-in replacement. Get the git submodule for the caffe resnet fixed and try the pin_memory=False version. Either way, I don't recommend running this with a CPU-only -- it will take ages to train.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Cyanogenoid/pytorch-vqa/issues/7#issuecomment-355552710, or mute the thread https://github.com/notifications/unsubscribe-auth/AY_7IZrqCg9fK09ttXvHkiADB01wV7Wmks5tHiFLgaJpZM4RSnZB .

Cyanogenoid / pytorch-vqa

run without CUDA #7