varunnrao closed this issue 6 years ago
Simply removing the two calls to .cuda in preprocess-images.py should work.
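As an alternative to deleting the `.cuda` calls outright, they could be made conditional. The sketch below is only an illustration: `maybe_cuda` and the `FakeTensor` stand-in are hypothetical names that do not exist in the repository, and the stand-in is there only so the example runs without PyTorch. In the real script the flag would presumably come from `torch.cuda.is_available()`.

```python
def maybe_cuda(obj, use_cuda):
    """Return obj.cuda() when use_cuda is true, otherwise return obj unchanged."""
    return obj.cuda() if use_cuda else obj


class FakeTensor:
    """Hypothetical stand-in for a tensor or module, so the sketch runs without PyTorch."""

    def __init__(self, device="cpu"):
        self.device = device

    def cuda(self):
        # Mimics the real API shape: returns a copy that lives on the GPU.
        return FakeTensor(device="cuda")


# In preprocess-images.py the flag would be torch.cuda.is_available() (assumption).
on_cpu = maybe_cuda(FakeTensor(), use_cuda=False)
on_gpu = maybe_cuda(FakeTensor(), use_cuda=True)
print(on_cpu.device, on_gpu.device)
```

Because the helper only calls the `.cuda()` method, it should work the same way for the network and for each batch of images.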
That does not work. We did try that. There is an issue with this part of the code which expects CUDA.
torch.utils.data.DataLoader(
    dataset,
    batch_size=config.preprocess_batch_size,
    num_workers=config.data_workers,
    shuffle=False,
    pin_memory=True,
)
We get an error saying no NVIDIA device found.
So we tried setting pin_memory=False. However, this did not work either: out = net(imgs) failed with a size mismatch.
We would like to replicate your results. Would it be possible for you to commit working versions of preprocess-images.py and train.py?
With pin_memory=True and the .cuda calls removed, this was the error log:
Traceback (most recent call last):
File "preprocess-images.py", line 73, in <module>
main()
File "preprocess-images.py", line 62, in main
for ids, imgs in loader:
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
return self._process_next_batch(batch)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 62, in _pin_memory_loop
batch = pin_memory_batch(batch)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
return [pin_memory_batch(sample) for sample in batch]
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 117, in pin_memory_batch
return batch.pin_memory()
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 82, in pin_memory
return type(self)().set_(storage.pin_memory()).view_as(self)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/storage.py", line 83, in pin_memory
allocator = torch.cuda._host_allocator()
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 220, in _host_allocator
_lazy_init()
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
_check_driver()
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
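The traceback above ends in `torch.cuda._host_allocator()`, so pinning memory initialises CUDA even when no `.cuda` call is left in the script. A minimal sketch of one way around this is to request pinning only when a GPU is usable. `loader_kwargs` is a hypothetical helper, not something from the repository, and the worker count below is an illustrative value.

```python
def loader_kwargs(batch_size, num_workers, cuda_available):
    """Build DataLoader keyword arguments; pin_memory is on only when CUDA is usable."""
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "shuffle": False,
        # Pinned host memory only helps host-to-GPU copies, and requesting it
        # without a driver raises the AssertionError shown in the traceback.
        "pin_memory": bool(cuda_available),
    }


# Intended use in the script (assumes torch is installed):
#   kwargs = loader_kwargs(config.preprocess_batch_size, config.data_workers,
#                          torch.cuda.is_available())
#   loader = torch.utils.data.DataLoader(dataset, **kwargs)
cpu_kwargs = loader_kwargs(batch_size=64, num_workers=8, cuda_available=False)
print(cpu_kwargs["pin_memory"])
```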
With pin_memory=False, this was the error log:
Traceback (most recent call last):
File "preprocess-images.py", line 73, in <module>
main()
File "preprocess-images.py", line 64, in main
out = net(imgs)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "preprocess-images.py", line 25, in forward
self.model(x)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/resnet.py", line 151, in forward
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 53, in forward
return F.linear(input, self.weight, self.bias)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 553, in linear
return torch.addmm(bias, input, weight.t())
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 924, in addmm
return cls._blas(Addmm, args, False)
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 920, in _blas
return cls.apply(*(tensors + (alpha, beta, inplace)))
File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/blas.py", line 26, in forward
matrix1, matrix2, out=output)
RuntimeError: size mismatch, m1: [64 x 8192], m2: [2048 x 1000] at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/TH/generic/THTensorMath.c:1293
Please note that we imported the following torchvision model for ResNet, since your command on line 12 did not work:
import torchvision.models.resnet as caffe_resnet
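The `8192` in the size-mismatch error is consistent with the torchvision ResNet pooling a larger-than-224 input with a fixed 7x7 average pool before its 2048-input `fc` layer. The arithmetic sketch below rests on assumptions not stated in this thread: 448x448 input images, a total ResNet stride of 32, and an average pool with kernel and stride both 7 (which is how I understand `AvgPool2d(7)` in torchvision 0.1.9 to behave). `flattened_features` is a hypothetical helper.

```python
def flattened_features(image_size, channels=2048, total_stride=32, pool=7):
    """Number of features per image reaching the fc layer after pooling and flattening."""
    fmap = image_size // total_stride      # spatial side of the last conv feature map
    pooled = (fmap - pool) // pool + 1     # side after average pooling (stride == kernel)
    return channels * pooled * pooled


print(flattened_features(224))  # 2048, matches the fc layer's expected input
print(flattened_features(448))  # 8192, matches m1: [64 x 8192] in the error
```

Under these assumptions each image yields a 2048 x 2 x 2 map, which flattens to 8192 values and cannot be multiplied by the `[2048 x 1000]` weight of `fc`.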
The torchvision net is not quite a drop-in replacement. Get the git submodule for the caffe resnet fixed and try the pin_memory=False version. Either way, I don't recommend running this on a CPU-only machine; it will take ages to train.
Okay. Thanks
On 5 January 2018 at 18:49, Yan Zhang notifications@github.com wrote:
The torchvision net is not quite a drop-in replacement. Get the git submodule for the caffe resnet fixed and try the pin_memory=False version. Either way, I don't recommend running this with a CPU-only -- it will take ages to train.
Is there a way to convert preprocess-images.py to a version that doesn't require CUDA?