dearleiii / PIRM-2018-SISR-Challenge

Super Resolution
https://www.pirm2018.org/PIRM-SR.html

RuntimeError: $ Torch: not enough memory: you tried to allocate 12GB. Buy new RAM! at #5

Closed dearleiii closed 6 years ago

dearleiii commented 6 years ago

  File "scatter_plots.py", line 134, in &lt;module&gt;
    trainNet(approximator, batch_size = 100, n_epochs = 5, learning_rate = 0.001)
  File "scatter_plots.py", line 98, in trainNet
    outputs = net(inputs)[0]
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/SuperResolutor/Approx_discrim/apxm.py", line 56, in forward
    x = self.main(x)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: $ Torch: not enough memory: you tried to allocate 12GB. Buy new RAM! at /pytorch/aten/src/TH/THGeneral.c:218
leichen@gpu-compute7>

dearleiii commented 6 years ago

You can check memory usage with the following command if you are on a Linux system:

watch -n 1 'free -m'
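For doing the same check from inside a script, here is a small, illustrative Python counterpart to `free -m` (Linux-specific, reading `/proc/meminfo`; the helper name `available_mb` is mine, not from the repository):

```python
import os

def available_mb(meminfo_text):
    """Parse MemAvailable (reported in kB) out of /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # kB -> MB
    return None

if os.path.exists("/proc/meminfo"):  # Linux only
    with open("/proc/meminfo") as f:
        print(f"{available_mb(f.read())} MB available")
```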

dearleiii commented 6 years ago

Running large CNNs on the CPU is especially memory-demanding. If you have a GPU, use it with cuDNN instead.
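As a minimal sketch of that advice (the toy network here is illustrative, not the challenge model): move the network and its inputs to the GPU when one is available, fall back to CPU otherwise, and enable cuDNN autotuning on the GPU path.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.backends.cudnn.benchmark = True  # let cuDNN pick fast conv algorithms

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(device)
x = torch.randn(4, 3, 32, 32, device=device)  # inputs must live on the same device
out = net(x)
print(tuple(out.shape))  # (4, 16, 32, 32)
```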

dearleiii commented 6 years ago

Cannot set CUDA_VISIBLE_DEVICES=0
Cannot export PATH
Cannot change /bin/bash
Cannot install htop

dearleiii commented 6 years ago

leichen@gpu-compute4$ python3 scatter_plots.py cuda:2 0
Traceback (most recent call last):
  File "scatter_plots.py", line 36, in &lt;module&gt;
    approximator.to(device)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in to
    return self._apply(lambda t: t.to(device))
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 182, in _apply
    param.data = fn(param.data)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in &lt;lambda&gt;
    return self._apply(lambda t: t.to(device))
RuntimeError: CUDA error (10): invalid device ordinal
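The "invalid device ordinal" above means the script asked for `cuda:2` on a machine that exposes fewer GPUs. A hedged sketch of a guard (the helper `pick_device` is hypothetical, not from the repository): check `torch.cuda.device_count()` before constructing the device.

```python
import torch

def pick_device(requested_index):
    """Return a torch.device, refusing GPU indices that do not exist."""
    n = torch.cuda.device_count()
    if n == 0:
        return torch.device("cpu")  # no GPUs visible at all
    if requested_index >= n:
        raise ValueError(f"asked for cuda:{requested_index} but only {n} GPU(s) visible")
    return torch.device(f"cuda:{requested_index}")

device = pick_device(0)
print(device)
```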

dearleiii commented 6 years ago

Change the command to:

leichen@gpu-compute4$ python3 scatter_plots.py cuda:0 0

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|

  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 182, in _apply
    param.data = fn(param.data)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in &lt;lambda&gt;
    return self._apply(lambda t: t.to(device))
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
leichen@gpu-compute4$ nvidia-smi

dearleiii commented 6 years ago

GPU out of memory issue
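Two common mitigations for the GPU OOM above, sketched on a toy model (the layer sizes and chunk size are illustrative): process the data in smaller batches, and wrap forward-only passes (evaluation, generating scatter plots) in `torch.no_grad()` so no activation graph is kept.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
data = torch.randn(100, 3, 32, 32)

outputs = []
with torch.no_grad():             # no autograd graph kept -> far less memory
    for chunk in data.split(10):  # 10 images at a time instead of all 100
        outputs.append(net(chunk))
out = torch.cat(outputs)
print(tuple(out.shape))  # (100, 8, 32, 32)
```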

dearleiii commented 6 years ago

Exp4. Generally speaking, the pattern is:

use .cuda() on any input batches/tensors

use .cuda() on your network module, which will hold your network, like:

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn. …
        self.layer2 = nn. …
        # … etc …

then just do:

model = MyModel()
model.cuda()
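A runnable version of the whole pattern might look like the following; the linear layer sizes are illustrative, not taken from the repository, and the GPU branch is guarded so the sketch also runs on a CPU-only machine (a bare `.cuda()` would raise there).

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(8, 16)
        self.layer2 = nn.Linear(16, 2)

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

model = MyModel()
if torch.cuda.is_available():
    model.cuda()                      # move parameters to the GPU
    batch = torch.randn(4, 8).cuda()  # input batches must move as well
else:
    batch = torch.randn(4, 8)
print(tuple(model(batch).shape))  # (4, 2)
```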

dearleiii commented 6 years ago

Basically it is just one line to use DataParallel:

net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
output = net(input_var)

Just wrap your model with DataParallel and call the returned net on your data. The device_ids parameter specifies which GPUs are used.
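A self-contained sketch of that wrapping, with the device list derived from what the machine actually exposes rather than hard-coded to [0, 1, 2]; when no GPUs are visible, DataParallel simply runs the wrapped module, so the same code works on CPU. The toy linear model is illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
device_ids = list(range(torch.cuda.device_count()))  # e.g. [0, 1, 2] on a 3-GPU box
net = torch.nn.DataParallel(model, device_ids=device_ids or None)

if device_ids:
    net = net.cuda()                   # parameters must sit on device_ids[0]
    input_var = torch.randn(6, 8).cuda()
else:
    input_var = torch.randn(6, 8)      # CPU fallback: DataParallel is a no-op
output = net(input_var)
print(tuple(output.shape))  # (6, 2)
```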

dearleiii commented 6 years ago

You can run your script with CUDA_VISIBLE_DEVICES set, like:

CUDA_VISIBLE_DEVICES=1 python myscript.py
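One detail worth knowing: the variable must be set before CUDA is initialized, which is why setting it on the command line is the reliable route, and the visible GPUs are renumbered from 0 inside the process, so under `CUDA_VISIBLE_DEVICES=1` the physical GPU 1 is addressed as `cuda:0`. A sketch of the same effect from inside a script (setting the variable before the `torch` import):

```python
import os

# Must happen before any CUDA initialization; normally done on the command line.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # imported after the variable is set

# The one visible GPU (physical GPU 1) now appears as cuda:0.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
```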
