RuntimeError: CUDA error: out of memory

srikar2097 commented 6 years ago

@chaoyuaw Thank you for sharing your code. With the settings in train.sh, all hierarchies (0, 1 and 2) fail with CUDA error: out of memory. A little analysis revealed that the memory required was around 25-30GB. My GPUs have only 12GB of memory.

How did you get these hierarchy settings to run on GPUs?

./train.sh 0
Namespace(batch_size=16, bits=16, checkpoint_iters=100, clip=0.5, decoder_fuse_level=1, distance1=6, distance2=6, encoder_fuse_level=1, eval='data/eval', eval_batch_size=1, eval_iters=40, eval_mv='data/eval_mv', fuse_encoder=True, gamma=0.5, gpus='0,1', iterations=10, load_iter=None, load_model_name=None, lr=0.00025, max_train_iters=100, model_dir='model', num_crops=2, out_dir='output', patch=64, save_codes=False, save_model_name='demo', save_out_img=True, schedule='50000,60000,70000,80000,90000', shrink=2, stack=True, train='data/train', train_mv='data/train_mv', v_compress=True, warp=True)

Creating loader for data/train...
56 images loaded.
    distance=6/6
Loader for 56 images (4 batches) created.
    Encoder fuse level: 1
    Decoder fuse level: 1
Using GPUs [0, 1].
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    (output, deco_h1, deco_h2, deco_h3, deco_h4) = decoder(codes, deco_h1, deco_h2, deco_h3, deco_h4, warped_unet_output1, warped_unet_output2)
  File "/home/ec2-user/anaconda3/envs/vidcompress_py363/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/ags_drive/users/code/pytorch-vcii/network.py", line 172, in forward
    hidden4 = self.rnn4(x, hidden4)
  File "/home/ec2-user/anaconda3/envs/vidcompress_py363/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/drive/users/code/pytorch-vcii/modules/conv_rnn.py", line 48, in forward
    gates  = self.conv_ih(input) + self.conv_hh(hx)
  File "/home/ec2-user/anaconda3/envs/vidcompress_py363/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/vidcompress_py363/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory

./train.sh 2
Namespace(batch_size=16, bits=8, checkpoint_iters=100, clip=0.5, decoder_fuse_level=1, distance1=1, distance2=2, encoder_fuse_level=1, eval='data/eval', eval_batch_size=1, eval_iters=40, eval_mv='data/eval_mv', fuse_encoder=True, gamma=0.5, gpus='0,1', iterations=10, load_iter=None, load_model_name=None, lr=0.00025, max_train_iters=100, model_dir='model', num_crops=2, out_dir='output', patch=64, save_codes=False, save_model_name='demo', save_out_img=True, schedule='50000,60000,70000,80000,90000', shrink=2, stack=True, train='data/train', train_mv='data/train_mv', v_compress=True, warp=True)

Creating loader for data/train...
448 images loaded.
    distance=1/2
Loader for 448 images (28 batches) created.
    Encoder fuse level: 1
    Decoder fuse level: 1
Using GPUs [0, 1].
....
....
RuntimeError: CUDA error: out of memory


./train.sh 1
Namespace(batch_size=16, bits=16, checkpoint_iters=100, clip=0.5, decoder_fuse_level=3, distance1=3, distance2=3, encoder_fuse_level=2, eval='data/eval', eval_batch_size=1, eval_iters=40, eval_mv='data/eval_mv', fuse_encoder=True, gamma=0.5, gpus='0,1', iterations=10, load_iter=None, load_model_name=None, lr=0.00025, max_train_iters=100, model_dir='model', num_crops=2, out_dir='output', patch=64, save_codes=False, save_model_name='demo', save_out_img=True, schedule='50000,60000,70000,80000,90000', shrink=2, stack=True, train='data/train', train_mv='data/train_mv', v_compress=True, warp=True)

Creating loader for data/train...
112 images loaded.
    distance=3/3
Loader for 112 images (7 batches) created.
    Encoder fuse level: 2
    Decoder fuse level: 3
Using GPUs [0, 1].
....
....
RuntimeError: CUDA error: out of memory

chaoyuaw commented 6 years ago

Hi @srikar2097, thanks for your questions. Sorry, but I'm not sure what caused this. The videos used in the demo should take < 10GB of memory, and I was able to run it on only one GPU. Any chance some other programs might be using GPU memory at the same time?
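One way to check whether other processes are holding GPU memory is to ask `nvidia-smi` for its per-process compute list. The following is a minimal sketch, not part of the repository: the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration, and the sample row in the usage note is invented rather than taken from this thread.

```python
import subprocess

# nvidia-smi query for compute processes; output is one CSV row per process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_mib' rows into (pid, name, used_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip malformed rows and rows where memory is reported as unavailable.
        if len(parts) != 3 or not parts[0].isdigit() or not parts[2].isdigit():
            continue
        rows.append((int(parts[0]), parts[1], int(parts[2])))
    return rows


def list_gpu_processes():
    """Return compute processes seen by nvidia-smi, or [] if it cannot be run."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(result.stdout)


for pid, name, used_mib in list_gpu_processes():
    print(f"pid {pid} ({name}) holds {used_mib} MiB")
```

For example, `parse_compute_apps("1234, python, 2048")` yields one tuple for a hypothetical process 1234. An empty list before training starts suggests no other compute process is occupying the GPU.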

srikar2097 commented 6 years ago

@chaoyuaw Nope. I suspected this and stopped all other programs using the GPU. Also, note that I have not changed anything in your code, and I am using the same data (downloaded from the drive link). Are you able to replicate the memory issue?

srikar2097 commented 6 years ago

@chaoyuaw Okay, I did one more experiment. This time I downgraded PyTorch from 0.4.1.post2 (current stable) to 0.3.0.post4 (suggested by you) and it runs! See below for stats. It's almost on the brink of a memory error, but it runs (consumed 11061MB out of 11441MB).

I suspect there are some internal changes in PyTorch that make it take a lot more memory for your operations than the previous version did.

System stats: [0] Tesla K80 | 69'C, 100 %, 159 / 149 W | 11061 / 11441 MB | ec2-user:python/114497(11048M)
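To make the "brink of memory error" remark concrete, the headroom can be computed from the stats line above. This is a small sketch; the function name `gpu_memory_headroom` is hypothetical, and the regular expression assumes the `used / total MB` field format shown in that line.

```python
import re


def gpu_memory_headroom(stats_line):
    """Return (used_mb, total_mb, free_mb, used_fraction) from a 'used / total MB' field."""
    match = re.search(r"(\d+)\s*/\s*(\d+)\s*MB", stats_line)
    if match is None:
        raise ValueError("no 'used / total MB' field found")
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, total - used, used / total


# The stats line quoted in this comment.
line = "[0] Tesla K80 | 69'C, 100 %, 159 / 149 W | 11061 / 11441 MB | ec2-user:python/114497(11048M)"
used, total, free, fraction = gpu_memory_headroom(line)
print(f"{free} MB free, {fraction:.1%} used")  # prints "380 MB free, 96.7% used"
```

With only 380 MB left, even a small increase in batch size or patch size would likely push the run back into an out-of-memory error.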

Another interesting thing: is your code set up to run multi-GPU? Passing more than one GPU on the command line has no effect. It always runs on only one GPU.

chaoyuaw commented 6 years ago

Hi @srikar2097, glad that you found the issue quickly. Yes, I think the current code doesn't handle multi-GPU correctly. Would you consider contributing a PR if you fix the bug? Thanks :)
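For anyone attempting such a PR, a common starting point in PyTorch is to wrap each module in `torch.nn.DataParallel`. The sketch below is an assumption about how that could look, not the repository's code: `parse_gpu_ids` and `wrap_for_gpus` are hypothetical helpers. `DataParallel` splits inputs along the batch dimension, so the recurrent hidden states passed between iterations would also need to be batch-first for this to work.

```python
def parse_gpu_ids(spec):
    """Turn a comma-separated string such as '0,1' into a list of device ids."""
    return [int(token) for token in spec.split(",") if token.strip()]


def wrap_for_gpus(model, gpu_ids):
    """Place `model` on the first listed GPU and replicate it across the rest.

    Hypothetical helper: torch is imported lazily so that parse_gpu_ids can be
    used on machines without PyTorch installed.
    """
    import torch

    if not gpu_ids or not torch.cuda.is_available():
        return model  # CPU fallback: leave the model untouched
    if len(gpu_ids) > 1 and torch.cuda.device_count() >= len(gpu_ids):
        # DataParallel scatters each input batch across device_ids.
        model = torch.nn.DataParallel(model, device_ids=gpu_ids)
    return model.cuda(gpu_ids[0])


print(parse_gpu_ids("0,1"))  # prints [0, 1]
```

A call such as `decoder = wrap_for_gpus(decoder, parse_gpu_ids(args.gpus))` would be the intended use, assuming `args.gpus` holds the `gpus='0,1'` string shown in the logs above.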

srikar2097 commented 6 years ago

@chaoyuaw I haven't implemented multi-GPU yet :) Also, there was no bug; I just downgraded to your suggested PyTorch version.

But in order to replicate your reported results, where do you suggest I begin? Which set of 75K Kinetics videos was used? etc.

chaoyuaw commented 6 years ago

The video ids I used for train/val/test are available at https://drive.google.com/drive/folders/1MOLuoGDE6lZnmXLJHTUNkJtfY1l2sZSC?usp=sharing

Please send me an email (cywu@cs.utexas.edu) if you need pre-trained models. Thanks!

mmend175 commented 4 years ago

@chaoyuaw Any chance you have the code for the implementation of the GOP structure, and also the code for generating the motion estimation? Thanks!
