arunos728 / MotionSqueeze

Official PyTorch Implementation of MotionSqueeze, ECCV 2020
BSD 2-Clause "Simplified" License

Out of memory error with Nvidia RTX 3080 #13

Closed rogerhcheng closed 3 years ago

rogerhcheng commented 3 years ago

I am trying to run training on something-something-v1 with an Nvidia RTX 3080 and the latest PyTorch NVIDIA Docker image, and I get the out-of-memory error below. It reproduces every time I run it.

Any ideas? I know I am not using the exact same configuration as the original author, but I don't think I can downgrade, because the RTX 3080 doesn't support CUDA 9.0.

Thanks in advance.

pretrained_parts: finetune
group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 0 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 29 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 1 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 60 params, lr_mult: 1, decay_mult: 0
group: custom_ops has 0 params, lr_mult: 1, decay_mult: 1
group: lr5_weight has 0 params, lr_mult: 1, decay_mult: 1
group: lr10_bias has 0 params, lr_mult: 2, decay_mult: 0
100
No BN layer Freezing.
Traceback (most recent call last):
  File "../main_something.py", line 442, in <module>
    main()
  File "../main_something.py", line 211, in main
    temperature = train(train_loader, model, criterion, optimizer, epoch)
  File "../main_something.py", line 273, in train
    output = model(input_var, temperature)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Documents/MotionSqueeze/models.py", line 354, in forward
    base_out = self.base_model(input_var, temperature)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Documents/MotionSqueeze/resnet_TSM.py", line 430, in forward
    flow_1, match_v = self.flow_computation(x, temperature=temperature)
  File "/Documents/MotionSqueeze/resnet_TSM.py", line 406, in flow_computation
    match = self.matching_layer(x_pre, x_post) # (BT-1group, HW, HW)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Documents/MotionSqueeze/resnet_TSM.py", line 164, in forward
    corr = self.correlation_sampler(feature1, feature2)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/spatial_correlation_sampler-0.3.0-py3.8-linux-x86_64.egg/spatial_correlation_sampler/spatial_correlation_sampler.py", line 105, in forward
    return SpatialCorrelationSamplerFunction.apply(input1, input2, self.kernel_size,
  File "/opt/conda/lib/python3.8/site-packages/spatial_correlation_sampler-0.3.0-py3.8-linux-x86_64.egg/spatial_correlation_sampler/spatial_correlation_sampler.py", line 66, in forward
    output = correlation.forward(input1, input2,
RuntimeError: CUDA out of memory. Tried to allocate 228.00 MiB (GPU 0; 9.78 GiB total capacity; 8.19 GiB already allocated; 34.12 MiB free; 8.52 GiB reserved in total by PyTorch)
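The allocation fails inside the spatial correlation sampler, but the figures in the error message (9.78 GiB total, 8.19 GiB already allocated) show the card is nearly full before that call. A minimal sketch for printing those same figures from inside the training process, using only the standard torch.cuda API (device index 0 is an assumption for a single-GPU machine):

```python
# Report the same memory figures that appear in the CUDA OOM message.
# Standard torch.cuda calls; device 0 assumed (single-GPU setup).
import torch

dev = 0
props = torch.cuda.get_device_properties(dev)
print(f"{props.name}: {props.total_memory / 2**30:.2f} GiB total capacity")
print(f"allocated:      {torch.cuda.memory_allocated(dev) / 2**30:.2f} GiB")
print(f"reserved:       {torch.cuda.memory_reserved(dev) / 2**30:.2f} GiB")
print(f"peak allocated: {torch.cuda.max_memory_allocated(dev) / 2**30:.2f} GiB")
```

Calling this once per epoch, or just before the matching layer, makes it easy to see how close a 10 GiB card sits to the limit at a given batch size.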

rogerhcheng commented 3 years ago

It was my fault: I was running the train_TSM_Something_v1.sh script as-is. The script's default batch size is 48; when I dropped it to 36, training worked fine. It seems my GPU just doesn't have as much memory as the author's.
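For anyone repeating this on a different card, activation memory here grows roughly in proportion to the batch size, so a single training step at a small trial batch gives a usable estimate of how far it can be pushed. A rough sketch of that heuristic (the linear-scaling assumption, the 0.9 safety margin, and the helper name are illustrative, not values from this repository):

```python
# Hedged heuristic: run one forward/backward step at a small trial batch,
# then scale the batch size by the ratio of total GPU memory to the peak
# memory that step used. Linear scaling and the 0.9 margin are assumptions.
import torch

def suggest_batch_size(trial_batch: int, margin: float = 0.9, device: int = 0) -> int:
    """Call after one forward/backward pass run with `trial_batch` samples."""
    total = torch.cuda.get_device_properties(device).total_memory
    peak = torch.cuda.max_memory_allocated(device)  # measured after the trial step
    if peak == 0:
        raise RuntimeError("run a training step on the GPU first")
    per_sample = peak / trial_batch
    return max(1, int(margin * total / per_sample))
```

On the 10 GiB card here the manual answer was 36 instead of the script's default 48; the heuristic just automates that kind of trial.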

TimandXiyu commented 3 years ago

> It was my fault: I was running the train_TSM_Something_v1.sh script as-is. The script's default batch size is 48; when I dropped it to 36, training worked fine. It seems my GPU just doesn't have as much memory as the author's.

Did you ever manage to get this network to work as accurately as the paper stated?