SeanNaren / warp-ctc

Pytorch Bindings for warp-ctc
Apache License 2.0
756 stars · 271 forks

Enormous CUDA memory allocation #118

Closed KiriLev closed 5 years ago

KiriLev commented 5 years ago

Hi! I'm running into an enormous CUDA memory allocation when computing the loss:

```
RuntimeError: CUDA out of memory. Tried to allocate 15804270484.47 GiB (GPU 0; 7.93 GiB total capacity; 2.43 GiB already allocated; 3.41 GiB free; 28.49 MiB cached) (malloc at /pytorch/aten/src/THC/THCCachingAllocator.cpp:231)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f27c658c021 in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f27c658b8ea in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x13f8255 (0x7f27d1b12255 in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: + 0x13f900a (0x7f27d1b1300a in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: THCudaMalloc + 0x46 (0x7f27d1b191b6 in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: gpu_ctc(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, at::Tensor, int) + 0x196 (0x7f27c243b046 in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6-linux-x86_64.egg/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so)
frame #6: + 0x1130b (0x7f27c244630b in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6-linux-x86_64.egg/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0xe4b7 (0x7f27c24434b7 in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6-linux-x86_64.egg/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so)
frame #8: python() [0x511b75]
frame #10: python() [0x4f3338]
frame #11: python() [0x586e6d]
frame #13: THPFunction_apply(_object*, _object*) + 0x581 (0x7f2806c32ab1 in /home/kirlev/Projects/Python/venv/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #14: python() [0x5117df]
frame #17: python() [0x5917a1]
frame #20: python() [0x4f3338]
frame #22: python() [0x5917a1]
frame #24: python() [0x575a31]
frame #26: python() [0x511b0a]
frame #28: python() [0x510c78]
frame #29: python() [0x5119bd]
frame #31: python() [0x4f3338]
frame #33: python() [0x640862]
frame #38: __libc_start_main + 0xe7 (0x7f282c848b97 in /lib/x86_64-linux-gnu/libc.so.6)
```

Compiled with gcc-7, CUDA 10.
ronybanerjee93 commented 5 years ago

I have a similar problem. I initialize the loss function inside the `__init__` of a class, and when constructing the object it throws the following error. The traceback starts here:

```
  self.ctcloss = self.criterion(self.logits, self.targets, self.output_sizes, self.target_sizes).to(device)
  File "/home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/warpctc_pytorch-0.1-py3.7-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 82, in forward
    self.length_average, self.blank)
  File "/home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/warpctc_pytorch-0.1-py3.7-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 32, in forward
    blank)
RuntimeError: CUDA out of memory. Tried to allocate 12000684159.32 GiB (GPU 0; 7.76 GiB total capacity; 182.14 MiB already allocated; 4.74 GiB free; 9.86 MiB cached) (malloc at /opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/THC/THCCachingAllocator.cpp:231)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fc0024bacf5 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1239bc1 (0x7fc00679cbc1 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: + 0x123a53a (0x7fc00679d53a in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: THCudaMalloc + 0x46 (0x7fc0067a6086 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: gpu_ctc(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, at::Tensor, int) + 0x196 (0x7fbfc958f1e6 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/warpctc_pytorch-0.1-py3.7-linux-x86_64.egg/warpctc_pytorch/_warp_ctc.cpython-37m-x86_64-linux-gnu.so)
frame #5: + 0x12a22 (0x7fbfc959aa22 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/warpctc_pytorch-0.1-py3.7-linux-x86_64.egg/warpctc_pytorch/_warp_ctc.cpython-37m-x86_64-linux-gnu.so)
frame #6: + 0x10075 (0x7fbfc9598075 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/warpctc_pytorch-0.1-py3.7-linux-x86_64.egg/warpctc_pytorch/_warp_ctc.cpython-37m-x86_64-linux-gnu.so)
frame #12: THPFunction_apply(_object*, _object*) + 0x5a1 (0x7fc0299ac061 in /home/cvpr/miniconda2/envs/project/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #40: __libc_start_main + 0xe7 (0x7fc03b77eb97 in /lib/x86_64-linux-gnu/libc.so.6)
```
ronybanerjee93 commented 5 years ago

@SeanNaren Can you suggest anything about this??

XiongChengxin commented 5 years ago

I'm seeing a similar error message and am waiting for a solution.

XiongChengxin commented 5 years ago

Solved. Ensure that the type of `probs` is torch.FloatTensor and that the other parameters are torch.IntTensor.
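
For reference, here is a minimal sketch of calling the loss with the expected dtypes. The tensor names and sizes are invented for illustration; it assumes the `CTCLoss(acts, labels, act_lens, label_lens)` calling convention from this repo's README.

```python
import torch
from warpctc_pytorch import CTCLoss

ctc_loss = CTCLoss()

# probs: raw (pre-softmax) activations as a torch.FloatTensor,
# shape (seq_len, batch, alphabet_size); the blank label is index 0
probs = torch.randn(50, 4, 20).float()
probs.requires_grad_(True)

# labels and every length tensor must be torch.IntTensor (int32).
# Passing int64 (LongTensor) values here may be misread as bogus sizes
# and lead to absurd allocation requests like the ones above.
labels = torch.IntTensor([1, 2, 3] * 4)   # concatenated targets for the whole batch
act_lens = torch.IntTensor([50] * 4)      # network output length per batch item
label_lens = torch.IntTensor([3] * 4)     # target length per batch item

cost = ctc_loss(probs, labels, act_lens, label_lens)
cost.backward()
```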

SeanNaren commented 5 years ago

As @XiongChengxin said, please check the input types! Also ensure that you've got the correct input shape before calling the CTC function.
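
For example, a small sketch (with hypothetical sizes) of putting batch-first model output into the time-major layout that warp-ctc expects:

```python
import torch

# Suppose a model emits batch-first activations: (batch, seq_len, alphabet_size)
batch_first_out = torch.randn(4, 50, 20)

# warp-ctc expects time-major activations: (seq_len, batch, alphabet_size)
acts = batch_first_out.transpose(0, 1).contiguous()

assert acts.shape == (50, 4, 20)
assert acts.dtype == torch.float32
```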