Mukosame / Zooming-Slow-Mo-CVPR-2020

Fast and Accurate One-Stage Space-Time Video Super-Resolution (accepted in CVPR 2020)
GNU General Public License v3.0

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)` #44

Closed ThompsonHe closed 3 years ago

ThompsonHe commented 3 years ago

Sorry to bother you. I ran make.sh and it finished successfully, but when I run test.py, something goes wrong. Here is the output:

Traceback (most recent call last):
  File "test.py", line 255, in <module>
    example_dconv()
  File "test.py", line 179, in example_dconv
    error.backward()
  File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 189, in wrapper
    outputs = fn(ctx, *args)
  File "/data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/dcn_v2.py", line 44, in backward
    ctx.deformable_groups)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)` (createCublasHandle at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/cuda/CublasHandlePool.cpp:8)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fac36e16627 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x4173335 (0x7fac3ccb4335 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0x458 (0x7fac3ccb4c18 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x416b092 (0x7fac3ccac092 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: THCudaBlas_Sgemm + 0x7e (0x7fac3d0b9a3e in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #5: dcn_v2_cuda_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0xe94 (0x7fac1b20e141 in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #6: dcn_v2_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x9b (0x7fac1b1f987b in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x3f1f1 (0x7fac1b2071f1 in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x3f82e (0x7fac1b20782e in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #9: <unknown function> + 0x3af0e (0x7fac1b202f0e in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)

frame #22: torch::autograd::PyNode::apply(std::vector >&&) + 0x178 (0x7fac68f94468 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #23: <unknown function> + 0x3bd3fb6 (0x7fac3c714fb6 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #24: torch::autograd::Engine::evaluate_function(std::shared_ptr&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x1373 (0x7fac3c711413 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #25: torch::autograd::Engine::thread_main(std::shared_ptr const&, bool) + 0x4b2 (0x7fac3c712042 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #26: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fac3c70b939 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #27: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fac68f8afaa in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #28: <unknown function> + 0xc819d (0x7fac6887719d in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/../../../libstdc++.so.6)
frame #29: <unknown function> + 0x76ba (0x7fac786076ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #30: clone + 0x6d (0x7fac7833d4dd in /lib/x86_64-linux-gnu/libc.so.6)
Segmentation fault (core dumped)

**I don't know how to fix it. Could you please help and give me some ideas? Thank you!**
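
For anyone who hits the same error: cuBLAS returns `CUBLAS_STATUS_ALLOC_FAILED` when a resource allocation inside the library fails, which is usually a failed GPU memory allocation. One thing worth ruling out is that the GPU was already nearly full (for example, occupied by another process) when `test.py` ran. That is only a guess for this particular report; nothing in this thread confirms it. Below is a minimal sketch that reads per-GPU memory usage from `nvidia-smi`. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration and are not part of this repository.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse rows of 'index, used_MiB, total_MiB' (one row per GPU)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append(
            {
                "index": index,  # nvidia-smi index; may differ from PyTorch's device order
                "used_mib": used,
                "total_mib": total,
                "free_mib": total - used,
            }
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` just before running `test.py` shows how much memory each GPU has left. If the GPU in use is almost full, selecting a less loaded one through the `CUDA_VISIBLE_DEVICES` environment variable is one thing to try.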