fudan-zvg / SETR

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
MIT License
1.05k stars 150 forks

RuntimeError: no valid convolution algorithms available in CuDNN #61

Open wsj20010128 opened 1 year ago

wsj20010128 commented 1 year ago

Traceback (most recent call last):
  File "./tools/train.py", line 161, in <module>
    main()
  File "./tools/train.py", line 157, in main
    meta=meta)
  File "/workspace/SETR/mmseg/apis/train.py", line 106, in train_segmentor
    runner.run(data_loaders, cfg.workflow, cfg.total_iters)
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 130, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
    self.call_hook('after_train_iter')
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 27, in after_train_iter
    runner.outputs['loss'].backward()
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: no valid convolution algorithms available in CuDNN
Exception raised from getValidAlgorithms at /opt/conda/conda-bld/pytorch_1595629403081/work/aten/src/ATen/native/cudnn/Conv.cpp:429 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f3381a4e77d in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0xcb61ea (0x7f3382eb61ea in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0xcb64cb (0x7f3382eb64cb in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xcaf3fe (0x7f3382eaf3fe in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xcaa48e (0x7f3382eaa48e in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xcac07b (0x7f3382eac07b in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::cudnn_convolution_backward_input(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0xb2 (0x7f3382eac5d2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xd117db (0x7f3382f117db in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xd415f8 (0x7f3382f415f8 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::cudnn_convolution_backward_input(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x1ad (0x7f33b5352ced in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x223 (0x7f3382eaaca3 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xd118c5 (0x7f3382f118c5 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: + 0xd41654 (0x7f3382f41654 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #13: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f33b53616a2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2c250c2 (0x7f33b70250c2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x2c39684 (0x7f33b7039684 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f33b53616a2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator >&&) + 0x258 (0x7f33b6eac098 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: + 0x30d1017 (0x7f33b74d1017 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::Engine::evaluate_function(std::shared_ptr&, torch::autograd::Node, torch::autograd::InputBuffer&, std::shared_ptr const&) + 0x1400 (0x7f33b74cc860 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::Engine::thread_main(std::shared_ptr const&) + 0x451 (0x7f33b74cd401 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: torch::autograd::Engine::thread_init(int, std::shared_ptr const&, bool) + 0x89 (0x7f33b74c5579 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr const&, bool) + 0x4a (0x7f33bbd2a99a in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #23: + 0xdbbf4 (0x7f33beac7bf4 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #24: + 0x94b43 (0x7f33e0fcdb43 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #25: clone + 0x44 (0x7f33e105ebb4 in /usr/lib/x86_64-linux-gnu/libc.so.6)

RuntimeError: no valid convolution algorithms available in CuDNN
Exception raised from getValidAlgorithms at /opt/conda/conda-bld/pytorch_1595629403081/work/aten/src/ATen/native/cudnn/Conv.cpp:429 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7fc33604e77d in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0xcb61ea (0x7fc3374b61ea in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0xcb64cb (0x7fc3374b64cb in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xcaf3fe (0x7fc3374af3fe in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xcaa48e (0x7fc3374aa48e in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xcac07b (0x7fc3374ac07b in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::cudnn_convolution_backward_input(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0xb2 (0x7fc3374ac5d2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xd117db (0x7fc3375117db in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xd415f8 (0x7fc3375415f8 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::cudnn_convolution_backward_input(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x1ad (0x7fc369952ced in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x223 (0x7fc3374aaca3 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xd118c5 (0x7fc3375118c5 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: + 0xd41654 (0x7fc337541654 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #13: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7fc3699616a2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2c250c2 (0x7fc36b6250c2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x2c39684 (0x7fc36b639684 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7fc3699616a2 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator >&&) + 0x258 (0x7fc36b4ac098 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: + 0x30d1017 (0x7fc36bad1017 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::Engine::evaluate_function(std::shared_ptr&, torch::autograd::Node, torch::autograd::InputBuffer&, std::shared_ptr const&) + 0x1400 (0x7fc36bacc860 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::Engine::thread_main(std::shared_ptr const&) + 0x451 (0x7fc36bacd401 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: torch::autograd::Engine::thread_init(int, std::shared_ptr const&, bool) + 0x89 (0x7fc36bac5579 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr const&, bool) + 0x4a (0x7fc37032a99a in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #23: + 0xdbbf4 (0x7fc3730c7bf4 in /workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #24: + 0x94b43 (0x7fc395739b43 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #25: clone + 0x44 (0x7fc3957cabb4 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Traceback (most recent call last):
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/workspace/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/miniconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/SETR/SETR_Naive_256x256.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

PyTorch Version: 1.6.0
CUDA Available
CUDA Version: 10.1
cuDNN Version: 7603

Can somebody help me with this error? Thank you!
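
For anyone trying to reproduce this, here is a minimal sketch of how version details like the ones above can be collected from inside the same conda environment. The helper name `collect_env` and the dictionary keys are illustrative assumptions, not part of SETR or mmseg; the `torch` attributes it reads are standard PyTorch ones. It only reports versions and does not attempt to diagnose the cuDNN error.

```python
def collect_env():
    """Collect PyTorch / CUDA / cuDNN versions (hypothetical helper, not part of SETR)."""
    info = {"torch": None, "cuda_available": False, "cuda": None, "cudnn": None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; return the empty report.
        return info

    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    # CUDA version PyTorch was built against; None on CPU-only builds.
    info["cuda"] = torch.version.cuda
    if torch.backends.cudnn.is_available():
        # Integer encoding, e.g. 7603 corresponds to cuDNN 7.6.3.
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running it with the interpreter that launches `./tools/train.py` should show whether the PyTorch build, CUDA version, and cuDNN version match what is reported above.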