clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.53k stars, 444 forks

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedExFix` #214

Open bhhsieh-icrr opened 1 year ago

bhhsieh-icrr commented 1 year ago

Hi, I ran a script a month ago and it worked fine. When I ran it again today, it failed with the following error message.

Traceback (most recent call last):
  File "test.py", line 98, in <module>
    predictions = test(args)
  File "test.py", line 49, in test
    output = pretrained_model.inference(image=sample["image"], prompt=f"<s_{args.task_name}>")["predictions"][0]
  File "/home/bhhsieh/donut/donut/model.py", line 455, in inference
    last_hidden_state = self.encoder(image_tensors)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bhhsieh/donut/donut/model.py", line 101, in forward
    x = self.model.layers(x)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 420, in forward
    x = self.blocks(x)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 310, in forward
    attn_windows = self.attn(x_windows, mask=self.attn_mask)  # num_win*B, window_size*window_size, C
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bhhsieh/miniconda3/envs/donutv3/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 203, in forward
    attn = (q @ k.transpose(-2, -1))
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

The script is exactly the same as before. Could this error be related to recent updates in key dependency libraries? How can I solve it?

bhhsieh-icrr commented 1 year ago

I solved it by simply running `pip uninstall nvidia-cublas-cu11`. I'm not sure whether this is the correct fix.

Reference: https://discuss.pytorch.org/t/runtimeerror-cuda-error-cublas-status-invalid-value-when-calling-cublassgemm-handle-opa-opb-m-n-k-alpha-a-lda-b-ldb-beta-c-ldc/124544/22
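Before uninstalling anything, it may help to check which of the packages involved are present in the environment. The following is a minimal sketch, not part of the original thread: the helper names `installed_version` and `cublas_report` are hypothetical, and it only reports what pip metadata says about `torch` and `nvidia-cublas-cu11`. It does not diagnose the cuBLAS error itself.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7 (as in the traceback above) needs the separately
    # installed importlib_metadata backport.
    import importlib_metadata as metadata


def installed_version(package_name):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def cublas_report(packages=("torch", "nvidia-cublas-cu11")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in cublas_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this before and after `pip uninstall nvidia-cublas-cu11` shows whether the package was actually removed from the environment the script uses.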