RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedExFix` #70

Open thistleknot opened 1 year ago

thistleknot commented 1 year ago

after pip install nougat-ocr

tried to run

nougat file.pdf -o .

downloading nougat checkpoint version 0.1.0-small to path /root/.cache/torch/hub/nougat
config.json: 100%|█████████████████████████████████████████████████████████████████████| 557/557 [00:00<00:00, 2.26Mb/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████| 956M/956M [01:40<00:00, 9.95Mb/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████| 96.0/96.0 [00:00<00:00, 464kb/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████| 2.04M/2.04M [00:00<00:00, 14.0Mb/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████| 106/106 [00:00<00:00, 609kb/s]
/home/user/env-10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                           | 0/631 [00:00<?, ?it/s][W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.
  0%|                                                                                           | 0/631 [05:59<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/env-10/bin/nougat", line 8, in <module>
  File "/home/user/env-10/lib/python3.10/site-packages/predict.py", line 130, in main
    model_output = model.inference(image_tensors=sample)
  File "/home/user/env-10/lib/python3.10/site-packages/nougat/model.py", line 577, in inference
    last_hidden_state = self.encoder(image_tensors)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/nougat/model.py", line 123, in forward
    x = self.model.layers(x)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/timm/models/swin_transformer.py", line 413, in forward
    x = blk(x)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/timm/models/swin_transformer.py", line 295, in forward
    attn_windows = self.attn(x_windows, mask=self.attn_mask)  # nW*B, window_size*window_size, C
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/timm/models/swin_transformer.py", line 183, in forward
    attn = (q @ k.transpose(-2, -1))
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
(env-10) root@m4700:/mnt/h/nougat# nvidia wat^C
(env-10) root@m4700:/mnt/h/nougat# watch -c nvidia-smi
(env-10) root@m4700:/mnt/h/nougat#

I don't see a requirements... I'm on python 3.10, using ubuntu 22 in wsl. I am able to run nvidia-smi

thistleknot commented 1 year ago

tried installing using .git just now, and same error

/home/user/env-10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                           | 0/631 [00:00<?, ?it/s][W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.
  0%|                                                                                           | 0/631 [05:40<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/env-10/bin/nougat", line 8, in <module>
  File "/home/user/env-10/lib/python3.10/site-packages/predict.py", line 130, in main
    model_output = model.inference(image_tensors=sample)
  File "/home/user/env-10/lib/python3.10/site-packages/nougat/model.py", line 577, in inference
    last_hidden_state = self.encoder(image_tensors)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/nougat/model.py", line 123, in forward
    x = self.model.layers(x)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/timm/models/swin_transformer.py", line 413, in forward
    x = blk(x)
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/timm/models/swin_transformer.py", line 295, in forward
    attn_windows = self.attn(x_windows, mask=self.attn_mask)  # nW*B, window_size*window_size, C
  File "/home/user/env-10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/env-10/lib/python3.10/site-packages/timm/models/swin_transformer.py", line 183, in forward
    attn = (q @ k.transpose(-2, -1))
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

lukas-blecher commented 1 year ago

What torch version is that? And what CUDA version?
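For anyone answering the question above, the sketch below collects the relevant details in one place. It is a minimal, hedged example: the function name `collect_env_report` and the dict keys are made up for illustration, and it only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_capability`). The compute capability is included because the failing call in the traceback uses `CUDA_R_16BF` operands; whether the GPU's capability is the actual cause is not established in this thread.

```python
import platform


def collect_env_report():
    """Gather Python, torch, and CUDA details as a plain dict.

    Hypothetical helper for this thread; not part of nougat or PyTorch.
    """
    report = {
        "python": platform.python_version(),
        "torch": None,             # stays None if torch is not installed
        "torch_cuda_build": None,  # CUDA version torch was built against
        "cuda_available": False,
        "devices": [],
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = str(torch.__version__)
    report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            report["devices"].append(
                {
                    "name": torch.cuda.get_device_name(i),
                    "compute_capability": f"{major}.{minor}",
                }
            )
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the issue, together with the CUDA version shown by nvidia-smi, gives maintainers the same information in a consistent form.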

thistleknot commented 1 year ago

Ubuntu 22 (WSL). nvidia-smi shows CUDA 11.6; torch is 2.0.1+cu117; the installed CUDA is 11.7.

lukas-blecher commented 1 year ago

I was not able to reproduce this. I have nvidia-cublas-cu11== See my full env below

thistleknot commented 1 year ago

https://huggingface.co/databricks/dolly-v2-12b/discussions/21

'I think this can also arise as an "out of memory" error. Please, it's more helpful if people say how they are running this, and whether you've ruled out what is in previous comments!'

I have 4GB of VRAM. Maybe I should try a smaller document =D

thistleknot commented 1 year ago

That wasn't it... it throws that error immediately with a smaller document. I'm going to try Docker.

lukas-blecher commented 1 year ago

Maybe try batch size 1:

nougat -b 1 file.pdf

thistleknot commented 1 year ago

Same. I'll try in Docker and let you know. I have a Rocky Linux 9 machine, but it's training a model at the moment, so I only have these 4GB to play with. However, I have LXC and Docker, both with GPU passthrough, so I should be able to test this in a container. About 95% of the time (so far 100% with CUDA) I've been able to test in a container (for example, I've had woes with gpt4all in Arch).

willocho commented 5 months ago

Were you able to solve this issue? I'm running into the same issue and I think it might be a mismatch between the CUDA version installed on my machine and the one used to build pytorch.
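To check the mismatch hypothesis against the numbers reported earlier in this thread (torch 2.0.1+cu117, nvidia-smi showing CUDA 11.6), a small comparison can be sketched as follows. The helper names are hypothetical, and the parsing assumes the usual PyTorch wheel tag convention where `+cu117` means CUDA 11.7. Note that the CUDA version printed by nvidia-smi is the highest version the driver supports, so an older number there is only a hint, not proof that it causes this error.

```python
import re


def cuda_from_torch_version(torch_version):
    """Return (major, minor) parsed from a '+cuXYZ' tag, or None.

    Assumes the last digit is the minor version, e.g. 'cu117' -> (11, 7).
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None or len(match.group(1)) < 2:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def parse_cuda_version(text):
    """Parse a 'major.minor' string such as '11.6' into a tuple of ints."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def describe_cuda_pairing(torch_version, driver_cuda):
    """Compare the torch build's CUDA version with the driver's."""
    build = cuda_from_torch_version(torch_version)
    if build is None:
        return "no CUDA tag in torch version (CPU-only or unknown build)"
    driver = parse_cuda_version(driver_cuda)
    if driver < build:
        return (
            f"driver reports CUDA {driver[0]}.{driver[1]}, older than "
            f"torch build {build[0]}.{build[1]}"
        )
    return (
        f"driver reports CUDA {driver[0]}.{driver[1]}, at least as new as "
        f"torch build {build[0]}.{build[1]}"
    )


if __name__ == "__main__":
    # Values reported earlier in this thread.
    print(describe_cuda_pairing("2.0.1+cu117", "11.6"))
```

If the driver side turns out to be older, updating the NVIDIA driver or installing a torch build for a matching CUDA version would be the things to try; neither is confirmed as a fix here.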