PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

OSError: (External) CUBLAS error(7) #2253

Closed aixuedegege closed 2 years ago

aixuedegege commented 2 years ago

Thank you for reporting a PaddleNLP usage issue and for contributing to PaddleNLP! When filing your issue, please also provide the following information:

paddle-bfloat 0.1.2
paddle2onnx 0.9.6
paddlefsl 1.1.0
paddlenlp 2.3.0
paddlepaddle-gpu 2.3.0.post112

2) System environment: please describe your OS type (e.g. Linux/Windows/MacOS) and Python version

from paddlenlp import Taskflow

schema = ['被申请人', '公司', '持有比例', '价值', '人名']  # Define the schema for entity extraction
ie = Taskflow('information_extraction', schema=schema)

Error message: OSError: (External) CUBLAS error(7). [Hint: 'CUBLAS_STATUS_INVALID_VALUE'. An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at /paddle/paddle/phi/backends/gpu/gpu_context.cc:424) [operator < truncated_gaussian_random > error]

linjieccc commented 2 years ago

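@aixuedegege Could you share your CUDA version and GPU model? The paddlepaddle GPU build may not have been installed correctly.

You can check whether the installation succeeded using the method from the official site: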

import paddle
paddle.utils.run_check()

Full installation documentation: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html
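When reporting the result of the check, it may help to include the versions Paddle itself reports. The following is a minimal sketch, not an official diagnostic: `collect_env_report` is a hypothetical helper name, and it assumes the installed Paddle 2.x exposes `paddle.version.cuda()`, `paddle.version.cudnn()` and `paddle.is_compiled_with_cuda()`. It returns early if Paddle is not importable.

```python
import importlib.util
import platform


def collect_env_report():
    """Gather basic version info to paste into an issue (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        "paddle_installed": False,
    }
    # Skip the Paddle-specific fields when the package is not installed.
    if importlib.util.find_spec("paddle") is None:
        return report

    import paddle

    report["paddle_installed"] = True
    report["paddle"] = paddle.__version__
    # Assumption: these helpers exist in the installed Paddle 2.x release.
    report["compiled_with_cuda"] = paddle.is_compiled_with_cuda()
    report["cuda_build"] = paddle.version.cuda()
    report["cudnn_build"] = paddle.version.cudnn()
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Pasting this output next to the `nvcc -V` and `nvidia-smi` results makes it easier to see which CUDA version each layer reports.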

aixuedegege commented 2 years ago

Sure enough, it fails.

nvcc -V
#nvcc: NVIDIA (R) Cuda compiler driver
#Copyright (c) 2005-2021 NVIDIA Corporation
#Built on Sun_Mar_21_19:15:46_PDT_2021
#Cuda compilation tools, release 11.3, V11.3.58
#Build cuda_11.3.r11.3/compiler.29745058_0

The GPU is a P100.

Running the two commands above raises an error:

import paddle
paddle.utils.run_check()

Running verify PaddlePaddle program ...
W0520 10:55:58.013888 29992 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 6.0, Driver API Version: 11.6, Runtime API Version: 11.2
W0520 10:55:58.137661 29992 gpu_context.cc:306] device: 0, cuDNN Version: 8.2.
Traceback (most recent call last):
  File "", line 1, in
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/paddle/utils/install_check.py", line 266, in run_check
    _run_static_single(use_cuda, use_xpu, use_npu)
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/paddle/utils/install_check.py", line 170, in _run_static_single
    exe.run(startup_prog)
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1299, in run
    six.reraise(*sys.exc_info())
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1285, in run
    res = self._run_impl(
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1510, in _run_impl
    return self._run_program(
  File "/home/p100/anaconda3/envs/ici/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1607, in _run_program
    self._default_executor.run(program.desc, scope, 0, True, True,
OSError: (External) CUBLAS error(7). [Hint: 'CUBLAS_STATUS_INVALID_VALUE'. An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at /paddle/paddle/phi/backends/gpu/gpu_context.cc:424)

Output of nvidia-smi:

 nvidia-smi

Fri May 20 10:58:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   54C    P0    33W / 250W |    627MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     29992      C   python                            625MiB |
+-----------------------------------------------------------------------------+

Paddle install command:

python -m pip install paddlepaddle-gpu==2.3.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

Do the installed CUDA version, the CUDA version supported by the GPU driver, and the CUDA version Paddle supports all have to match exactly?
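One part of this question can be checked mechanically. Under NVIDIA's compatibility model, the CUDA version the driver supports (the one `nvidia-smi` shows) must be at least the CUDA runtime version the framework was built for; an exact match is not required. The sketch below assumes that the `postNNN` suffix of a Paddle wheel version encodes the CUDA major and minor version with the last digit as the minor (so `post112` means CUDA 11.2, which agrees with the "Runtime API Version: 11.2" line in the log above). The function names are hypothetical.

```python
import re


def cuda_from_wheel_tag(wheel_version):
    """Return (major, minor) from a version such as '2.3.0.post112', else None.

    Assumption: the last digit of the post tag is the CUDA minor version.
    """
    match = re.search(r"\.post(\d{3,})$", wheel_version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def parse_cuda_version(text):
    """Turn a 'major.minor' string such as '11.6' into a comparable tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def driver_supports_runtime(driver_cuda, runtime_cuda):
    """Necessary condition only: the driver must be at least as new as the runtime."""
    return driver_cuda >= runtime_cuda


if __name__ == "__main__":
    runtime = cuda_from_wheel_tag("2.3.0.post112")
    driver = parse_cuda_version("11.6")
    print(runtime, driver, driver_supports_runtime(driver, runtime))
```

With the values from this report the check passes (driver 11.6, runtime 11.2) even though `run_check()` failed, so passing it is a necessary condition rather than proof that the environment is consistent.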

wawltor commented 2 years ago

I recommend installing with conda; otherwise it is easy to run into conflicts with the existing environment. https://www.paddlepaddle.org.cn/

aixuedegege commented 2 years ago

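Thanks for the reply. I did not change the driver or anything else; I simply used Paddle's GPU Docker container, and it runs now. That is the simplest fix!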

sportzhang commented 1 year ago

I ran into a similar problem.

Version and environment information:

1) PaddleNLP and PaddlePaddle versions:
paddlenlp 2.5.1
paddlepaddle-gpu 2.4.1.post112

2) System environment: Linux (Ubuntu 16.04), Python 3.8.12

Code executed:

senta = Taskflow("sentiment_analysis", model="skep_ernie_1.0_large_ch", task_path="/home/user/machine_learning/machine_learning_out/sentiment_analysis/skep/checkpoints/model_1700")
sentiment_list = senta("摆摊赚钱了要交税吗")

Error message:

OSError: (External) CUBLAS error(7). [Hint: Please search for the error code(7) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/backends/gpu/gpu_context.cc:593) [operator < multihead_matmul > error]
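This variant of the message gives only the numeric code and points to NVIDIA's documentation. As a convenience, the sketch below maps the code to its `cublasStatus_t` name and extracts the failing operator from the message. The table lists the status values from the cuBLAS documentation as I understand them; the regular expressions and the `describe_cublas_error` helper are illustrative assumptions based only on the message formats quoted in this thread, not a Paddle API.

```python
import re

# cublasStatus_t values as listed in NVIDIA's cuBLAS documentation.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def describe_cublas_error(message):
    """Extract the cuBLAS code, its status name, and the operator, if present."""
    code_match = re.search(r"CUBLAS error\((\d+)\)", message)
    if code_match is None:
        return None
    code = int(code_match.group(1))
    # The operator tag is optional; run_check() output in this thread omits it.
    op_match = re.search(r"\[operator < (\w+) > error\]", message)
    return {
        "code": code,
        "status": CUBLAS_STATUS.get(code, "unknown"),
        "operator": op_match.group(1) if op_match else None,
    }


if __name__ == "__main__":
    msg = ("OSError: (External) CUBLAS error(7). (at /paddle/paddle/phi/"
           "backends/gpu/gpu_context.cc:593) [operator < multihead_matmul > error]")
    print(describe_cublas_error(msg))
```

Code 7 here is the same `CUBLAS_STATUS_INVALID_VALUE` that the original report shows in its hint, raised from a different operator.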

mattaltberg commented 1 year ago

Running into a similar issue when using the triton python backend:

paddlepaddle-gpu 2.4.2post112
paddleocr 2.6.1.3

I get CUBLAS error(7) when using paddle.utils.run_check()

loxs123 commented 1 month ago

The two lines below run without problems, but I still get CUBLAS error(7). How can this be resolved?

import paddle
paddle.utils.run_check()
qinhuangdaoStation commented 1 month ago

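1. In my experience, the following configuration resolves this problem. (This combination is what worked for me after some trial and error; you can also explore other version combinations.) Note the installation order:
(1) python=3.9 (this matters: both 3.11 and 3.8 gave me problems)
(2) cudatoolkit=11.7 (this also matters: every other cudatoolkit version I tried gave me problems)
(3) paddlenlp=2.8.1, which I installed after completing the steps above

2. Then simply use the official installation method, and install the remaining dependencies afterwards:
conda install paddlepaddle-gpu==2.6.1 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge

3. I did not install cuDNN separately; reportedly it is included when cudatoolkit is installed.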