ExternalError: CUBLAS error(15)

PaddlePaddle / Paddle

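PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)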

http://www.paddlepaddle.org/

Apache License 2.0


ExternalError: CUBLAS error(15) #49519

Open EWAN9709 opened 1 year ago

EWAN9709 commented 1 year ago

I get the following error when running inference: ExternalError: CUBLAS error(15). [Hint: Please search for the error code(15) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35)

paddle-bot[bot] commented 1 year ago

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also check the API docs, FAQ, GitHub issues, and the AI community for an answer. Have a nice day!

wuyefeilin commented 1 year ago

Please check whether your CUDA version matches your Paddle version.
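One way to make that check concrete is to compare the toolkit release printed by `nvcc --version` with the CUDA tag carried by the installed `paddlepaddle-gpu` wheel. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the wheel encodes its CUDA version as a `.postNNN` suffix whose last digit is the minor version (for example `post112` for CUDA 11.2). Wheels without such a suffix are reported as unknown rather than guessed.

```python
import re


def cuda_from_nvcc(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_from_wheel_tag(wheel_version):
    """Return (major, minor) from a '.postNNN' suffix, or None if absent.

    Assumption: the last digit is the minor version, e.g. post112 -> 11.2.
    """
    match = re.search(r"\.post(\d{2,})$", wheel_version)
    if not match:
        return None
    digits = match.group(1)
    return (int(digits[:-1]), int(digits[-1]))


def versions_match(nvcc_output, wheel_version):
    """True only when both versions are known and identical."""
    toolkit = cuda_from_nvcc(nvcc_output)
    wheel = cuda_from_wheel_tag(wheel_version)
    return toolkit is not None and toolkit == wheel
```

A matching pair does not guarantee that the matching libraries are the ones actually loaded at run time, so treat this as a first check only. Paddle's own `paddle.utils.run_check()` is a complementary test of the installed build.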

impactcolor commented 1 year ago

Hi I have this same error on a Colab notebook. The version of cuda is: 11.2, verified with:

! nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

Paddle version I'm installing like so:

!pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

The error is: OSError: (External) CUBLAS error(15). [Hint: 'CUBLAS_STATUS_NOT_SUPPORTED'. The functionality requested is not supported ] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35) [operator < matmul_v2 > error]

abhibeats95 commented 1 year ago

Was anyone able to solve this error?

lynxhawk commented 1 year ago

OSError: (External) CUBLAS error(15). [Hint: Please search for the error code(15) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35) I have the same problem, and I can't find out what error(15) means.
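For reference, the number in the message is a `cublasStatus_t` value, and an earlier comment in this thread shows the hint text for 15 as 'CUBLAS_STATUS_NOT_SUPPORTED'. The small lookup below is a convenience sketch, not part of Paddle: the table follows the enum values in NVIDIA's cuBLAS header, and the function name is made up for illustration.

```python
import re

# cublasStatus_t enum values as defined in NVIDIA's cuBLAS header.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def cublas_status_name(message):
    """Extract the code from 'CUBLAS error(N)' and return its enum name.

    Returns None if the message has no such code or the code is unknown.
    """
    match = re.search(r"CUBLAS error\((\d+)\)", message)
    if not match:
        return None
    return CUBLAS_STATUS.get(int(match.group(1)))
```

The name alone only says that cuBLAS considered the requested functionality unsupported; it does not say which library or version mismatch triggered it.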

kingmpw2015 commented 1 year ago

I'm hitting this problem too. Is there a solution yet?

keetsky commented 1 year ago

Has this been solved?

lynxhawk commented 1 year ago

No official reply yet. After I set up the environment with this one-click installer, though, it works fine: https://github.com/PFCCLab/fool-proof-paddle-installation

longxiao11 commented 1 year ago

Is there a solution yet?

wuyefeilin commented 1 year ago

You can try installing with the Docker method described on the official website.

xuxiansheng2018 commented 1 year ago

Has this been solved?

biandh commented 1 year ago

You can try the following settings:

Set $LD_LIBRARY_PATH so that CUDA's lib64 directory is on it, for example:

export LD_LIBRARY_PATH=/xxx/cuda-11.2/lib64:$LD_LIBRARY_PATH

Make sure the NCCL lib directory is also on $LD_LIBRARY_PATH, for example:

export LD_LIBRARY_PATH=/xxx/nccl/build/lib:$LD_LIBRARY_PATH

Set $CUDA_HOME (a single directory, not a colon-separated list), for example:

export CUDA_HOME=/xxx/cuda-11.2
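To see whether these settings took effect, it can help to list which directories on `LD_LIBRARY_PATH` actually contain a cuBLAS library, in the order they will be searched. The sketch below is a hypothetical helper, not a Paddle utility. It only inspects the environment variable and ignores other places the dynamic loader may look, such as the system library cache.

```python
import glob
import os


def dirs_with_library(search_path, pattern="libcublas.so*"):
    """Return the directories in search_path that contain a matching file.

    search_path is a string in LD_LIBRARY_PATH format; order is preserved,
    so the first entry returned is the first candidate on that path.
    """
    found = []
    for directory in search_path.split(os.pathsep):
        if directory and glob.glob(os.path.join(directory, pattern)):
            found.append(directory)
    return found
```

Calling `dirs_with_library(os.environ.get("LD_LIBRARY_PATH", ""))` in the same shell session that launches Paddle shows whether the intended CUDA directory is present and whether another copy precedes it.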

twwch commented 1 year ago

ExternalError: CUBLAS error(15). [Hint: Please search for the error code(15) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:62)

So many pitfalls. It was working fine two days ago, and then it suddenly stopped working.

yuwode commented 1 year ago

Same question here.

xuxiansheng2018 commented 1 year ago

Has anyone solved this, and if so, how?

OSError: (External) CUBLAS error(15). [Hint: Please search for the error code(15) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35) [operator < linear > error]
I0625 15:15:57.797241 708944 tcp_store.cc:257] receive shutdown event and so quit from MasterDaemon run loop
LAUNCH INFO 2023-06-25 15:15:59,686 Exit code 1

biandh commented 1 year ago

In reply to xuxiansheng2018's question above: in my case, setting export LD_LIBRARY_PATH=/xxx/cuda-xxx/lib64:$LD_LIBRARY_PATH fixed it.

jjrCN commented 1 year ago

Still not working. Is anyone looking into this?

tuobay commented 1 year ago

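You can try putting the lib directory of the matching CUDA version from your conda environment at the very front of LD_LIBRARY_PATH, for example export LD_LIBRARY_PATH=/home/xxx/conda/pkgs/cudatoolkit-11.2.2-hbe64b41_10/lib:$LD_LIBRARY_PATH, so that Paddle can find its matching CUDA 11.2 libraries.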
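When the job is launched from a Python script rather than a shell, the same "put it first" idea can be applied to the environment handed to the child process. The helpers below are a hypothetical sketch. They assume the variable has to be set before the Paddle process starts, since on Linux the dynamic loader normally reads LD_LIBRARY_PATH at process startup, so changing `os.environ` after Python is already running is unlikely to help.

```python
import os


def prepend_dir(search_path, new_dir):
    """Return search_path with new_dir moved (or added) to the front."""
    parts = [p for p in search_path.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + parts)


def env_with_library_dir(new_dir, base_env=None):
    """Copy an environment mapping and put new_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = prepend_dir(env.get("LD_LIBRARY_PATH", ""), new_dir)
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the training or inference script.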

fishisnow commented 11 months ago

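I ran into this too, and it took a lot of effort to track down: the cause was that the torch version was incompatible with CUDA. Strangely, it works fine on the physical machine but fails once deployed in a container.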

risemeup1 commented 11 months ago

/paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h

What CUDA version does nvidia-smi show?
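Note that the "CUDA Version" in the `nvidia-smi` header is the highest CUDA version the installed driver supports, which can differ from the toolkit that `nvcc` reports. A minimal sketch for pulling that number out of the header text follows; the function name is hypothetical and the pattern assumes the usual "CUDA Version: X.Y" wording.

```python
import re


def driver_cuda_version(nvidia_smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None
```

If the driver-side version is lower than the CUDA version the Paddle wheel was built for, that gap is worth reporting alongside the error message.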

NFTY19 commented 1 week ago

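I ran into this problem when PyTorch and PaddlePaddle were installed together. I had pytorch2.0.1 + paddlepaddle2.6.1; with CUDA 11 this combination raises the error, but with CUDA 12 it works normally. To stay compatible with CUDA 11, I tried switching PyTorch to 1.11.0, which solved it. Combinations I have tested successfully:

cuda11.4 + pytorch1.11.0 + paddlepaddle2.6.1
cuda12.2 + pytorch2.0.1 + paddlepaddle2.6.1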
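The thread does not establish why having both frameworks installed triggers the error. One thing worth checking is which cuBLAS shared library actually ends up loaded in the process once both are imported. On Linux this can be read from `/proc/self/maps`. The sketch below is a hypothetical diagnostic, not something either framework provides.

```python
import os


def loaded_libraries(maps_text, needle="libcublas"):
    """Return sorted unique file paths in /proc/<pid>/maps text whose
    file name contains needle."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in os.path.basename(fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_in_this_process(needle="libcublas"):
    """Linux only: inspect the libraries mapped into the current process."""
    with open("/proc/self/maps") as handle:
        return loaded_libraries(handle.read(), needle)
```

Running `loaded_in_this_process()` after importing the frameworks and executing one GPU operation shows where the loaded cuBLAS came from, for example a system CUDA directory versus a copy inside a Python package directory.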