PaddlePaddle / Paddle

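PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)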
http://www.paddlepaddle.org/
Apache License 2.0

CUBLAS error(1) reported when training with Paddle on GPU #54291

Open motianxiuhua opened 1 year ago

motianxiuhua commented 1 year ago

Please ask your question

Hello Paddle developers. My code trains fine on the CPU, but after I set up the GPU environment it fails to train with the following error:

(External) CUBLAS error(1). 
  [Hint: Please search for the error code(1) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/backends/gpu/gpu_resources.cc:156)
  [operator < linear > error]
  File "/home/user001/Code/Paddle/Layers.py", line 112, in forward
    out = self.linear1(x)
  File "/home/user001/Code/Paddle/Model_copy2.py", line 173, in forward
    pred = self.pred(out)
  File "/home/user001/Code/Paddle/Model_copy2.py", line 201, in <module>
    predict=solarpower_model(feature)
OSError: (External) CUBLAS error(1). 
  [Hint: Please search for the error code(1) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/backends/gpu/gpu_resources.cc:156)
  [operator < linear > error]
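
For readers decoding the traceback above: the number in `CUBLAS error(1)` is a `cublasStatus_t` value, and in NVIDIA's cuBLAS documentation status 1 is `CUBLAS_STATUS_NOT_INITIALIZED`, which usually points to cuBLAS failing to initialize (for example a CUDA runtime error or a hardware setup problem) rather than to a bug in the `linear` layer itself. The following is a minimal sketch of that lookup; the helper name `decode_cublas_error` is hypothetical and is not part of Paddle.

```python
import re

# cublasStatus_t values as listed in NVIDIA's cuBLAS documentation.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_error(message):
    """Hypothetical helper: map 'CUBLAS error(N)' in a message to its status name.

    Returns None when the message contains no cuBLAS error code.
    """
    match = re.search(r"CUBLAS error\((\d+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return CUBLAS_STATUS.get(code, f"unknown cuBLAS status {code}")


if __name__ == "__main__":
    print(decode_cublas_error("OSError: (External) CUBLAS error(1). "))
```

The status name only narrows the search. It does not by itself say whether the cause is a version mismatch, insufficient GPU memory, or something else.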

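From what I have read, this error is caused by a mismatched CUDA version. However, my CUDA and Paddle versions do match, and I installed them by following the official tutorial, using this command: conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge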

Here are the packages I have installed:

cudatoolkit               11.2.2
cudnn                     8.2.1.32
paddlepaddle-gpu   2.4.2.post112

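Running paddle.utils.run_check() prints the following, which indicates that the installation is correct. I am puzzled and would appreciate any suggestions from the developers. Thank you very much.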

>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
W0601 22:36:29.441434 2351469 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
W0601 22:36:29.470683 2351469 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
PaddlePaddle works well on 1 GPU.
/home/user001/miniconda3/envs/paddle_gpu/lib/python3.9/site-packages/paddle/fluid/executor.py:1583: UserWarning: Standalone executor is not used for data parallel
  warnings.warn(
W0601 22:36:33.071208 2351469 parallel_executor.cc:666] Cannot enable P2P access from 0 to 1
W0601 22:36:33.071252 2351469 parallel_executor.cc:666] Cannot enable P2P access from 1 to 0
W0601 22:36:34.020864 2351469 fuse_all_reduce_op_pass.cc:79] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 2 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

ZhangHandi commented 1 year ago

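Is the system CUDA also 11.2? If the versions differ, the conda cudatoolkit may not be taking priority over the system CUDA, which would cause a version mismatch.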

motianxiuhua commented 1 year ago

The system CUDA is also 11.2. Also, how do I make the cudatoolkit take priority? paddle.utils.run_check() reports 11.2 as well:

>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
W0601 22:36:29.441434 2351469 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
W0601 22:36:29.470683 2351469 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
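
One way to start investigating which CUDA libraries could take priority is to list every `libcublas.so*` visible in the search directories. The sketch below uses only the Python standard library and assumes a Linux setup in which candidate libraries live in the `LD_LIBRARY_PATH` directories or in `$CONDA_PREFIX/lib`. The function name `find_cublas_candidates` is hypothetical, and the listing does not prove which copy Paddle actually loads.

```python
import glob
import os


def find_cublas_candidates(env=None):
    """Hypothetical helper: list libcublas.so* files per search directory.

    Directories from LD_LIBRARY_PATH come first, in their listed order,
    followed by $CONDA_PREFIX/lib. Only directories containing a match
    appear in the result.
    """
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:
        dirs.append(os.path.join(conda_prefix, "lib"))

    found = {}
    for d in dirs:
        if d in found:
            continue  # skip directories that were already scanned
        matches = sorted(glob.glob(os.path.join(glob.escape(d), "libcublas.so*")))
        if matches:
            found[d] = matches
    return found


if __name__ == "__main__":
    for directory, files in find_cublas_candidates().items():
        print(directory)
        for path in files:
            print("   ", path)
```

If copies from more than one CUDA version show up, putting the conda environment's `lib` directory ahead of the others in `LD_LIBRARY_PATH` is one thing to try. Whether that resolves this particular error is not confirmed anywhere in this thread.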

LuckyLee333 commented 5 months ago

I ran into the same problem. Did you solve it? If so, how?