PaddlePaddle / Paddle

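PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)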
http://www.paddlepaddle.org/
Apache License 2.0
22.19k stars, 5.57k forks

paddlepaddle-gpu installs correctly, but running the example script fails with OSError: (External) CUBLAS error(7) #53025

Closed LiemLin closed 1 year ago

LiemLin commented 1 year ago

Describe the Bug

Environment: CentOS 7.9, conda 4.12.0, Python 3.8.16, CUDA compilation tools release 11.6 (V11.6.124), paddlepaddle-gpu 2.4.2.post116

The installation check passes:

>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
W0418 15:02:43.914815  6957 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.6, Runtime API Version: 11.6
W0418 15:02:43.928560  6957 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Running the example script fails with the following error:

Traceback (most recent call last):
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddle_pipelines-0.5.2-py3.8.egg/pipelines/pipelines/base.py", line 445, in run
    node_output, stream_id = self.graph.nodes[node_id]["component"]._dispatch_run(**node_input)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddle_pipelines-0.5.2-py3.8.egg/pipelines/nodes/base.py", line 120, in _dispatch_run
    return self._dispatch_run_general(self.run, **kwargs)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddle_pipelines-0.5.2-py3.8.egg/pipelines/nodes/base.py", line 164, in _dispatch_run_general
    output, stream = run_method(**run_inputs, **run_params)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddle_pipelines-0.5.2-py3.8.egg/pipelines/nodes/answer_extractor/answer_extractor.py", line 175, in run
    synthetic_context_answer_pairs = self.answer_generation_from_paragraphs(
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddle_pipelines-0.5.2-py3.8.egg/pipelines/nodes/answer_extractor/answer_extractor.py", line 144, in answer_generation_from_paragraphs
    predicts = model(buffer)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddlenlp/taskflow/taskflow.py", line 850, in __call__
    results = self.task_instance(inputs)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddlenlp/taskflow/task.py", line 516, in __call__
    outputs = self._run_model(inputs)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddlenlp/taskflow/information_extraction.py", line 1066, in _run_model
    results = self._multi_stage_predict(_inputs)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddlenlp/taskflow/information_extraction.py", line 1164, in _multi_stage_predict
    result_list = self._single_stage_predict(examples)
  File "/home/bigdata/miniconda3/envs/qa38/lib/python3.8/site-packages/paddlenlp/taskflow/information_extraction.py", line 977, in _single_stage_predict
    self.predictor.run()
OSError: (External) CUBLAS error(7). [Hint: Please search for the error code(7) on website (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) to get Nvidia's official solution and advice about CUBLAS Error.] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35) [operator < fc > error]

Could you take a look at what the problem is?

Additional Supplementary Information

No response

winter-wang commented 1 year ago

Hello, the error message shows CUBLAS error(7), which corresponds to CUBLAS_STATUS_INVALID_VALUE. The NVIDIA documentation (https://docs.nvidia.com/cuda/cublas/index.html#cublasstatus_t) describes it as follows:

An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values.

It looks like there may be a problem with the parameter settings in your script.
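For readers who want to do the same lookup without opening the documentation, here is a minimal sketch that extracts the numeric code from a Paddle error message and maps it to its `cublasStatus_t` name. The helper name `cublas_status_name` and the table `CUBLAS_STATUS` are hypothetical and not part of Paddle; the numeric values are taken from NVIDIA's cuBLAS documentation and should be checked against the version you have installed.

```python
import re
from typing import Optional

# cublasStatus_t values as listed in NVIDIA's cuBLAS documentation.
# This table is an illustration; verify it against your CUDA version.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def cublas_status_name(message: str) -> Optional[str]:
    """Return the cuBLAS status name for a 'CUBLAS error(N)' message.

    Returns None when the message contains no cuBLAS error code.
    (Hypothetical helper, not a Paddle API.)
    """
    match = re.search(r"CUBLAS error\((\d+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return CUBLAS_STATUS.get(code, "unknown cuBLAS status")


if __name__ == "__main__":
    print(cublas_status_name("OSError: (External) CUBLAS error(7)."))
```

Note that the status name only says what cuBLAS rejected, not why; the same code can be raised by bad arguments or by a library mismatch in the environment.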

LiemLin commented 1 year ago

@winter-wang Thanks for the answer. Upgrading cuDNN from 8.4 to 8.8 fixed it.
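Since the fix turned out to be a library version change, it can help to record the versions Paddle reports at startup before and after an upgrade. The sketch below parses the two `gpu_resources.cc` warning lines printed by `paddle.utils.run_check()` in this thread. The function name `parse_gpu_log` is hypothetical, and the patterns assume the log wording shown above, which may differ in other Paddle releases.

```python
import re
from typing import Dict


def parse_gpu_log(log_text: str) -> Dict[str, str]:
    """Extract version fields from Paddle's GPU startup warnings.

    Hypothetical helper: the patterns mirror the log lines quoted in
    this issue and are not guaranteed to match other Paddle versions.
    """
    patterns = {
        "compute_capability": r"GPU Compute Capability: (\d+\.\d+)",
        "driver_api": r"Driver API Version: (\d+\.\d+)",
        "runtime_api": r"Runtime API Version: (\d+\.\d+)",
        "cudnn": r"cuDNN Version: (\d+\.\d+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:
            found[key] = match.group(1)
    return found


if __name__ == "__main__":
    sample = (
        "W0418 15:02:43.914815  6957 gpu_resources.cc:61] Please NOTE: "
        "device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.6, "
        "Runtime API Version: 11.6\n"
        "W0418 15:02:43.928560  6957 gpu_resources.cc:91] device: 0, "
        "cuDNN Version: 8.4.\n"
    )
    print(parse_gpu_log(sample))
```

Comparing the parsed `cudnn` value before and after the upgrade confirms that the running process actually picked up the new library rather than an older copy elsewhere on the library path.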