PaddlePaddle / Paddle

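PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 飞桨 (PaddlePaddle) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)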
http://www.paddlepaddle.org/
Apache License 2.0

RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion. [Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60) #50195

Closed lichuantao6626 closed 1 year ago

lichuantao6626 commented 1 year ago

Please ask your question

While fine-tuning a model with https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers/examples/textual_inversion, I got the following error:

    W0203 07:54:14.771606 13098 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.8, Runtime API Version: 11.7
    W0203 07:54:14.771862 13098 dynamic_loader.cc:307] The third-party dynamic library (libcudnn.so) that Paddle depends on is not configured correctly. (error code is /usr/local/cuda/lib64/libcudnn.so: cannot open shared object file: No such file or directory)
      Suggestions:
      1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
      2. Configure third-party dynamic library environment variables as follows:
      - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
      - Windows: set PATH by `set PATH=XXX;
    Traceback (most recent call last):
      File "train_textual_inversion.py", line 672, in <module>
        main()
      File "train_textual_inversion.py", line 454, in main
        text_encoder = model_cls.from_pretrained(os.path.join(args.pretrained_model_name_or_path, "text_encoder"))
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddlenlp/transformers/model_utils.py", line 477, in from_pretrained
        return cls.from_pretrained_v2(pretrained_model_name_or_path, from_hf_hub=from_hf_hub, *args, **kwargs)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddlenlp/transformers/clip/modeling.py", line 521, in from_pretrained_v2
        model = cls(config, *init_args, **model_kwargs)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddlenlp/transformers/utils.py", line 170, in __impl__
        init_func(self, *args, **kwargs)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddlenlp/transformers/clip/modeling.py", line 723, in __init__
        self.text_model = CLIPTextTransformer(config)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddlenlp/transformers/clip/modeling.py", line 591, in __init__
        self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddle/nn/layer/common.py", line 1505, in __init__
        self.weight = self.create_parameter(
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 446, in create_parameter
        return self._helper.create_parameter(temp_attr, shape, dtype, is_bias,
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/layer_helper_base.py", line 374, in create_parameter
        return self.main_program.global_block().create_parameter(
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/framework.py", line 3965, in create_parameter
        initializer(param, self)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/initializer.py", line 56, in __call__
        return self.forward(param, block)
      File "/opt/data/anaconda3_0203/envs/paddle_env/lib/python3.8/site-packages/paddle/fluid/initializer.py", line 614, in forward
        out_var = _C_ops.uniform_random(out_var.shape, out_dtype,
    RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion. [Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)

I have paddle 2.4.1 installed, and the installation check passed; paddlenlp is 2.5.0. Could anyone tell me how to resolve this?

paddle-bot[bot] commented 1 year ago

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, code to reproduce it, environment and version details, and the error message. You may also check the API docs, FAQ, GitHub Issues, and AI community for an answer. Have a nice day!

lichuantao6626 commented 1 year ago

The problem is solved. The fix was to run: export LD_LIBRARY_PATH='/opt/data/anaconda3_0203/envs/paddle_env/lib' After running this command, the training job no longer shows the error above. I'm not sure of the exact reason.
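To see why that export helps, it can be useful to check which directories on LD_LIBRARY_PATH actually contain the library named in the warning. The following is a minimal sketch, not part of Paddle; the helper name `find_shared_library` is hypothetical, and it only inspects the directories it is given rather than reproducing the full search order of the dynamic loader.

```python
import os
from pathlib import Path


def find_shared_library(name, search_path=None):
    """Return every file called `name` found in the directories of a search path.

    `search_path` uses the platform path separator (":" on Linux) and
    defaults to the current value of LD_LIBRARY_PATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(directory) / name
        if candidate.exists():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    for lib in ("libcudnn.so", "libcublas.so"):
        found = find_shared_library(lib)
        print(lib, "->", found if found else "not found on LD_LIBRARY_PATH")
```

An empty result for libcudnn.so before the export and a non-empty one afterwards would be consistent with the fix described above.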

ZHUI commented 1 year ago

Right. Probably run_check does not involve any operators that depend on cuDNN. The main issue here is that the cuDNN path is wrong.
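Since the installation check may pass without touching cuDNN, a quick probe that is independent of Paddle can confirm whether the dynamic loader is able to open the library at all. This is a sketch using the standard `ctypes` module; the helper name `probe_library` is hypothetical, and this is not how Paddle's own loader is implemented. `cudnnGetVersion` is the symbol named in the error message.

```python
import ctypes


def probe_library(name, symbol=None):
    """Try to open shared library `name`; optionally check that `symbol` is exported.

    Returns a (ok, detail) tuple and never raises.
    """
    try:
        handle = ctypes.CDLL(name)
    except OSError as exc:
        # Typical detail: "<name>: cannot open shared object file: ..."
        return False, str(exc)
    if symbol is not None and not hasattr(handle, symbol):
        return False, f"{name} loaded, but symbol {symbol} is missing"
    return True, f"{name} loaded"


if __name__ == "__main__":
    print(probe_library("libcudnn.so", "cudnnGetVersion"))
    print(probe_library("libcublas.so"))
```

Run it in the same shell, with the same environment variables, as the training script, because LD_LIBRARY_PATH is read when the process starts.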

Vimos commented 3 weeks ago

My problem goes one layer deeper than the one above: even after setting LD_LIBRARY_PATH, I still get this error:


RuntimeError: (PreconditionNotMet) The third-party dynamic library (libcublas.so) that Paddle depends on is not configured correctly. (error code is libcublas.so: cannot open shared object file: No such file or directory)
  Suggestions:
  1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
  2. Configure third-party dynamic library environment variables as follows:
  - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
  - Windows: set PATH by `set PATH=XXX; (at ../paddle/phi/backends/dynload/dynamic_loader.cc:311)
  [operator < fc > error]

I had no choice but to add a symlink as well:

ln -s libcublas.so.12 libcublas.so

But I find this kind of workaround confusing. How does PyTorch handle this, and why does Paddle need this configuration every time before it can run?
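The symlink is needed because the error asks for the unversioned name libcublas.so, while the directory only provides a versioned file such as libcublas.so.12. A small sketch for spotting this situation is shown below; the helper name `missing_unversioned_links` and the default list of stems are assumptions for illustration. It only reports candidates and does not create any links.

```python
from pathlib import Path


def missing_unversioned_links(directory, stems=("libcublas", "libcudnn")):
    """Map each missing `<stem>.so` to the versioned files that could back it.

    A stem is reported only when `<stem>.so` is absent from `directory`
    and at least one `<stem>.so.*` file is present there.
    """
    root = Path(directory)
    suggestions = {}
    for stem in stems:
        plain = root / f"{stem}.so"
        if plain.exists():
            continue  # unversioned name already resolves, nothing to do
        versioned = sorted(p.name for p in root.glob(f"{stem}.so.*"))
        if versioned:
            suggestions[plain.name] = versioned
    return suggestions


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for link, candidates in missing_unversioned_links(target).items():
        print(f"{link} is missing; candidates: {', '.join(candidates)}")
```

Pointing it at the directory used in LD_LIBRARY_PATH lists which unversioned names would need a link like the `ln -s` command above.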