PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0
12.38k stars, 2.84k forks

RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion. [Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60) #7629

Open yamashin0922 opened 1 year ago

yamashin0922 commented 1 year ago

Search before asking

Bug Component

Installation

Describe the Bug

The following error is raised when I run inference on the GPU using the example configuration file.

Here are the full logs.

root@xxx:/PaddleDetection# python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_mot.yml --video_file=test.mp4 --device=gpu
/PaddleDetection/deploy/pipeline/pipeline.py:24: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  from collections import Sequence, defaultdict
-----------  Running Arguments -----------
MOT:
  batch_size: 1
  enable: true
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
  tracker_config: deploy/pipeline/config/tracker_config.yml
crop_thresh: 0.5
visual: true
warmup_frame: 50

------------------------------------------
Multi-Object Tracking enabled
100%|██████████| 186349/186349 [02:09<00:00, 1441.91KB/s]
MOT  model dir:  /root/.cache/paddle/infer_weights/mot_ppyoloe_l_36e_pipeline
-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: Permute
--------------------------------------------
video fps: 30, frame_count: 19854
Thread: 0; frame id: 0
Traceback (most recent call last):
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 1103, in <module>
    main()
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 1090, in main
    pipeline.run_multithreads()
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 170, in run_multithreads
    self.predictor.run(self.input)
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 488, in run
    self.predict_video(input, thread_idx=thread_idx)
  File "/PaddleDetection/deploy/pipeline/pipeline.py", line 668, in predict_video
    res = self.mot_predictor.predict_image(
  File "/PaddleDetection/deploy/pptracking/python/mot_sde_infer.py", line 478, in predict_image
    inputs = self.preprocess(batch_image_list)
  File "/PaddleDetection/deploy/pptracking/python/det_infer.py", line 140, in preprocess
    input_tensor.copy_from_cpu(inputs[input_names[i]])
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/paddle/fluid/inference/wrapper.py", line 38, in tensor_copy_from_cpu
    self.copy_from_cpu_bind(data)
RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
  [Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)

Could you help me fix this?

It works when I remove "--device=gpu" from the command line.

Environment

OS

root@xxx:/PaddleDetection# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

GPU and drivers

root@xxx:/PaddleDetection# nvidia-smi
Wed Jan 18 06:20:36 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  Off  | 00000000:00:10.0 Off |                    0 |
| N/A   29C    P0    43W / 300W |      4MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvcc

root@xxx:/PaddleDetection# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

python

root@xxx:/PaddleDetection# python -V
Python 3.9.16

packages for python

root@xxx:/PaddleDetection# pip list
Package            Version
------------------ -------------
aiofiles           22.1.0
aiohttp            3.8.3
aiosignal          1.3.1
altair             4.2.0
anyio              3.6.2
astor              0.8.1
async-timeout      4.0.2
attrdict           2.0.1
attrs              22.2.0
Babel              2.11.0
bce-python-sdk     0.8.74
Brotli             1.0.9
certifi            2022.12.7
charset-normalizer 2.1.1
click              8.1.3
contourpy          1.0.7
cycler             0.11.0
Cython             0.29.33
decorator          5.1.1
dill               0.3.6
entrypoints        0.4
fastapi            0.89.1
ffmpy              0.3.0
filterpy           1.4.5
Flask              2.2.2
flask-babel        3.0.0
fonttools          4.38.0
frozenlist         1.3.3
fsspec             2022.11.0
future             0.18.3
gevent             22.10.2
geventhttpclient   2.0.8
gradio             3.16.2
greenlet           2.0.1
grpcio             1.41.0
h11                0.14.0
httpcore           0.16.3
httpx              0.23.3
idna               3.4
importlib-metadata 6.0.0
itsdangerous       2.1.2
Jinja2             3.1.2
joblib             1.2.0
jsonschema         4.17.3
kiwisolver         1.4.4
lap                0.4.0
linkify-it-py      1.0.3
markdown-it-py     2.1.0
MarkupSafe         2.1.2
matplotlib         3.6.3
mdit-py-plugins    0.3.3
mdurl              0.1.2
motmetrics         1.4.0
mpmath             1.2.1
multidict          6.0.4
multiprocess       0.70.14
numpy              1.24.1
onnx               1.12.0
opencv-python      4.5.5.64
opt-einsum         3.3.0
orjson             3.8.5
packaging          23.0
paddle-bfloat      0.1.7
paddlepaddle-gpu   2.4.1.post117
pandas             1.5.2
Pillow             9.4.0
pip                22.3.1
protobuf           3.20.0
psutil             5.9.4
pyclipper          1.3.0.post4
pycocotools        2.0.6
pycryptodome       3.16.0
pydantic           1.10.4
pydub              0.25.1
pyparsing          3.0.9
pyrsistent         0.19.3
python-dateutil    2.8.2
python-multipart   0.0.5
python-rapidjson   1.9
pytz               2022.7.1
PyYAML             6.0
rarfile            4.0
requests           2.28.2
rfc3986            1.5.0

Bug description confirmation

Are you willing to submit a PR?

wangxinxin08 commented 1 year ago

According to the log message, this error is probably caused by the cuDNN path not being set correctly, or by cuDNN not being installed on your system.
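
One way to check this diagnosis directly is to ask the dynamic loader for the library from Python. The following is a minimal sketch and not part of PaddlePaddle: `try_load` is a hypothetical helper, and the library names in the loop are assumptions about a typical Linux install with cuDNN 8.

```python
import ctypes


def try_load(library_name):
    """Try to open a shared library by name; return (ok, error_message)."""
    try:
        ctypes.CDLL(library_name)
        return True, ""
    except OSError as exc:
        # The dynamic loader could not find or open the library.
        return False, str(exc)


if __name__ == "__main__":
    # Assumed names; adjust them to the versions installed on your system.
    for name in ("libcudnn.so", "libcudnn.so.8", "libcublas.so"):
        ok, err = try_load(name)
        status = "loaded" if ok else "NOT loaded: " + err
        print(name + ": " + status)
```

If libcudnn.so is reported as not loaded, the library is either missing or outside the loader's search path, which would be consistent with the "cudnn_dso_handle should not be null" hint in the traceback.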

yamashin0922 commented 1 year ago

Thank you for your prompt reply. I'll try installing the proper cuDNN version and report back.

mahachaaben99 commented 1 year ago

Hello, did you solve the problem?

mahachaaben99 commented 1 year ago

I am having the same issue.

Gokulnath-V commented 1 year ago

The first step is to check whether libcudnn.so and libcublas.so are present in the shared library directory. Enter the command below in the terminal.

ls /usr/lib | grep -E 'libcudnn|libcublas'

If libcudnn.so and libcublas.so are not listed, find where they are installed with the commands below.

locate libcudnn.so
locate libcublas.so

In my case, libcudnn.so is located at /usr/local/cuda-12.1/targets/x86_64-linux/include/libcudnn.so.8.9.1 and libcublas.so is located at /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcublas.so.12.1.3.1.

Once you have located them, add them to the shared library directory by following the steps below.

Enter the /usr/lib folder:

cd /usr/lib

Create the libcudnn.so and libcublas.so symlinks:

sudo ln -s /usr/local/cuda-12.1/targets/x86_64-linux/include/libcudnn.so.8.9.1 libcudnn.so
sudo ln -s /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcublas.so.12.1.3.1 libcublas.so

Now check whether they were added to the shared library directory:

ls /usr/lib | grep -E 'libcudnn|libcublas'

If both libcudnn.so and libcublas.so appear in the output, the issue should be resolved.
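
On systems where the locate command is unavailable, the search step can be approximated in Python. This is a sketch under assumptions: `find_libraries` is a hypothetical helper, and the directories you pass to it (for example /usr/lib or /usr/local/cuda) depend on how CUDA and cuDNN were installed.

```python
import glob
import os


def find_libraries(names, search_dirs):
    """Return {name: sorted paths} for files whose name starts with `name`."""
    found = {}
    for name in names:
        matches = []
        for directory in search_dirs:
            # With recursive=True, "**" matches the directory itself and
            # every subdirectory below it.
            pattern = os.path.join(directory, "**", name + "*")
            matches.extend(glob.glob(pattern, recursive=True))
        found[name] = sorted(set(matches))
    return found
```

Calling find_libraries(["libcudnn.so", "libcublas.so"], ["/usr/local/cuda"]) returns the candidate paths to use as symlink targets. An empty list for a name suggests that the library is not installed under the searched directories.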

bamboosteam commented 1 year ago

Thanks @Gokulnath-V. Your advice resolved my issues with paddle running on gpu.

sjtugzx commented 12 months ago

Thanks @Gokulnath-V. Your advice resolved my issues with paddle running on gpu.

I've fixed the previous error by following the instructions. However, a new error now appears: "Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory Please make sure libcudnn_ops_infer.so.8 is in your library path!". Any suggestions?

ZSitong commented 11 months ago

Same error.

Zhouziyuya commented 11 months ago

Same error. Do you have a solution for this? Thanks.

be42day commented 10 months ago

Use CUDA 11 instead of 12.

We only release paddlepaddle-gpu cuda10.2 on pypi. If you want to install paddlepaddle-gpu with cuda version of 10.2/11.2/11.6/11.7, commands to install are on our website

https://pypi.org/project/paddlepaddle-gpu/
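
To see whether an installed wheel and the local CUDA toolkit agree, the two versions can be compared programmatically. The sketch below assumes that a suffix such as ".post117" in a paddlepaddle-gpu version encodes CUDA 11.7 (a two-digit major version followed by the minor version); this is an assumption about the naming convention, and both function names are hypothetical.

```python
import re


def cuda_from_wheel_version(version):
    """Return (major, minor) from a '.postNNN' suffix, or None if absent."""
    # Assumed convention: two digits of major version, then the minor version.
    match = re.search(r"\.post(\d{2})(\d+)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_from_nvcc(output):
    """Return (major, minor) from `nvcc -V` output such as 'release 11.7'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For the environment in the original report, "2.4.1.post117" and "release 11.7" both yield (11, 7), so under this assumption the toolkit matched the wheel there and the missing piece was cuDNN itself.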

s4lm-xi commented 7 months ago

Updated link to install paddlepaddle-gpu for CUDA 10.2/11.2/11.6/11.7.

Installation documentation

ye7love7 commented 3 months ago

The CUDA version I have installed is 12.3. I downloaded the matching cuDNN package and extracted it to /usr/local/cuda/include and /usr/local/cuda/lib64, then ran:

cd /usr/lib
sudo ln -s /usr/local/cuda-12.3/targets/x86_64-linux/lib/libcudnn.so.8.9.1 libcudnn.so
sudo ln -s /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcublas.so.12.1.3.1 libcublas.so

A new error has occurred:

Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
C++ Traceback (most recent call last):
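
When symlinks are created by hand from paths copied out of another setup, one thing worth ruling out is a link whose target does not exist on the local machine. Whether that applies to the commands above is not confirmed in this thread. The following sketch lists such dangling links; `dangling_symlinks` is a hypothetical helper.

```python
import os


def dangling_symlinks(directory):
    """Return sorted names of symlinks in `directory` whose target is missing."""
    broken = []
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        # islink() is True for any symlink; exists() follows the link and
        # is False when the target is missing.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(entry)
    return sorted(broken)
```

Running dangling_symlinks("/usr/lib") after the ln -s commands shows whether libcudnn.so or libcublas.so points at a path that is not present.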

JoshC8C7 commented 3 months ago

If you're missing libcudnn_ops_infer.so.8 or a similar file, you need to do the same thing to add it to your library path. There are a few files like this, so I found it easiest to just run:

sudo ln -s /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib/* .

(The source path will vary depending on where you find the files, e.g. with locate libcudnn_ops_infer.so.8.)
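
The bulk-link step can also be written in Python so that existing files are never overwritten. This is a sketch, not a tested fix for the error: `link_all` is a hypothetical helper, the source directory is the one quoted in the comment above and will differ between installs, and writing into /usr/lib requires root privileges.

```python
import os


def link_all(src_dir, dst_dir):
    """Symlink every entry of src_dir into dst_dir, skipping existing names."""
    created = []
    for entry in sorted(os.listdir(src_dir)):
        target = os.path.abspath(os.path.join(src_dir, entry))
        link = os.path.join(dst_dir, entry)
        # lexists() is also True for dangling symlinks, so nothing that is
        # already in dst_dir gets replaced.
        if os.path.lexists(link):
            continue
        os.symlink(target, link)
        created.append(link)
    return created
```

For example, link_all("/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib", "/usr/lib") returns the list of links it created. An empty list means every name was already present in the destination.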

lokeish commented 2 months ago

I tried the same, but the error still exists.

plmsmile commented 2 months ago

Same as you: I tried @JoshC8C7's tip, but it still does not work.

lokeish commented 2 months ago

My problem is resolved: I have now installed cuDNN properly. I am using paddlepaddle-gpu==2.6.0, my CUDA version is 12.2, and my NVIDIA driver version is 535.