PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Unable to run paddle.utils.run_check() #61528

Closed kewuyu closed 4 months ago

kewuyu commented 5 months ago

Please ask your question

CUDA and cuDNN are installed, but Paddle reports an error after installation:

>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
I0203 16:32:32.691774  1592 program_interpreter.cc:212] New Executor is Running.
W0203 16:32:32.692723  1592 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.3, Runtime API Version: 12.0
W0203 16:32:32.694978  1592 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Miconda\envs\paddle\lib\site-packages\paddle\utils\install_check.py", line 273, in run_check
    _run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
  File "D:\Miconda\envs\paddle\lib\site-packages\paddle\utils\install_check.py", line 151, in _run_static_single
    exe.run(
  File "D:\Miconda\envs\paddle\lib\site-packages\paddle\base\executor.py", line 1742, in run
    res = self._run_impl(
  File "D:\Miconda\envs\paddle\lib\site-packages\paddle\base\executor.py", line 1948, in _run_impl
    ret = new_exe.run(
  File "D:\Miconda\envs\paddle\lib\site-packages\paddle\base\executor.py", line 827, in run
    tensors = self._new_exe.run(
RuntimeError: In user code:

    File "<stdin>", line 1, in <module>

    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\utils\install_check.py", line 273, in run_check
      _run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\utils\install_check.py", line 135, in _run_static_single
      input, out, weight = _simple_network()
    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\utils\install_check.py", line 37, in _simple_network
      linear_out = paddle.nn.functional.linear(x=input, weight=weight, bias=bias)
    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\nn\functional\common.py", line 1985, in linear
      helper.append_op(
    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\base\layer_helper.py", line 44, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\base\framework.py", line 4467, in append_op
      op = Operator(
    File "D:\Miconda\envs\paddle\lib\site-packages\paddle\base\framework.py", line 3016, in __init__
      for frame in traceback.extract_stack():

    PreconditionNotMetError: The third-party dynamic library (cublas64_120.dll;cublas64_12.dll) that Paddle depends on is not configured correctly. (error code is 126)
      Suggestions:
      1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
      2. Configure third-party dynamic library environment variables as follows:
      - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
      - Windows: set PATH by `set PATH=XXX; (at ..\paddle\phi\backends\dynload\dynamic_loader.cc:312)
      [operator < matmul_v2 > error]
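Error code 126 is the Windows loader's "module not found", i.e. the DLL (or one of its own dependencies) is not in any directory on PATH. As a quick diagnostic (my own sketch, not part of Paddle), you can scan PATH for the two file names the error reports:

```python
import os
from pathlib import Path

def find_dll_on_path(dll_name):
    """Return every PATH directory that actually contains dll_name."""
    hits = []
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry and (Path(entry) / dll_name).is_file():
            hits.append(entry)
    return hits

# The Paddle error names both candidate file names.
for name in ("cublas64_120.dll", "cublas64_12.dll"):
    print(name, "->", find_dll_on_path(name) or "NOT FOUND on PATH")
```

If both come back not found, the CUDA 12.x bin directory is missing from PATH, or only an 11.x toolkit (which ships cublas64_11.dll instead) is installed.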

CUDA demo suite test results:


C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6>cd .\extras\demo_suite

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>.\bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: NVIDIA GeForce RTX 3060 Laptop GPU
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12122.6

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12883.0

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     302162.9

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060 Laptop GPU"
  CUDA Driver Version / Runtime Version          12.3 / 11.6
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 6144 MBytes (6441926656 bytes)
  (30) Multiprocessors, (128) CUDA Cores/MP:     3840 CUDA Cores
  GPU Max Clock rate:                            1425 MHz (1.42 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               zu bytes
  Total amount of shared memory per block:       zu bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          zu bytes
  Texture alignment:                             zu bytes
  Concurrent copy and kernel execution:          Yes with 5 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.3, CUDA Runtime Version = 11.6, NumDevs = 1, Device0 = NVIDIA GeForce RTX 3060 Laptop GPU
Result = PASS
Vvsmile commented 4 months ago

请提供一下您的Paddle版本,感谢。 另外,目前看来您使用的cuda有些高,请降为12.0再进行paddle.utils.run_check()。 此外,还有一种方式可以尝试,可以寻找现有的cublas64_11.dll放入C:\Windows\System32目录下或PATH环境变量下目录里,但这个不保证可以解决所有问题,有可能会在后续使用里出现和当前版本cuda不兼容的隐患。 推荐降级为12.0的解决方案。