PaddlePaddle / Paddle

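PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle, 飞桨: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)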
http://www.paddlepaddle.org/
Apache License 2.0
22.29k stars 5.61k forks

paddle.utils.run_check() reports an error #69096

Open Catherine-aka opened 3 weeks ago

Catherine-aka commented 3 weeks ago

Describe the Bug

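Running paddle.utils.run_check() in my environment produces the following error: [image] Following the hint in the message, I ran nvcc --version, which reports CUDA 11.8. I then installed the paddlepaddle-gpu build recommended on the Paddle website: [image] Running paddle.utils.run_check() again produces the same error.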

Additional Supplementary Information

No response

risemeup1 commented 3 weeks ago

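Hi, the sm90 architecture requires a Paddle package built with CUDA 12. Just install the CUDA 12.3 package and ignore your local CUDA version. Our packages do not depend on the CUDA version installed locally, so even if your environment has CUDA 11.8 you can still use our CUDA 12.3 package.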
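As a rough illustration of the advice above, the following Python sketch reads each GPU's compute capability and reports whether a CUDA 12 build is called for. It assumes nvidia-smi is on the PATH and that the installed driver supports the compute_cap query field. The helper names and the "9.0 or higher" threshold are illustrative assumptions extrapolated from the sm90 statement, not an official Paddle rule.

```python
import subprocess


def query_compute_caps():
    """Ask nvidia-smi for one compute capability per GPU (e.g. "9.0").

    Assumption: the driver is recent enough to support the compute_cap field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_caps(result.stdout)


def parse_compute_caps(csv_text):
    """Turn lines such as "9.0" into (major, minor) integer tuples."""
    caps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line:
            major, minor = line.split(".")
            caps.append((int(major), int(minor)))
    return caps


def needs_cuda12_build(caps):
    """Hypothetical rule of thumb: any GPU at sm90 or newer needs a CUDA 12 build."""
    return any(cap >= (9, 0) for cap in caps)


# Demo on sample text so the sketch runs without a GPU.
print(needs_cuda12_build(parse_compute_caps("9.0\n9.0\n")))
```

On a machine with GPUs, needs_cuda12_build(query_compute_caps()) would apply the same check to the real hardware.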

Catherine-aka commented 3 weeks ago

My machine's driver reports CUDA 12.2. Do I need to install CUDA 12.3?

Fri Nov  1 00:11:10 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  CF-NG-HZZ1-O                   On  | 00000000:18:00.0 Off |                    0 |
| N/A   52C    P0             190W / 700W |  70062MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  CF-NG-HZZ1-O                   On  | 00000000:38:00.0 Off |                    0 |
| N/A   39C    P0             173W / 700W |  70124MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  CF-NG-HZZ1-O                   On  | 00000000:49:00.0 Off |                    0 |
| N/A   51C    P0             184W / 700W |  70110MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  CF-NG-HZZ1-O                   On  | 00000000:59:00.0 Off |                    0 |
| N/A   37C    P0             170W / 700W |  70124MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  CF-NG-HZZ1-O                   On  | 00000000:9B:00.0 Off |                    0 |
| N/A   38C    P0             166W / 700W |  70110MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  CF-NG-HZZ1-O                   On  | 00000000:BB:00.0 Off |                    0 |
| N/A   51C    P0             174W / 700W |  70126MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  CF-NG-HZZ1-O                   On  | 00000000:CA:00.0 Off |                    0 |
| N/A   36C    P0             165W / 700W |  70118MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  CF-NG-HZZ1-O                   On  | 00000000:DA:00.0 Off |                    0 |
| N/A   51C    P0             175W / 700W |  69732MiB / 81559MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
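A note for readers: the "CUDA Version: 12.2" field in the nvidia-smi header is the highest CUDA version the installed driver supports, not the version of a locally installed toolkit (that is what nvcc --version reports). The sketch below pulls both values out of the header. The function name is hypothetical, and the regular expression assumes the header layout shown in the output above.

```python
import re


def parse_smi_header(text):
    """Return (driver_version, driver_cuda_version) from nvidia-smi output.

    Returns None when the header line is not found.
    """
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


# Header line copied from the output posted in this thread.
sample = "| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 |"
print(parse_smi_header(sample))  # ('535.183.06', '12.2')
```

To run it against a live machine, pass the standard output of nvidia-smi (for example via subprocess.run) as the text argument.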

Catherine-aka commented 3 weeks ago

Whether I install version 2.6.1 or 3.0 of the CUDA 12 paddlepaddle-gpu package, import paddle fails with the following error:

Error: Can not import paddle core while this file exists: /home/ray/anaconda3/lib/python3.9/site-packages/paddle/base/libpaddle.so
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/ray/anaconda3/lib/python3.9/site-packages/paddle/__init__.py", line 33, in <module>
    from .base import core # noqa: F401
  File "/home/ray/anaconda3/lib/python3.9/site-packages/paddle/base/__init__.py", line 38, in <module>
    from . import ( # noqa: F401
  File "/home/ray/anaconda3/lib/python3.9/site-packages/paddle/base/backward.py", line 25, in <module>
    from . import core, framework, log_helper, unique_name
  File "/home/ray/anaconda3/lib/python3.9/site-packages/paddle/base/core.py", line 384, in <module>
    raise e
  File "/home/ray/anaconda3/lib/python3.9/site-packages/paddle/base/core.py", line 267, in <module>
    from . import libpaddle
ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.13' not found (required by /home/ray/anaconda3/lib/python3.9/site-packages/paddle/base/libpaddle.so)

ccl-private commented 3 weeks ago

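@Catherine-aka I solved this exact problem today. It requires gcc 11: CXXABI_1.3.13 only becomes available after upgrading the system gcc to version 11. This command shows which CXXABI versions your system currently supports, so you can check for CXXABI_1.3.13:
nm -D /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep CXXABI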

If you are on Ubuntu 20:
sudo apt install --reinstall ca-certificates
sudo add-apt-repository --update ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-11 g++-11
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 160
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 160
Then check again whether CXXABI_1.3.13 is now present:
nm -D /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep CXXABI
If it is listed, the upgrade succeeded.
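The same check can be sketched in Python for cases where nm is not available. This is a minimal illustration that scans the raw bytes of the library for CXXABI version tags, roughly what strings piped into grep would do. The function names are hypothetical, and the library path is the one from the traceback in this thread, which may differ on other systems.

```python
import re
from pathlib import Path


def cxxabi_versions(blob):
    """Return the numeric CXXABI versions found in a byte string, sorted ascending."""
    found = set(re.findall(rb"CXXABI_(\d+(?:\.\d+)*)", blob))
    versions = [v.decode() for v in found]
    # Sort numerically so that 1.3.13 comes after 1.3.9.
    return sorted(versions, key=lambda s: tuple(int(p) for p in s.split(".")))


def has_cxxabi(blob, wanted="1.3.13"):
    """True if the requested CXXABI version tag appears in the library bytes."""
    return wanted in cxxabi_versions(blob)


# Path taken from the ImportError above; adjust for other distributions.
lib = Path("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")
if lib.exists():
    data = lib.read_bytes()
    print("newest CXXABI tags:", cxxabi_versions(data)[-3:])
    print("CXXABI_1.3.13 present:", has_cxxabi(data))
```

If the last line prints False, libpaddle.so will fail to import with the same "version `CXXABI_1.3.13' not found" message.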

Catherine-aka commented 3 weeks ago

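[image] After installation an error is reported: the driver supports CUDA 12.2, but Paddle uses CUDA 12.3, and it still reports an incompatibility. So the sm90 architecture needs CUDA 12, but is CUDA 12.3 the only CUDA 12 build of Paddle that is officially provided?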

Catherine-aka commented 3 weeks ago

Thanks!! The gcc problem is now solved!

risemeup1 commented 2 weeks ago

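The CUDA 12.3 package should work. You don't need to worry about your local CUDA version; just download the CUDA 12.3 package from the Paddle website. As I understand it, that should be usable. Did you run into a problem?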

Catherine-aka commented 2 weeks ago

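I have now installed paddle 3.0.0b1 built with CUDA 12.3 and ran a simple matrix computation. nvidia-smi shows the GPU being used, and the result is correct. However, this warning appears, saying that the CUDA version Paddle was compiled with differs from the runtime API version. Does this need to be fixed? Judging by the message, it looks like a fairly serious problem. [image]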
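For context on the warning: under NVIDIA's documented minor version compatibility, an application built against CUDA 12.x can generally run on a driver that supports an older 12.x release, with some restrictions (for example, features that need a newer driver). The sketch below classifies a build/driver pair along those lines. It is a simplified illustration, not Paddle's own check; the function names and category labels are hypothetical. The build version would presumably come from paddle.version.cuda(), and the driver version from the nvidia-smi header.

```python
def parse_version(version):
    """Convert "12.3" into (12, 3); a missing minor part is treated as 0."""
    parts = [int(p) for p in version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def classify(build_cuda, driver_cuda):
    """Roughly classify how a CUDA build relates to what the driver supports.

    "ok": the driver supports the build version or newer.
    "minor-version-compat": same major version, driver slightly older;
        usually works, possibly with a warning and limited features.
    "incompatible": the driver's major version is older than the build's.
    """
    build = parse_version(build_cuda)
    driver = parse_version(driver_cuda)
    if driver >= build:
        return "ok"
    if driver[0] == build[0]:
        return "minor-version-compat"
    return "incompatible"


# Values from this thread: Paddle built with CUDA 12.3, driver reports 12.2.
print(classify("12.3", "12.2"))  # minor-version-compat
```

That classification matches what is reported here: the computation runs and gives correct results, while a version-mismatch warning is printed.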