Open ignorejjj opened 5 months ago
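What happens if you unset the variable LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/? Also, could you show me your pip list so we can see whether it includes the NCCL library provided by NVIDIA?
If I do not set that variable, importing paddle fails immediately with the following error: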
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/__init__.py", line 28, in <module>
from .base import core # noqa: F401
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/__init__.py", line 36, in <module>
from . import core
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/core.py", line 380, in <module>
raise e
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/core.py", line 268, in <module>
from . import libpaddle
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/libpaddle.so)
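For anyone hitting the same ImportError, here is a minimal sketch for checking which GLIBCXX version tags a given libstdc++.so.6 actually carries. It simply scans the file's bytes for `GLIBCXX_x.y.z` strings. The helper names `glibcxx_versions` and `provides` are hypothetical, and the conda path is an assumption based on `$CONDA_PREFIX/lib/` as used in this thread.

```python
import os
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags found in a shared library, sorted numerically."""
    data = Path(lib_path).read_bytes()
    tags = {m.decode() for m in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data)}
    # Numeric sort so that 3.4.9 is ordered before 3.4.20.
    return sorted(tags, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(lib_path, wanted="3.4.20"):
    """True if the library carries the GLIBCXX tag that libpaddle.so asks for."""
    return wanted in glibcxx_versions(lib_path)


if __name__ == "__main__":
    # /lib64 is the path from the traceback; the conda copy is an assumption.
    candidates = ["/lib64/libstdc++.so.6"]
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        candidates.append(os.path.join(prefix, "lib", "libstdc++.so.6"))
    for path in candidates:
        if os.path.exists(path):
            print(path, "has GLIBCXX_3.4.20:", provides(path))
```

If the system copy lacks the tag while the copy under the conda environment has it, that would be consistent with the import only working once the conda lib directory is on LD_LIBRARY_PATH.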
My pip list is as follows:
aiohttp==3.9.5
aiosignal==1.3.1
aistudio-sdk==0.2.4
annotated-types==0.7.0
anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1708355285029/work
astor @ file:///home/conda/feedstock_root/build_artifacts/astor_1593610464257/work
async-timeout==4.0.3
attrs==23.2.0
Babel==2.15.0
bce-python-sdk==0.9.11
blinker==1.8.2
certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1707022139797/work/certifi
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.8.2
contourpy==1.2.1
cycler==0.12.1
datasets==2.19.1
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
dill==0.3.4
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1704921103267/work
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
Flask==3.0.3
flask-babel==4.0.0
flatbuffers==24.3.25
fonttools==4.52.1
frozenlist==1.4.1
fsspec==2024.3.1
future==1.0.0
h11 @ file:///home/conda/feedstock_root/build_artifacts/h11_1664132893548/work
h2 @ file:///home/conda/feedstock_root/build_artifacts/h2_1633502706969/work
hpack==4.0.0
httpcore @ file:///home/conda/feedstock_root/build_artifacts/httpcore_1711596990900/work
httptools==0.6.1
httpx @ file:///home/conda/feedstock_root/build_artifacts/httpx_1708530890843/work
huggingface-hub==0.23.2
humanfriendly==10.0
hyperframe @ file:///home/conda/feedstock_root/build_artifacts/hyperframe_1619110129307/work
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1713279365350/work
importlib_metadata==7.1.0
importlib_resources==6.4.0
itsdangerous==2.2.0
jieba==0.42.1
Jinja2==3.1.4
joblib==1.4.2
kiwisolver==1.4.5
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.12.2
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1707225342954/work/dist/numpy-1.26.4-cp39-cp39-linux_x86_64.whl#sha256=c799942b5898f6e6c60264d1663a6469a475290e758c654aeeb78e2596463abd
onnx==1.16.1
onnxruntime==1.16.3
opt-einsum @ file:///home/conda/feedstock_root/build_artifacts/opt_einsum_1696448916724/work
orjson==3.10.3
packaging==24.0
paddle2onnx==1.2.3
paddlefsl==1.1.0
paddlenlp==2.8.0.post0
paddlepaddle-gpu==2.6.1
pandas==2.2.2
pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1712154461189/work
prettytable==3.10.0
protobuf==4.25.3
psutil==5.9.8
pyarrow==16.1.0
pyarrow-hotfix==0.6
pybind11==2.12.0
pycryptodome==3.20.0
pydantic==2.7.1
pydantic_core==2.18.2
Pygments==2.18.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
rarfile==4.2
regex==2024.5.15
requests==2.32.2
rich==13.7.1
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.13.1
sentencepiece==0.2.0
seqeval==1.2.2
shellingham==1.5.4
six==1.16.0
sniffio @ file:///home/conda/feedstock_root/build_artifacts/sniffio_1708952932303/work
starlette==0.37.2
sympy==1.12.1rc1
threadpoolctl==3.5.0
tool-helpers==0.1.1
tqdm==4.66.4
typer==0.12.3
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1712329955671/work
tzdata==2024.1
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.29.0
uvloop==0.19.0
visualdl==2.5.3
watchfiles==0.22.0
wcwidth==0.2.13
websockets==12.0
Werkzeug==3.0.3
xxhash==3.4.1
yarl==1.9.4
zipp==3.19.0
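There does not seem to be any nccl package.
Isn't the package you installed the latest one? The latest package no longer needs this kind of complicated environment setup.
I installed 2.6.1, the latest version listed on the official website. If there is a newer version, could you please share the install URL?
This package bundles all of its environment dependencies and does not depend on your local CUDA version. In other words, it works even if your local CUDA is neither 11.8 nor 12. Give it a try.
Is it solved? This package will be added to the official documentation very soon.
One moment, let me test it.
During installation, pip keeps downloading paddle wheel files one after another, as if it cannot find a suitable version. Is this normal?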
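Rather than reading through the whole list by eye, the check can be scripted. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `find_packages` is hypothetical, and `nvidia-nccl-cu12` in the comment is only an example of what an NCCL wheel name might look like.

```python
from importlib import metadata


def find_packages(keyword, names=None):
    """Return installed distribution names that contain `keyword` (case-insensitive)."""
    if names is None:
        # Names of every distribution visible to the current interpreter.
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    keyword = keyword.lower()
    return sorted({n for n in names if n and keyword in n.lower()})


if __name__ == "__main__":
    # Prints e.g. ['nvidia-nccl-cu12'] when such a wheel is installed.
    print(find_packages("nccl") or "no NCCL package found")
```

Run it with the same interpreter that imports paddle, otherwise it inspects a different environment.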
Downloading https://paddle-whl.bj.bcebos.com/nightly/cu120/paddlepaddle-gpu/paddlepaddle_gpu-3.0.0.dev20240527-cp39-cp39-linux_x86_64.whl (736.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 736.9/736.9 MB 4.7 MB/s eta 0:00:00
Downloading https://paddle-whl.bj.bcebos.com/nightly/cu120/paddlepaddle-gpu/paddlepaddle_gpu-3.0.0.dev20240525-cp39-cp39-linux_x86_64.whl (736.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 736.1/736.1 MB 6.1 MB/s eta 0:00:00
Downloading https://paddle-whl.bj.bcebos.com/nightly/cu120/paddlepaddle-gpu/paddlepaddle_gpu-3.0.0.dev20240524-cp39-cp39-linux_x86_64.whl (736.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 736.1/736.1 MB 6.4 MB/s eta 0:00:00
Downloading https://paddle-whl.bj.bcebos.com/nightly/cu120/paddlepaddle-gpu/paddlepaddle_gpu-3.0.0.dev20240523-cp39-cp39-linux_x86_64.whl (733.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 733.6/733.6 MB 2.5 MB/s eta 0:00:00
Downloading https://paddle-whl.bj.bcebos.com/nightly/cu120/paddlepaddle-gpu/paddlepaddle_gpu-3.0.0.dev20240522-cp39-cp39-linux_x86_64.whl (733.6 MB)
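Hold on, let me take a look.
Could you try CUDA 11.8? Does the same problem occur there?
Hold on, I have located the problem. I will push an update on my side and that should fix it.
OK, I will try again once the update is done.
It should be fixed now, please try again:
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu120/
It looks like the same problem is still there.
That should not happen, it works on my machine. Is it still installing different paddle versions one after another?
It works fine here. Could you try again?
I cleared the previously downloaded cache and reinstalled, but the same problem still occurs:
Could you upgrade your pip?
pip is already at the latest version.
Could you try adding the --no-cache-dir flag? Both of my machines work without problems.
After switching to a different conda environment, the installation succeeded. One machine now passes paddle's run_check, but the other machine hits the following problem: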
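To make sure the install runs against the intended interpreter and with the cache disabled, the command can be assembled in Python. A minimal sketch: the index URL is the one quoted in this thread, `--pre` and `--no-cache-dir` are standard pip options, and the helper name `build_install_cmd` is hypothetical.

```python
import shlex
import sys

# Nightly index URL as quoted in this thread.
NIGHTLY_INDEX = "https://www.paddlepaddle.org.cn/packages/nightly/cu120/"


def build_install_cmd(index_url=NIGHTLY_INDEX, no_cache=True):
    """Build the pip command for the current interpreter without running it."""
    cmd = [sys.executable, "-m", "pip", "install", "--pre",
           "paddlepaddle-gpu", "-i", index_url]
    if no_cache:
        cmd.append("--no-cache-dir")  # skip pip's local wheel cache
    return cmd


if __name__ == "__main__":
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(build_install_cmd(), check=True).
    print(shlex.join(build_install_cmd()))
```

Using `sys.executable -m pip` avoids accidentally calling a pip that belongs to a different conda environment.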
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/__init__.py", line 33, in <module>
from .base import core # noqa: F401
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/__init__.py", line 38, in <module>
from . import ( # noqa: F401
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/backward.py", line 25, in <module>
from . import core, framework, log_helper, unique_name
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/core.py", line 384, in <module>
raise e
File "/fs/fast/u20238046/envs/paddle/lib/python3.9/site-packages/paddle/base/core.py", line 267, in <module>
from . import libpaddle
ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory
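A quick way to see where the running interpreter expects its shared libpython to live is to ask `sysconfig`. This is a minimal sketch with a hypothetical helper name `libpython_candidates`. The `LIBDIR`, `LDLIBRARY`, and `INSTSONAME` config variables are normally present on Linux builds but may be missing elsewhere, so the code tolerates `None`.

```python
import os
import sysconfig


def libpython_candidates():
    """Return the paths where this interpreter's libpython would normally be found."""
    libdir = sysconfig.get_config_var("LIBDIR")
    names = {
        sysconfig.get_config_var("LDLIBRARY"),   # e.g. libpython3.9.so
        sysconfig.get_config_var("INSTSONAME"),  # e.g. libpython3.9.so.1.0
    }
    if not libdir:
        return []
    return [os.path.join(libdir, name) for name in sorted(n for n in names if n)]


if __name__ == "__main__":
    for path in libpython_candidates():
        print(path, "exists" if os.path.exists(path) else "MISSING")
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
```

If the file exists under the environment's lib directory but that directory is not on the loader's search path, the dynamic loader can still fail to find it when libpaddle.so is loaded.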
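It looks like a problem with the local Python environment.
Are you using our image?
I installed it with the command you just gave.
I set an environment variable and it now runs normally. Thank you very much!
Which environment variable?
I set this one: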
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
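The same setting can be applied from a launcher script instead of the shell profile. A minimal sketch, assuming `CONDA_PREFIX` is set by the active conda environment; the helper name `with_conda_lib` is hypothetical. Note that the variable has to be in place before the Python process that imports paddle starts, so the sketch builds an environment for a child process rather than modifying the current one.

```python
import os


def with_conda_lib(env=None):
    """Return a copy of `env` with $CONDA_PREFIX/lib appended to LD_LIBRARY_PATH."""
    env = dict(os.environ if env is None else env)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return env  # no active conda environment, nothing to add
    lib_dir = os.path.join(prefix, "lib")
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.append(lib_dir)  # append, mirroring the export line above
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


# Usage sketch (needs paddle installed):
#   import subprocess, sys
#   subprocess.run(
#       [sys.executable, "-c", "import paddle; paddle.utils.run_check()"],
#       env=with_conda_lib(), check=True)
```

The helper is idempotent, so calling it repeatedly does not keep growing the path.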
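Issue Description
After following the installation steps on the official website, the following error occurs:
Because of another earlier bug, I had set an environment variable before running this:
Version & Environment Information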
Paddle version: 2.6.1
Paddle With CUDA: True
OS: centos 7
GCC version: (Spack GCC) 9.5.0
Clang version: N/A
CMake version: N/A
Libc version: glibc 2.17
Python version: 3.9.19
CUDA version: 11.7.99 Build cuda_11.7.r11.7/compiler.31442593_0
cuDNN version: N/A
Nvidia driver version: 525.60.13
Nvidia driver List:
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB