PaddlePaddle / Paddle

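PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)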
http://www.paddlepaddle.org/
Apache License 2.0
22.29k stars 5.61k forks

import paddle in the Docker container from the official documentation fails with Illegal instruction (core dumped) #69003

Open Wong4j opened 3 weeks ago

Wong4j commented 3 weeks ago

Issue Description

docker run --gpus all --name paddle -it registry.baidubce.com/paddlepaddle/paddle:3.0.0b1-gpu-cuda12.3-cudnn9.0-trt8.6 /bin/bash

python
>>> import paddle
Illegal instruction (core dumped)
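
Because the crash takes the whole interpreter down, it can be easier to probe the import from a child process. The following is a minimal sketch, assuming a POSIX system; the helper name `died_from_sigill` is hypothetical and not part of Paddle.

```python
import signal
import subprocess
import sys


def died_from_sigill(code: str) -> bool:
    """Run `code` in a fresh interpreter; return True if SIGILL killed it."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    return result.returncode == -signal.SIGILL


if __name__ == "__main__":
    # Prints True only if importing paddle dies with an illegal instruction;
    # an ordinary ImportError (non-zero exit code) prints False.
    print(died_from_sigill("import paddle"))
```

Running the probe in a subprocess keeps the calling shell or script alive, so the same check can be repeated against several wheels.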

Version & Environment Information


Paddle version: N/A
Paddle With CUDA: N/A

OS: ubuntu 22.04
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: N/A
CMake version: version 3.22.1
Libc version: glibc 2.35
Python version: 3.10.6

CUDA version: N/A
cuDNN version: N/A
Nvidia driver version: 565.57.01
Nvidia driver List:
GPU 0: NVIDIA A100 80GB PCIe


wanghuancoder commented 3 weeks ago

Thank you for the report. @risemeup1, could you please take a look? Thanks!

risemeup1 commented 3 weeks ago

https://paddle-qa.bj.bcebos.com/paddle-pipeline/CompileServicing-LinuxCentos-Commit-Cuda123-WITH_PIP_CUDA_LIBRARIES_ON-WITH_AVX_OFF/59dba7a7b8b3f0f7cfa7cc4e5b4dd28a38b1f431/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl

Please try this package and let me know whether it works.

Wong4j commented 3 weeks ago

> https://paddle-qa.bj.bcebos.com/paddle-pipeline/CompileServicing-LinuxCentos-Commit-Cuda123-WITH_PIP_CUDA_LIBRARIES_ON-WITH_AVX_OFF/59dba7a7b8b3f0f7cfa7cc4e5b4dd28a38b1f431/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl
>
> Please try this package and let me know whether it works.

It works.

risemeup1 commented 3 weeks ago

https://paddle-qa.bj.bcebos.com/paddle-pipeline/CompileServicing-LinuxCentos-Commit-Cuda123-WITH_PIP_CUDA_LIBRARIES_ON-WITH_CINN_OFF/59dba7a7b8b3f0f7cfa7cc4e5b4dd28a38b1f431/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl

Could you please try this one as well? It will help us narrow down the cause so that we can find the root of the problem.

Wong4j commented 3 weeks ago

> https://paddle-qa.bj.bcebos.com/paddle-pipeline/CompileServicing-LinuxCentos-Commit-Cuda123-WITH_PIP_CUDA_LIBRARIES_ON-WITH_CINN_OFF/59dba7a7b8b3f0f7cfa7cc4e5b4dd28a38b1f431/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl
>
> Could you please try this one as well? It will help us narrow down the cause so that we can find the root of the problem.

This one fails with Illegal instruction (core dumped).

XieJJ99 commented 2 weeks ago

> https://paddle-qa.bj.bcebos.com/paddle-pipeline/CompileServicing-LinuxCentos-Commit-Cuda123-WITH_PIP_CUDA_LIBRARIES_ON-WITH_CINN_OFF/59dba7a7b8b3f0f7cfa7cc4e5b4dd28a38b1f431/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl
>
> Could you please try this one as well? It will help us narrow down the cause so that we can find the root of the problem.

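Users' CPUs differ in their AVX support: it could be AVX, AVX2, or AVX512. AVX512 is very common on server CPUs, but Intel desktop CPUs from the 13th generation onward no longer support AVX512 by default and only support AVX2.

If you build the AVX512 version by default, users will easily run into illegal-instruction errors, which is very unfriendly to developers. I suggest defaulting to AVX2 and adding CPU instruction-set detection, so that the AVX512-specific instructions are executed only when the CPU supports AVX512.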
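
The detect-then-dispatch idea suggested here can be sketched as follows. This is only an illustration, assuming Linux on x86, where `/proc/cpuinfo` lists feature flags on a `flags` line; `cpu_flags` and `pick_code_path` are hypothetical names, and nothing in this thread shows how Paddle itself selects kernels.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags: set[str]) -> str:
    """Prefer the widest instruction set the CPU reports, else fall back."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "generic"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(pick_code_path(cpu_flags(text)))
```

The point of the pattern is that the baseline binary only needs the lowest common instruction set, and wider instructions are reached through a runtime check instead of being assumed at build time.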

Wong4j commented 2 weeks ago

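@XieJJ99 I went straight into the Docker container officially provided by Paddle, and running import paddle in Python there fails. I did not build anything.

> I suggest defaulting to AVX2 and adding CPU instruction-set detection

What do I need to set for this?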

XieYunshen commented 2 weeks ago

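> @XieJJ99 I went straight into the Docker container officially provided by Paddle, and running import paddle in Python there fails. I did not build anything.
>
> > I suggest defaulting to AVX2 and adding CPU instruction-set detection
>
> What do I need to set for this?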

Could you run the command below and post the output?

lscpu | grep -o 'avx[^ ]*'

Also, could you please check whether Paddle works after installing from the link below?

python3 -m pip install https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-TagBuild-Training-Linux-Gpu-Cuda11.8-Cudnn8.6-Mkl-Avx-Gcc8.2-SelfBuiltPypiUse/11d1f4835f5afce78c0e9882f144877b3c4a9aac/paddlepaddle_gpu-3.0.0.dev20241103-cp310-cp310-linux_x86_64.whl --break-system-packages

yum-dnf commented 2 weeks ago

I have the same problem on my notebook. Maybe I can give it a try on my server.

rainote2020 commented 1 week ago

> lscpu | grep -o 'avx[^ ]*'

I have the same problem. Paddle 3.0 works on my Xeon server, but fails on my local i7-13700:

Server: Ubuntu 22 container + CUDA 12.2 + Paddle 3.0 GPU (Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz + 2x 3090)

avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni

PC: Ubuntu 22 host + CUDA 12.4 + Paddle 3.0 GPU (13th Gen Intel(R) Core(TM) i7-13700 + 4060)

avx avx2 avx_vnni
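
To make the difference between the two machines explicit, the two flag lists above can be compared against a required set. This is a small sketch; `missing_flags` and `REQUIRED` are hypothetical, and the thread does not confirm which instructions the failing wheel actually uses.

```python
def missing_flags(required: set[str], reported: str) -> set[str]:
    """Return the required CPU flags absent from a whitespace-separated list."""
    return required - set(reported.split())


# Flag lists copied from the lscpu output reported above.
XEON_GOLD_6230 = "avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni"
CORE_I7_13700 = "avx avx2 avx_vnni"

# ASSUMPTION: a hypothetical build that needs AVX2 and AVX-512 Foundation.
REQUIRED = {"avx2", "avx512f"}

if __name__ == "__main__":
    print(missing_flags(REQUIRED, XEON_GOLD_6230))  # set()
    print(missing_flags(REQUIRED, CORE_I7_13700))   # {'avx512f'}
```

Under that assumption the desktop CPU is the only one missing a required flag, which would be consistent with the server working and the PC crashing.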

The command you gave cannot be run as-is; it fails with:

python3 -m pip install https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-TagBuild-Training-Linux-Gpu-Cuda11.8-Cudnn8.6-Mkl-Avx-Gcc8.2-SelfBuiltPypiUse/11d1f4835f5afce78c0e9882f144877b3c4a9aac/paddlepaddle_gpu-3.0.0.dev20241103-cp310-cp310-linux_x86_64.whl --break-system-packages

Usage:
/usr/bin/python3 -m pip install [options] <requirement specifier> [package-index-options] ...
/usr/bin/python3 -m pip install [options] -r <requirements file> [package-index-options] ...
/usr/bin/python3 -m pip install [options] [-e] <vcs project url> ...
/usr/bin/python3 -m pip install [options] [-e] <local project path> ...
/usr/bin/python3 -m pip install [options] <archive url/path> ...

no such option: --break-system-packages
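
A "no such option" error like this is what an older pip prints for a flag it does not know. The sketch below adds `--break-system-packages` only when the pip version is new enough. It rests on the assumption, from the pip changelog as far as I recall, that the flag was introduced in pip 23.0.1; the function names are hypothetical.

```python
import re


def supports_break_system_packages(pip_version: str) -> bool:
    """ASSUMPTION: --break-system-packages exists in pip 23.0.1 and later."""
    numbers = []
    for piece in pip_version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. "0rc1" -> 0
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (3 - len(numbers))  # pad "23.1" to (23, 1, 0)
    return tuple(numbers) >= (23, 0, 1)


def install_args(wheel_url: str, pip_version: str) -> list[str]:
    """Build the pip command line, adding the flag only where it is known."""
    args = ["python3", "-m", "pip", "install", wheel_url]
    if supports_break_system_packages(pip_version):
        args.append("--break-system-packages")
    return args


if __name__ == "__main__":
    # "some.whl" is a placeholder, not a real package.
    print(install_args("some.whl", "22.0.2"))
    print(install_args("some.whl", "24.0"))
```

On a pip that rejects the flag, simply dropping `--break-system-packages` from the command is the equivalent manual fix; the installed version can be checked with `python3 -m pip --version`.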

nadirvishun commented 4 days ago

@XieYunshen How can this be solved? Could you provide a temporary workaround?


For now I am testing with the package below. It emits a warning, but subsequent code runs. Other issues say the 11.8 build works fine, but I have not tried it.

> https://paddle-qa.bj.bcebos.com/paddle-pipeline/CompileServicing-LinuxCentos-Commit-Cuda123-WITH_PIP_CUDA_LIBRARIES_ON-WITH_AVX_OFF/59dba7a7b8b3f0f7cfa7cc4e5b4dd28a38b1f431/paddlepaddle_gpu-0.0.0-cp310-cp310-linux_x86_64.whl
>
> Please try this package and let me know whether it works.