PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Error after building Paddle on Arm and installing the generated whl package #51536

Closed: minboo closed this issue 2 weeks ago

minboo commented 1 year ago

Issue Description

After building the paddlepaddle whl package and installing it, `import paddle` fails with the following error:

>>> import paddle
Error: Can not import paddle core while this file exists: /usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/libpaddle.so
Traceback (most recent call last):
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/core.py", line 268, in <module>
    from . import libpaddle
ImportError: /usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/libpaddle.so: ELF load command alignment not page-aligned

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/__init__.py", line 27, in <module>
    from .framework import monkey_patch_variable
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/framework/__init__.py", line 17, in <module>
    from . import random  # noqa: F401
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/framework/random.py", line 16, in <module>
    import paddle.fluid as fluid
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/__init__.py", line 36, in <module>
    from . import framework
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/framework.py", line 33, in <module>
    from . import core
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/core.py", line 328, in <module>
    if not avx_supported() and libpaddle.is_compiled_with_avx():
NameError: name 'libpaddle' is not defined

Below is the log output by cmake:

CMake Deprecation Warning at CMakeLists.txt:25 (cmake_policy):
  The OLD behavior for policy CMP0026 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.

-- Found Paddle host system: ubuntu, version: 18.04.6
-- Found Paddle host system's CPU: 64 cores
-- The CXX compiler identification is GNU 8.4.0
-- The C compiler identification is GNU 8.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
EXP_CUDA_MODULE_LOADING_LAZY only works with GPU
-- CXX compiler: /usr/bin/c++, version: GNU 8.4.0
-- C compiler: /usr/bin/cc, version: GNU 8.4.0
-- AR tools: /usr/bin/ar
-- Found Git: /usr/bin/git (found version "2.17.1") 
-- Performing Test MMX_FOUND
-- Performing Test MMX_FOUND - Failed
-- Performing Test SSE2_FOUND
-- Performing Test SSE2_FOUND - Failed
-- Performing Test SSE3_FOUND
-- Performing Test SSE3_FOUND - Failed
-- Performing Test AVX_FOUND
-- Performing Test AVX_FOUND - Failed
-- Performing Test AVX2_FOUND
-- Performing Test AVX2_FOUND - Failed
-- Performing Test AVX512F_FOUND
-- Performing Test AVX512F_FOUND - Failed
Enable Custom Device when compiling for Linux. Force WITH_CUSTOM_DEVICE=ON.
CMake Warning at CMakeLists.txt:388 (message):
  Disable NCCL when compiling without GPU.  Force WITH_NCCL=OFF.

CMake Warning at CMakeLists.txt:477 (message):
  Disable RCCL when compiling without ROCM.  Force WITH_RCCL=OFF.

-- Do not have AVX2 intrinsics and disabled MKL-DNN.
-- warp-ctc library: /home/Paddle/build/third_party/install/warpctc/lib/libwarpctc.so
-- warp-rnnt library: /home/Paddle/build/third_party/install/warprnnt/lib/libwarprnnt.so
-- Build OpenBLAS by External Project (include: /home/Paddle/build/third_party/install/openblas/include, library: /home/Paddle/build/third_party/install/openblas/lib/libopenblas.a)
-- CBLAS_PROVIDER: EXTERN_OPENBLAS
-- Protobuf protoc executable: /home/Paddle/build/third_party/install/protobuf/bin/protoc
-- Protobuf-lite library: /home/Paddle/build/third_party/install/protobuf/lib/libprotobuf-lite.a
-- Protobuf library: /home/Paddle/build/third_party/install/protobuf/lib/libprotobuf.a
-- Protoc library: /home/Paddle/build/third_party/install/protobuf/lib/libprotoc.a
-- Protobuf version: 3.1.0
-- Found PythonInterp: /usr/local/python3.7.5/bin/python3 (found suitable version "3.7.5", minimum required is "3.7") 
-- Found PythonLibs: /usr/local/python3.7.5/lib/libpython3.7m.so (found suitable version "3.7.5", minimum required is "3.7") 
-- Found PY_pip: /usr/local/python3.7.5/lib/python3.7/site-packages/pip  
-- Found PY_numpy: /usr/local/python3.7.5/lib/python3.7/site-packages/numpy  
-- Found PY_wheel: /usr/local/python3.7.5/lib/python3.7/site-packages/wheel  
-- Found PY_google.protobuf: /usr/local/python3.7.5/lib/python3.7/site-packages/google/protobuf  
-- Found NumPy: /usr/local/python3.7.5/lib/python3.7/site-packages/numpy/core/include  
POCKETFFT_INCLUDE_DIR is /home/Paddle/build/third_party/pocketfft/src
CMake Warning at cmake/flags.cmake:12 (message):
  Found GCC 8.4.0 which is too high, recommended to use GCC 8.2
Call Stack (most recent call first):
  cmake/flags.cmake:36 (checkcompilercxx14flag)
  CMakeLists.txt:585 (include)

-- Looking for UINT64_MAX
-- Looking for UINT64_MAX - found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of pthread_spinlock_t
-- Check size of pthread_spinlock_t - done
-- Check size of pthread_barrier_t
-- Check size of pthread_barrier_t - done
-- Performing Test C_COMPILER_SUPPORT_FLAG__fPIC
-- Performing Test C_COMPILER_SUPPORT_FLAG__fPIC - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__fPIC
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__fPIC - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__fno_omit_frame_pointer
-- Performing Test C_COMPILER_SUPPORT_FLAG__fno_omit_frame_pointer - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__fno_omit_frame_pointer
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__fno_omit_frame_pointer - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Werror
-- Performing Test C_COMPILER_SUPPORT_FLAG__Werror - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Werror
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Werror - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wall
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wall - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wall
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wall - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wextra
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wextra - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wextra
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wextra - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wnon_virtual_dtor
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wnon_virtual_dtor - Failed
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wnon_virtual_dtor
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wnon_virtual_dtor - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wdelete_non_virtual_dtor
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wdelete_non_virtual_dtor - Failed
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wdelete_non_virtual_dtor
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wdelete_non_virtual_dtor - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_unused_parameter
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_unused_parameter - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_unused_parameter
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_unused_parameter - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_unused_function
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_unused_function - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_unused_function
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_unused_function - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_literal_suffix
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_literal_suffix - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_literal_suffix
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_literal_suffix - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_ignored_attributes
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_ignored_attributes - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_ignored_attributes
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_ignored_attributes - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_terminate
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_terminate - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_terminate
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_terminate - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_int_in_bool_context
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_int_in_bool_context - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_int_in_bool_context
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_int_in_bool_context - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wimplicit_fallthrough_0
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wimplicit_fallthrough_0 - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wimplicit_fallthrough_0
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wimplicit_fallthrough_0 - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_maybe_uninitialized
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_maybe_uninitialized - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_maybe_uninitialized
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_error_maybe_uninitialized - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_ignored_qualifiers
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_ignored_qualifiers - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_ignored_qualifiers
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_ignored_qualifiers - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_ignored_attributes
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_ignored_attributes - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_ignored_attributes
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_ignored_attributes - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_parentheses
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_parentheses - Success
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_parentheses
-- Performing Test CXX_COMPILER_SUPPORT_FLAG__Wno_parentheses - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_unused_local_typedefs
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_unused_local_typedefs - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_unused_function
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_unused_function - Success
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_array_bounds
-- Performing Test C_COMPILER_SUPPORT_FLAG__Wno_error_array_bounds - Success
-- Paddle version is 0.0.0
-- On inference mode, will take place some specific optimization.
Merge and generate static library: phi_static_1
-- commit: 17ec162053
-- branch: develop
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: pyyaml in /usr/local/python3.7.5/lib/python3.7/site-packages (6.0)
Requirement already satisfied: jinja2 in /usr/local/python3.7.5/lib/python3.7/site-packages (3.1.2)
Requirement already satisfied: typing-extensions in /usr/local/python3.7.5/lib/python3.7/site-packages (4.5.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/python3.7.5/lib/python3.7/site-packages (from jinja2) (2.1.2)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 21.3.1; however, version 23.0.1 is available.
You should consider upgrading via the '/usr/local/python3.7.5/bin/python3 -m pip install --upgrade pip' command.
parse op yamls:
- /home/Paddle/paddle/phi/api/yaml/ops.yaml
- /home/Paddle/paddle/phi/api/yaml/legacy_ops.yaml
- /home/Paddle/paddle/phi/api/yaml/backward.yaml
- /home/Paddle/paddle/phi/api/yaml/legacy_backward.yaml
validate op yaml:
- /home/Paddle/paddle/fluid/operators/generator/parsed_ops/ops.parsed.yaml
- /home/Paddle/paddle/fluid/operators/generator/parsed_ops/backward_ops.parsed.yaml
create or remove auto-geneated operators: /home/Paddle/paddle/fluid/operators/generated_op.cc.tmp
create or remove auto-geneated argument mappings: /home/Paddle/paddle/phi/ops/compat/generated_sig.cc.tmp
copy /home/Paddle/paddle/fluid/operators/generated_op.cc.tmp /home/Paddle/paddle/fluid/operators/generated_op.cc
copy /home/Paddle/paddle/fluid/operators/generated_static_op.cc.tmp /home/Paddle/paddle/fluid/operators/generated_static_op.cc
copy /home/Paddle/paddle/fluid/operators/generated_sparse_op.cc.tmp /home/Paddle/paddle/fluid/operators/generated_sparse_op.cc
copy /home/Paddle/paddle/phi/ops/compat/generated_sig.cc.tmp /home/Paddle/paddle/phi/ops/compat/generated_sig.cc
copy /home/Paddle/paddle/phi/ops/compat/generated_static_sig.cc.tmp /home/Paddle/paddle/phi/ops/compat/generated_static_sig.cc
copy /home/Paddle/paddle/phi/ops/compat/generated_sparse_sig.cc.tmp /home/Paddle/paddle/phi/ops/compat/generated_sparse_sig.cc
generate /home/Paddle/paddle/fluid/operators/ops_extra_info.cc
Performing Eager Dygraph Auto Code Generation
Final State Eager CodeGen
Generate dygraph file structure at path: /home/Paddle/paddle/fluid/eager/generated
WITH_DLNNE:
-- Configuring done
-- Generating done
-- Build files have been written to: /home/Paddle/build

Version & Environment Information


- PaddlePaddle version: built from the develop branch
- OS: host is UOS-1050e; container image: docker pull registry.baidubce.com/paddlepaddle/serving:ascend-aarch64-cann3.3.0-paddlelite-devel (Ubuntu 18.04)
- CPU (host): Kunpeng 920
- NPU: Atlas 310I Model 3000
- Python version: 3.7.5
- gcc: 8.4.0
- cmake version: 3.16.8
- Built by following this document: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/compile/arm-compile.html
- cmake command (a cleaned-up version is shown below): cmake .. -DPY_VERSION=3.7 -DPYTHON_EXECUTABLE=`which python3` -DWITH_ARM=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_XBYAK=OFF
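The pasted cmake command appears to have lost the backquotes around which python3 (the PYTHON_EXECUTABLE value and -DWITH_ARM ran together). For readability, the same configure step with the flags unchanged, followed by the usual build-and-install steps from the linked document (the make target and the wheel path under build/python/dist are assumptions from that document, not from this report), would look roughly like this:

cmake .. -DPY_VERSION=3.7 -DPYTHON_EXECUTABLE=$(which python3) -DWITH_ARM=ON \
         -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_XBYAK=OFF
make -j"$(nproc)"                                          # build; the Arm doc may add TARGET=ARMV8
python3 -m pip install -U python/dist/paddlepaddle-*.whl   # install the wheel produced by the build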


paddle-bot[bot] commented 1 year ago

Hi! We've received your issue; please be patient while we arrange technicians to answer it as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & version info, and the error message. You may also look for an answer in the official API documentation, the FAQ, historical issues, or the AI community. Have a nice day!

shiyutang commented 1 year ago

We suggest first backing up and then removing the file named in the error message: /usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/libpaddle.so
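A minimal way to try that (the .bak suffix here is only an example name):

mv /usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/libpaddle.so \
   /usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/libpaddle.so.bak   # back up the reported file
python3 -c "import paddle"                                                            # retry; restore with mv if it does not help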

qili93 commented 1 year ago

This problem is caused by patchelf detecting an inconsistent page size on ARMv8.

The detailed cause is described in the PR at https://github.com/NixOS/patchelf/pull/216. We suggest running the following command in your environment to upgrade patchelf in both the build and the runtime environments, and then rebuilding and rerunning:

wget -O /opt/0.14.5.tar.gz https://github.com/NixOS/patchelf/archive/refs/tags/0.14.5.tar.gz && \
  cd /opt && tar xzf 0.14.5.tar.gz && cd /opt/patchelf-0.14.5 && ./bootstrap.sh && ./configure && \
  make && make install && cd /opt && rm -rf patchelf-0.14.5 && rm -rf 0.14.5.tar.gz
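To check whether a given libpaddle.so is affected before rebuilding (assuming binutils is installed; the path is the one from the error message above), compare the Align value of its LOAD segments against the kernel page size. Many ARMv8 kernels report 65536 here, while a wheel patched with an older patchelf may still carry 0x1000 (4 KiB) alignment, which is what the dynamic loader rejects as "not page-aligned":

getconf PAGESIZE                    # runtime page size, e.g. 65536 on many ARMv8 systems
readelf -lW /usr/local/python3.7.5/lib/python3.7/site-packages/paddle/fluid/libpaddle.so | grep LOAD   # Align column of each LOAD segment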

minboo commented 1 year ago

This problem is caused by patchelf detecting an inconsistent page size on ARMv8.

The detailed cause is described in the PR at NixOS/patchelf#216. We suggest running the following command in your environment to upgrade patchelf in both the build and the runtime environments, and then rebuilding and rerunning:

wget -O /opt/0.14.5.tar.gz https://github.com/NixOS/patchelf/archive/refs/tags/0.14.5.tar.gz && cd /opt && tar xzf 0.14.5.tar.gz && cd /opt/patchelf-0.14.5 && ./bootstrap.sh && ./configure && make && make install && cd /opt && rm -rf patchelf-0.14.5 && rm -rf 0.14.5.tar.gz

Hello, I upgraded patchelf as you suggested and the problem above is resolved, but the following error appears when I verify the installation:

root@localhost:/home/Paddle/build# python3
Python 3.7.10 (default, Mar 15 2021, 20:52:10) 
[GCC 10.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
I0313 19:29:32.091966 180539 interpretercore.cc:282] New Executor is Running.
I0313 19:29:32.099856 180539 interpreter_util.cc:574] Standalone Executor is Used.
PaddlePaddle works well on 1 CPU.
/opt/conda/lib/python3.7/site-packages/paddle/distributed/spawn.py:305: UserWarning: Your model will be trained under CPUONLY mode by using GLOO,because CPUPlace is specified manually or your installed PaddlePaddle only support CPU Device.
  "Your model will be trained under CPUONLY mode by using GLOO,"
grep: grep: warning: GREP_OPTIONS is deprecated; please use an alias or scriptwarning: GREP_OPTIONS is deprecated; please use an alias or script

I0313 19:29:32.950970 180739 tcp_utils.cc:179] The server starts to listen on IP_ANY:53387
I0313 19:29:32.951071 180740 tcp_utils.cc:128] Successfully connected to 127.0.0.1:53387
I0313 19:29:32.951176 180739 tcp_utils.cc:128] Successfully connected to 127.0.0.1:53387

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Termination signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1678706973 (unix time) try "date -d @1678706973" if you are using GNU date ***]
  [SignalInfo: *** SIGTERM (@0x2c13b) received by PID 180739 (TID 0xfffc81b128b0) from PID 180539 ***]

WARNING:root:PaddlePaddle meets some problem with 2 CPUs. This may be caused by:
 1. There is not enough GPUs visible on your system
 2. Some GPUs are occupied by other process now
 3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests 
 to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root:
 Original Error is: 

----------------------------------------------
Process 1 terminated with the following error:
----------------------------------------------

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/paddle/distributed/spawn.py", line 394, in _func_wrapper
    result = func(*args)
  File "/opt/conda/lib/python3.7/site-packages/paddle/utils/install_check.py", line 199, in train_for_run_parallel
    paddle.distributed.init_parallel_env()
  File "/opt/conda/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 1104, in init_parallel_env
    pg_options=None,
  File "/opt/conda/lib/python3.7/site-packages/paddle/distributed/collective.py", line 152, in _new_process_group_impl
    pg = core.ProcessGroupGloo.create(store, rank, world_size, group_id)
AttributeError: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupGloo'

PaddlePaddle is installed successfully ONLY for single CPU! Let's start deep learning with PaddlePaddle now.

What could be causing this? Since my host machine has an NPU, I built inside a docker container with CANN installed. My docker run command is below:

docker run -itd --name npu-cann502 -v /home/myfile:/home/myfile  \
            --pids-limit 409600 --network=host --shm-size=128G \
            --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
            --device=/dev/davinci0 --device=/dev/davinci1 \
            --device=/dev/davinci2 --device=/dev/davinci3 \
            --device=/dev/davinci_manager \
            --device=/dev/devmm_svm \
            --device=/dev/hisi_hdc \
            -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
            -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
            -v /usr/local/dcmi:/usr/local/dcmi \
            paddlepaddle/paddle:latest-dev-cann5.0.2.alpha005-gcc82-aarch64 /bin/bash
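Since the container maps in /usr/local/bin/npu-smi and the davinci devices, one quick sanity check (assuming the mounted Ascend driver works inside the container) that the NPUs are actually visible before building is:

npu-smi info    # should list the Atlas 310I devices passed through above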

I'm not sure where the problem is. Could you tell me how to compile for the Huawei NPU?

hitzhu commented 1 year ago

(hitzhu quoted minboo's entire comment above in full)

Could you share the whl file you built? Thanks a lot. My email is zhujihuai@gmail.com

lauarezn commented 1 year ago

Is there a pre-built paddlepaddle whl file adapted for the Ascend NPU?

paddle-bot[bot] commented 2 weeks ago

Since you haven't replied for more than a year, we have closed this issue/PR. If the problem is not solved or there is a follow-up question, please reopen it at any time and we will continue to follow up.