PaddlePaddle / Paddle

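PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 飞桨 (PaddlePaddle) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)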
http://www.paddlepaddle.org/
Apache License 2.0

Error on "import paddle" after building in the 算能云 (Sophgo Cloud) RISC-V environment #62037

Closed: skywalk163 closed this issue 7 months ago

skywalk163 commented 7 months ago

Describe the Bug

Environment:

openKylin 1.0.1 (GNU/Linux 6.1.31 riscv64) 
Linux 863c89a419ec 6.1.31 #1 SMP Tue Sep 12 00:30:01 CST 2023 riscv64 riscv64 riscv64 GNU/Linux

Python 3.8, CMake version 3.25.1

Built Paddle 2.6 (develop branch) from source

Modified the code following this PR: https://github.com/PaddlePaddle/Paddle/commit/d3db3835ec14dc5ca8d4c8a769164103f4703c64

Build:

cmake ../ -DWITH_GPU=OFF -WITH_RISCV=ON
make TARGET=RISCV64_GENERIC -j16

After the build finished, installed the wheel: pip install paddlepaddle-0.0.0-cp38-cp38-linux_riscv64.whl

Test: python3 -c "import paddle"

Error:

Error: Can not import paddle core while this file exists: /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/paddle/base/core.py", line 267, in <module>
    from . import libpaddle
ImportError: /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so: undefined symbol: __atomic_exchange_1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/paddle/__init__.py", line 30, in <module>
    from .base import core  # noqa: F401
  File "/usr/local/lib/python3.8/dist-packages/paddle/base/__init__.py", line 38, in <module>
    from . import (  # noqa: F401
  File "/usr/local/lib/python3.8/dist-packages/paddle/base/backward.py", line 25, in <module>
    from . import core, framework, log_helper, unique_name
  File "/usr/local/lib/python3.8/dist-packages/paddle/base/core.py", line 377, in <module>
    if not avx_supported() and libpaddle.is_compiled_with_avx():
NameError: name 'libpaddle' is not defined
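Only the first traceback is the real failure: the dynamic loader could not resolve __atomic_exchange_1 while loading libpaddle.so, and the NameError that follows is just core.py carrying on without libpaddle. GCC names its out-of-line atomic helpers __atomic_<op>_<N>, where N is the operand size in bytes, and these helpers live in libatomic. The RISC-V A extension only provides 32- and 64-bit atomic memory operations, and older GCC releases reportedly emit libatomic calls for 1- and 2-byte atomics instead of inlining them, which would explain why this shows up on riscv64 but not on x86. The snippet below is a small triage sketch (diagnose_import_error is a hypothetical name, not part of Paddle) that extracts the symbol from such an error message and flags whether it looks like a libatomic helper.

```python
import re

# "undefined symbol: <name>" as printed by the glibc dynamic loader; stop at
# whitespace or a comma (some messages append ", version ...").
_UNDEFINED_RE = re.compile(r"undefined symbol: ([^\s,]+)")

# GCC's out-of-line atomic helpers look like __atomic_<op>_<N>, where N is the
# operand size in bytes; the unsuffixed forms (e.g. __atomic_load) are the
# generic, arbitrary-size variants. All of them are provided by libatomic.
_ATOMIC_RE = re.compile(r"^__atomic_([a-z_]+?)(?:_(\d+))?$")


def diagnose_import_error(message):
    """Hypothetical triage helper: pull the undefined symbol out of a dlopen
    error message and report whether it looks like a libatomic helper.
    Returns None if the message is not an undefined-symbol error."""
    match = _UNDEFINED_RE.search(message)
    if match is None:
        return None
    symbol = match.group(1)
    atomic = _ATOMIC_RE.match(symbol)
    return {
        "symbol": symbol,
        "needs_libatomic": atomic is not None,
        "operation": atomic.group(1) if atomic else None,
        "size_bytes": int(atomic.group(2)) if atomic and atomic.group(2) else None,
    }
```

Fed the ImportError text from the traceback above, it reports the symbol __atomic_exchange_1 with operation "exchange", a size of 1 byte, and needs_libatomic set to True.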

Any hints would be appreciated. Installation log: https://blog.csdn.net/skywalk8163/article/details/136264198

Additional Supplementary Information

No response

silverling commented 7 months ago

See: https://github.com/riscv-collab/riscv-gnu-toolchain/issues/183#issuecomment-253721765. You could try adding the following to cmake/flags.cmake:

if(WITH_RISCV)
  set(COMMON_FLAGS -latomic ${COMMON_FLAGS})
endif()

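Make sure it goes before this block: https://github.com/PaddlePaddle/Paddle/blob/7efc5235b34fdbd2bd74d8e3294c43c54a45c22e/cmake/flags.cmake#L236-L239

Then do a clean rebuild from scratch and see whether that fixes the problem.

Note: I'm not familiar with RISC-V development; I only did a quick search.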

skywalk163 commented 7 months ago

I rebuilt with this change and still get the same error.

silverling commented 7 months ago

Did you actually set the WITH_RISCV=ON option? CMake options have to be passed with -D, and the CMake command you posted is missing the D:

Build:

cmake ../ -DWITH_GPU=OFF -WITH_RISCV=ON
make TARGET=RISCV64_GENERIC -j16

Also, if that still doesn't solve it, you can try adding the -latomic link option directly here:

https://github.com/PaddlePaddle/Paddle/blob/eea10b17b24b80dcad2a6c955ad6cc1925adaa0b/paddle/fluid/pybind/CMakeLists.txt#L542-L546

Add the following at line 546 and rebuild:

target_link_libraries(${SHARD_LIB_NAME} atomic)

skywalk163 commented 7 months ago

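Yes, I did set it. I left out a letter when I posted the command, but it was correct in the actual build. I'll try again. For line 546, do I just add it as a new line?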

silverling commented 7 months ago

Yes.

skywalk163 commented 7 months ago

I added it at line 546; it still doesn't work.
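When a link option like -latomic seems to have no effect, it helps to check whether it actually reached the final libpaddle.so: a shared library only pulls in libatomic at load time if libatomic.so.1 is listed among its DT_NEEDED entries (readelf -d prints these as "Shared library" lines). The sketch below reads those entries using only the Python standard library. It assumes a 64-bit little-endian ELF file, such as a riscv64 or x86-64 build, and read_needed is a hypothetical helper name.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5


def read_needed(path):
    """Return the DT_NEEDED entries of a 64-bit little-endian ELF file,
    roughly what `readelf -d` lists as "Shared library"."""
    with open(path, "rb") as f:
        data = f.read()
    # EI_CLASS == 2 (ELFCLASS64), EI_DATA == 1 (ELFDATA2LSB)
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("not a 64-bit little-endian ELF file: %s" % path)

    # ELF64 header fields: e_phoff at 0x20, e_phentsize/e_phnum at 0x36/0x38
    (phoff,) = struct.unpack_from("<Q", data, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", data, 0x36)

    loads, dynamic = [], None
    for i in range(phnum):
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # statically linked: no dynamic section at all

    # The dynamic section is an array of 16-byte (d_tag, d_val) pairs.
    needed_offsets, strtab_vaddr = [], None
    dyn_off, dyn_size = dynamic
    for off in range(dyn_off, dyn_off + dyn_size, 16):
        tag, val = struct.unpack_from("<qQ", data, off)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            needed_offsets.append(val)  # offset into the string table
        elif tag == DT_STRTAB:
            strtab_vaddr = val  # virtual address, not a file offset
    if strtab_vaddr is None:
        raise ValueError("dynamic section has no DT_STRTAB")

    # Map the string table's virtual address to a file offset via PT_LOAD.
    for vaddr, offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab = strtab_vaddr - vaddr + offset
            break
    else:
        raise ValueError("DT_STRTAB is not inside any PT_LOAD segment")

    names = []
    for name_off in needed_offsets:
        start = strtab + name_off
        end = data.index(b"\x00", start)  # strings are NUL-terminated
        names.append(data[start:end].decode())
    return names
```

For example, read_needed("/usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so") should contain "libatomic.so.1" once the link option has really taken effect. If it is absent and the import still fails, the option most likely never made it onto the final link of libpaddle.so.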

qili93 commented 7 months ago

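Hello, Paddle does not officially support RISC-V CPUs yet. Judging from your error message, the problem is caused by a riscv-gcc bug; for the cause and the workaround, see https://www.overleaf.com/project/624c33c4e2d49b02be626e13

The workaround suggested in that document is shown below. Please try it and see whether it solves your problem.

[Screenshot of the suggested workaround]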

skywalk163 commented 7 months ago

Thanks for the help!

I found the fix by comparing with the PyTorch installation.

root@863c89a419ec:~# ls /usr/local/lib/python3.8/dist-packages/paddle/base/
__init__.py  core.py            default_scope_funcs.py  executor.py     io.py                 libpaddle.so           param_attr.py    trainer_factory.py
__pycache__  data_feed_desc.py  device_worker.py        framework.py    layer_helper.py       lod_tensor.py          proto            unique_name.py
backward.py  data_feeder.py     dygraph                 incubate        layer_helper_base.py  log_helper.py          reader.py        variable_index.py
compiler.py  dataset.py         dygraph_utils.py        initializer.py  layers                multiprocess_utils.py  trainer_desc.py  wrapped_decorator.py
root@863c89a419ec:~# patchelf --add-needed libatomic.so.1  /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so 
root@863c89a419ec:~# python3
Python 3.8.2 (default, Jan 18 2024, 07:05:37) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
I0228 07:37:38.344522 279426 program_interpreter.cc:220] New Executor is Running.
I0228 07:37:38.475822 279426 interpreter_util.cc:652] Standalone Executor is Used.
PaddlePaddle works well on 1 CPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

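In other words, all it takes is using patchelf to register libatomic.so.1 as a dependency (a DT_NEEDED entry) of libpaddle.so, so that the dynamic loader loads libatomic and can resolve __atomic_exchange_1 from it: patchelf --add-needed libatomic.so.1 /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so

If patchelf is not installed, install it with apt install patchelf.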
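Note that reinstalling the wheel overwrites the patched libpaddle.so. If you would rather not modify the installed file, a runtime alternative with the same effect is to load libatomic into the process with RTLD_GLOBAL before importing paddle, so its symbols are available to libraries loaded afterwards; the shell equivalent is LD_PRELOAD=libatomic.so.1 python3 -c "import paddle". This is an untested sketch, not the approach used in this thread, and preload_libatomic is a hypothetical helper name.

```python
import ctypes
import ctypes.util


def preload_libatomic():
    """Hypothetical helper: load libatomic with RTLD_GLOBAL so that shared
    libraries loaded later in this process (e.g. libpaddle.so) can resolve
    __atomic_* symbols from it. Returns True on success, False otherwise."""
    # find_library("atomic") typically returns "libatomic.so.1" on glibc-based
    # Linux; otherwise fall back to that soname (the one patchelf added above).
    name = ctypes.util.find_library("atomic") or "libatomic.so.1"
    try:
        # RTLD_GLOBAL adds the library's symbols to the global lookup scope.
        ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return False
    return True
```

Call preload_libatomic() once, before the first import paddle. If it returns False, libatomic is not installed; on Debian-based systems it typically ships in the libatomic1 package.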

skywalk163 commented 7 months ago

PaddlePaddle (飞桨) is now installed and working on the 算能云 RISC-V machine! Installation log: https://blog.csdn.net/skywalk8163/article/details/136264198

silverling commented 7 months ago

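Thanks for sharing the solution. However, your blog link doesn't seem to be accessible (it looks like a link to the blog's back-end editor). Could you publish the post and share the link again?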

skywalk163 commented 7 months ago

My mistake, it's fixed now.