PaddlePaddle / PaddleCustomDevice

PaddlePaddle custom device implementation. (Custom hardware integration for PaddlePaddle)
Apache License 2.0
Cambricon MLU: how do I build the MLU backend of PaddleCustomDevice from source? #1331

Open wangzy0327 opened 6 days ago

wangzy0327 commented 6 days ago

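My environment requires Python 3.8, so I cannot directly install the wheel paddle_custom_mlu.whl that was built for Python 3.10. Could you provide the steps and commands for building PaddleCustomDevice from source? Thanks! @YanhuiDua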
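The Python-version constraint described above is encoded in the wheel filename itself. The following is a minimal sketch, not part of the original thread, that reads the interpreter tag out of a wheel filename and compares it with a given Python version. The helper names `wheel_tags` and `matches_interpreter` are hypothetical, the filename is only an example, and the check is deliberately simplified (it handles CPython tags such as `cp38` only); pip's real compatibility logic is more involved.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag).

    Wheel filenames end in '-<python tag>-<abi tag>-<platform tag>.whl'.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def matches_interpreter(filename, version_info=sys.version_info):
    """Return True if the wheel's python tag names this CPython version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    # A python tag may list several interpreters separated by dots.
    return expected in wheel_tags(filename)[0].split(".")


# Example filename (illustrative): a wheel built for CPython 3.8.
example = "paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl"
print(wheel_tags(example))
print(matches_interpreter(example, (3, 8)))
print(matches_interpreter(example, (3, 10)))
```

A wheel tagged `cp310` would fail the same check on a Python 3.8 interpreter, which is why a source build is requested here.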

YanhuiDua commented 6 days ago

Hello, please refer to the source-build section of https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/mlu/README_cn.md

wangzy0327 commented 6 days ago

@YanhuiDua I followed the steps and built PaddleCustomDevice release/2.6 from source with Python 3.8. I ran into some errors along the way; excerpts follow:

Submodule path 'Paddle': checked out '90138318312fbb60b0bdce8b0f4fb317879fe62e'
-- PADDLE_SOURCE_DIR=/home/wzy/PaddleCustomDevice/Paddle
-- Paddle version is 0.0.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
...
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Looking for C++ include inttypes.h - found
-- Looking for C++ include sys/types.h
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
...
-- Generating done
-- Build files have been written to: /home/wzy/PaddleCustomDevice/backends/mlu/build/third_party/mkldnn/src/extern_mkldnn-build
[ 15%] Performing build step for 'extern_mkldnn'
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
...
[ 24%] Building CXX object src/cpu/x64/CMakeFiles/dnnl_cpu_x64.dir/cpu_barrier.cpp.o
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
...
[ 25%] Building CXX object src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/passes/lower.cpp.o
-- Looking for snprintf - found
-- Looking for get_static_proc_name in unwind
-- Looking for get_static_proc_name in unwind - not found
-- Looking for UnDecorateSymbolName in dbghelp
-- Looking for UnDecorateSymbolName in dbghelp - not found
-- Performing Test HAVE___ATTRIBUTE__
-- Performing Test HAVE___ATTRIBUTE__ - Success
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_DEFAULT
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_DEFAULT - Success
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_HIDDEN
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_HIDDEN - Success
-- Performing Test HAVE___BUILTIN_EXPECT
-- Performing Test HAVE___BUILTIN_EXPECT - Success
-- Performing Test HAVE___SYNC_VAL_COMPARE_AND_SWAP
-- Performing Test HAVE___SYNC_VAL_COMPARE_AND_SWAP - Success
-- Performing Test HAVE_RWLOCK
-- Performing Test HAVE_RWLOCK - Failed
-- Performing Test HAVE___DECLSPEC
-- Performing Test HAVE___DECLSPEC - Failed
-- Performing Test STL_NO_NAMESPACE
-- Performing Test STL_NO_NAMESPACE - Failed

Even so, the build still produces a wheel. This is the output of installing it:

wzy@gxnzx119:~/PaddleCustomDevice/backends/mlu$ python3 -m pip install build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
Processing ./build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
paddle-custom-mlu is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Error parsing dependencies of distro-info: Invalid version: '0.23ubuntu1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.36ubuntu1'

After that, running a program that had previously been verified to work ends in a segmentation fault. The backtrace is:

Segmentation fault (core dumped)
wzy@gxnzx119:~/paddle_tests/models$ lldb python3
(lldb) target create "python3"
Current executable set to 'python3' (x86_64).
(lldb) run benchmark_ano.py 
Process 3134839 launched: '/usr/bin/python3' (x86_64)
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
warning: (x86_64) /home/wzy/.local/lib/python3.8/site-packages/numpy.libs/libgfortran-040039e1.so.5.0.0 No LZMA support found for reading .gnu_debugdata section
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
warning: (x86_64) /home/wzy/.local/lib/python3.8/site-packages/pillow.libs/libXau-00ec42fe.so.6.0.0 No LZMA support found for reading .gnu_debugdata section
I0703 02:51:22.082170 3134839 init.cc:234] ENV [CUSTOM_DEVICE_ROOT]=/home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device
I0703 02:51:22.082192 3134839 init.cc:143] Try loading custom device libs from: [/home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device]
Process 3134839 stopped
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0xf00000001)
    frame #0: 0x0000000f00000001
error: memory read failed for 0xf00000000
(lldb) bt
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0xf00000001)
  * frame #0: 0x0000000f00000001
    frame #1: 0x00007fffe3554c6e libphi.so`phi::CustomKernelMap::RegisterCustomKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, phi::KernelKey const&, phi::Kernel const&) + 622
    frame #2: 0x00007fffaebdc83e libpaddle-custom-mlu.so`phi::KernelRegistrar::ConstructKernel(phi::RegType, char const*, char const*, common::DataLayout, phi::DataType, void (*)(phi::KernelKey const&, phi::KernelArgsDef*), void (*)(phi::KernelKey const&, phi::Kernel*), std::function<void (phi::KernelContext*)>, void*) (.constprop.371) + 2222
    frame #3: 0x00007fffaebdcdfe libpaddle-custom-mlu.so`phi::KernelRegistrar::KernelRegistrar(phi::RegType, char const*, char const*, common::DataLayout, phi::DataType, void (*)(phi::KernelKey const&, phi::KernelArgsDef*), void (*)(phi::KernelKey const&, phi::Kernel*), std::function<void (phi::KernelContext*)>, void*) + 158
    frame #4: 0x00007fffaeb3430e libpaddle-custom-mlu.so`__static_initialization_and_destruction_0(int, int) (.constprop.355) + 4062

How can I resolve this?
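Since pip reported above that a build with the same version was already installed, it can help to confirm which distributions the interpreter actually sees before debugging further. The sketch below is an illustration added for this write-up and does not diagnose the crash. It uses the standard-library `importlib.metadata` module; `paddle-custom-mlu` is the distribution name pip prints above, while `paddlepaddle` is assumed to be the name of the installed framework package, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names: the second is taken from the pip output in this thread,
# the first is an assumption about how the framework was installed.
for name in ("paddlepaddle", "paddle-custom-mlu"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Run it with the same `python3` that executes the failing program, so that it inspects the same site-packages directory.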

YanhuiDua commented 6 days ago

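This looks like a problem with the third-party dependency library pthread. I recommend using the official image: docker pull registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310, then installing a py38 environment inside that image and building there.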

You can also build your own py38 image based on these Dockerfiles:

paddle-mlu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/mlu/tools/dockerfile/Dockerfile.mlu.kylinv10.gcc82.py310

paddle-cpu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/custom_cpu/tools/dockerfile/Dockerfile.ubuntu20.x86_64.gcc84

wangzy0327 commented 5 days ago

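I tried again: I installed a py38 environment inside the registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310 image and built there, and got the same errors as when building on the host. Could the build failure be caused by the PaddleCustomDevice version? If so, which PaddleCustomDevice version is known to work? @YanhuiDua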

qili93 commented 5 days ago

Judging from the following message, the package you built should be fine. It needs to be reinstalled with the --force-reinstall option.

paddle-custom-mlu is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Error parsing dependencies of distro-info: Invalid version: '0.23ubuntu1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.36ubuntu1'
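As a rough illustration of the reinstall suggested above, the sketch below assembles the pip command for the wheel path shown earlier in the thread. The helper name `force_reinstall_command` is hypothetical; `--force-reinstall` is the pip option named in the message itself. The code only prints the command and leaves execution to the reader.

```python
import shlex
import sys


def force_reinstall_command(wheel_path, python=sys.executable):
    """Build the pip invocation that reinstalls a wheel even if the same
    version is already installed."""
    return [python, "-m", "pip", "install", "--force-reinstall", wheel_path]


# Wheel path as it appears in the install log earlier in this thread.
cmd = force_reinstall_command(
    "build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl"
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Note that `--force-reinstall` also reinstalls the wheel's dependencies; adding pip's `--no-deps` option may be preferable if only the plugin itself should be replaced.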