PaddlePaddle / Paddle

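PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)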
http://www.paddlepaddle.org/
Apache License 2.0
21.94k stars, 5.52k forks

Excessive memory consumption when compiling paddle+xpu+lite #58460

Open yzhw9981 opened 9 months ago

yzhw9981 commented 9 months ago

Describe the Bug

On the develop branch, with the following build options:

cmake .. -DPY_VERSION=3.7 \
    -DPYTHON_EXECUTABLE=/home/work/disk1/workspace/paddle-slim/Paddle/build/python37-gcc820/bin/python \
    -DCMAKE_BUILD_TYPE=Release \
    -DWITH_GPU=OFF \
    -DWITH_XPU=OFF \
    -DON_INFER=ON \
    -DWITH_PYTHON=OFF \
    -DWITH_AVX=ON \
    -DWITH_MKL=ON \
    -DWITH_MKLDNN=ON \
    -DWITH_XPU_BKCL=OFF \
    -DWITH_DISTRIBUTE=OFF \
    -DWITH_NCCL=OFF \
    -DWITH_LITE=ON \
    -DLITE_WITH_XPU=ON \
    -DWITH_BDCENTOS=ON \
    -DLITE_GIT_TAG=develop \
    -DCMAKE_CXX_FLAGS="-Wno-error -w"

After that, even with make -j1, memory usage climbs rapidly (>64 GB) once the build reaches the Lite-related code, and top shows a screen full of cc1plus processes. There is also no way to adjust the number of concurrent compile processes.

Additional Supplementary Information

No response

Sunting78 commented 9 months ago

Please provide the details of your environment.

hong19860320 commented 9 months ago

You can edit this line manually to force -j1: https://github.com/PaddlePaddle/Paddle/blob/75afa68aca4ccef98e1c5edeb3f3b99c9f49efc3/cmake/external/lite.cmake#L139

I recommend following the Paddle+XPU documentation at https://www.paddlepaddle.org.cn/inference/v2.5/guides/hardware_support/xpu_kunlun_cn.html , in which case you do not need to compile Lite.
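
A manual edit of that kind could look roughly like the sketch below. It is not taken from the Paddle repository: the file is a temporary stand-in, and the `BUILD_COMMAND` line in it is a hypothetical example of a build command ending in a bare `-j`. Check the actual content of the linked line in `cmake/external/lite.cmake` before adapting the `sed` expression.

```shell
#!/bin/sh
set -eu

# Temporary stand-in for cmake/external/lite.cmake.
# The BUILD_COMMAND line below is hypothetical, for illustration only.
sample=$(mktemp)
printf '    BUILD_COMMAND   $(MAKE) publish_inference -j\n' > "$sample"

# Rewrite a bare "-j" (no job count attached) into "-j1".
# Flags that already carry a number, such as -j8, are left untouched.
patched=$(sed -E 's/(^|[[:space:]])-j([[:space:]]|$)/\1-j1\2/g' "$sample")
echo "$patched"

rm -f "$sample"
```

Against a real checkout the same expression could be applied in place, after making a backup of the file and confirming that only the intended line matches.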

ZhangYuef commented 3 weeks ago

> You can edit this line manually to force -j1: https://github.com/PaddlePaddle/Paddle/blob/75afa68aca4ccef98e1c5edeb3f3b99c9f49efc3/cmake/external/lite.cmake#L139
>
> I recommend following the Paddle+XPU documentation at https://www.paddlepaddle.org.cn/inference/v2.5/guides/hardware_support/xpu_kunlun_cn.html , in which case you do not need to compile Lite.

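@hong19860320 How exactly do I configure the build so that PaddleLite is not compiled when building PaddlePaddle? I don't see DWITH_LITE among the build options in the documentation.

For reference, the make documentation on this option: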

   -j [jobs], --jobs[=jobs]
       Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the
       last one is effective. If the -j option is given without an argument, make will not limit the number of
       jobs that can run simultaneously. When make invokes a sub-make, all instances of make will coordinate to
       run the specified number of jobs at a time; see the section PARALLEL MAKE AND THE JOBSERVER for details.

So it should be set to make -j1 or make -j {nproc}
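
The behaviour described in the man page excerpt can be observed with a small experiment. The following is a minimal sketch, assuming GNU make and a POSIX shell. The Makefile and its targets `a` to `d` are made up for the illustration and have nothing to do with Paddle's build.

```shell
#!/bin/sh
set -eu

# Throwaway directory holding a toy Makefile.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Four independent targets, each taking about one second.
printf 'all: a b c d\na b c d:\n\t@sleep 1\n.PHONY: all a b c d\n' > "$workdir/Makefile"

# Run make with the given flags and print the wall-clock time in whole seconds.
elapsed() {
    start=$(date +%s)
    make -s -C "$workdir" "$@" >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

serial=$(elapsed -j1)    # one job at a time
unlimited=$(elapsed -j)  # bare -j: no limit on simultaneous jobs

echo "make -j1: ${serial}s, make -j: ${unlimited}s"
```

With four one-second targets, the `-j1` run takes about four seconds, while the bare `-j` run starts all four at once. In a large C++ build the same lack of a limit means one compiler process per ready target, which would be consistent with the memory growth reported above.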