PaddlePaddle / Paddle

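PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle 『飞桨』 core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)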
http://www.paddlepaddle.org/
Apache License 2.0
22.25k stars 5.59k forks

Error when building and installing paddlepaddle from source #50552

Open xman1991 opened 1 year ago

xman1991 commented 1 year ago

Issue Description

Software environment: Kylin V10, CUDA 11.2, cuDNN 8.1
Hardware environment: CPU: Phytium; GPU: NVIDIA GTX 1060

Installation steps:
1. git clone https://github.com/PaddlePaddle/Paddle.git
2. cd paddlepaddle
3. cd python && pip install -r requirement.txt
4. mkdir build && cd build
5. ulimit -n 4096
6. cmake .. -DPY_VERSION=3.7 -DPYTHON_EXECUTABLE=`which python3` -DWITH_ARM=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DON_INFER=ON -DWITH_XBYAK=OFF -DWITH_CUDNN_DSO=ON
7. The following error appeared: /home/kylin/2230/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
   Workaround used: sudo cp /usr/local/cuda-10.0/lib64/libcudnn.so.7 /usr/local/lib/libcudnn.so.7 && sudo ldconfig [https://blog.csdn.net/martinkeith/article/details/102997059]
8. cd python/dist
9. pip install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_aarch64.whl

Test: start python and run:
import paddle
paddle.utils.run_check()
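The "cannot open shared object file" error in step 7 means the dynamic loader could not resolve a library by name. A minimal sketch for checking this outside the Paddle build is shown here. The helper name `can_load` is hypothetical, `libcudart.so.11.0` is taken from the error message, and `libcudnn.so.8` is an assumption based on the reported cuDNN 8.1.

```python
import ctypes


def can_load(lib_name):
    """Return (True, None) if the dynamic loader can open lib_name,
    otherwise (False, reason)."""
    try:
        ctypes.CDLL(lib_name)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Library names are assumptions: the first comes from the error
    # message in step 7, the second from the reported cuDNN 8.1.
    for name in ("libcudart.so.11.0", "libcudnn.so.8"):
        ok, reason = can_load(name)
        print(name, "OK" if ok else "NOT LOADABLE: " + reason)
```

If a library is reported as not loadable, its directory is probably missing from the loader's search path (for example `LD_LIBRARY_PATH` or the `ldconfig` cache), or only a different version of the library is installed.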

Error:
Running verify PaddlePaddle program ...
I0215 15:09:51.850409 3916 interpretercore.cc:279] New Executor is Running.
W0215 15:09:51.850878 3916 gpu_resources.cc:85] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.2, Runtime API Version: 11.2


C++ Traceback (most recent call last):

0 paddle::framework::StandaloneExecutor::Run(paddle::framework::Scope, std::vector<std::string, std::allocator > const&, std::vector<std::string, std::allocator > const&)
1 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator > const&, bool)
2 paddle::framework::interpreter::BuildOpFuncList(phi::Place const&, paddle::framework::BlockDesc const&, std::set<std::string, std::less, std::allocator > const&, std::vector<paddle::framework::OpFuncNode, std::allocator >, paddle::framework::VariableScope, paddle::framework::interpreter::ExecutionConfig const&, bool)
3 paddle::platform::DeviceContextPool::Get(phi::Place const&)
4 std::__future_base::_Deferred_state<std::thread::_Invoker<std::tuple<std::unique_ptr<phi::DeviceContext, std::default_delete > ()(phi::Place const&, bool, int), phi::Place, bool, int> >, std::unique_ptr<phi::DeviceContext, std::default_delete > >::_M_complete_async()
5 std::future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::future_base::_Result_base::_Deleter> ()>, bool)
6 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::future_base::_Result_base::_Deleter> (), std::future_base::_Task_setter<std::unique_ptr<std::future_base::_Result<std::unique_ptr<phi::DeviceContext, std::default_delete > >, std::future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::unique_ptr<phi::DeviceContext, std::default_delete > ()(phi::Place const&, bool, int), phi::Place, bool, int> >, std::unique_ptr<phi::DeviceContext, std::default_delete > > >::_M_invoke(std::_Any_data const&)
7 std::unique_ptr<phi::DeviceContext, std::default_delete > paddle::platform::CreateDeviceContext(phi::Place const&, bool, int)
8 std::enable_if<std::is_same<phi::GPUContext, phi::GPUContext>::value, phi::GPUContext>::type paddle::platform::ConstructDevCtx(phi::Place const&, int)
9 phi::GPUContext::GPUContext(phi::GPUPlace const&, bool, int)
10 phi::InitGpuProperties(phi::Place, int, int, int, int, int, int, std::array<int, 3ul>*)
11 cudnnGetVersion


Error Message Summary:

FatalError: Access to an undefined portion of a memory object is detected by the operating system.
[TimeInfo: Aborted at 1676444991 (unix time) try "date -d @1676444991" if you are using GNU date ]
[SignalInfo: SIGBUS (@0x7f57890858) received by PID 3916 (TID 0x7fb7515750) from PID 1468598360 ]

Bus error
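The last frame of the traceback is `cudnnGetVersion`, so one way to narrow the problem down is to call that function directly, without Paddle. The following is a minimal sketch under stated assumptions: the library name `libcudnn.so.8` is assumed from the reported cuDNN 8.1, and the helper names `cudnn_version` and `decode` are hypothetical. It is best run in a separate Python process, since a SIGBUS inside the library would terminate the interpreter.

```python
import ctypes


def cudnn_version(lib_name="libcudnn.so.8"):
    """Call cudnnGetVersion() from the named library.

    Returns the raw integer version, or None if the library
    cannot be loaded. lib_name is an assumption, not a value
    confirmed by the thread.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # cudnnGetVersion takes no arguments and returns size_t.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetVersion.argtypes = []
    return int(lib.cudnnGetVersion())


def decode(version):
    """Split a cuDNN 8-style version number (major*1000 + minor*100 + patch)."""
    return version // 1000, (version % 1000) // 100, version % 100


if __name__ == "__main__":
    raw = cudnn_version()
    if raw is None:
        print("libcudnn could not be loaded")
    else:
        print("cuDNN version:", decode(raw))
```

If this standalone call also crashes, the problem likely lies in the cuDNN library that the loader picks up (for example a wrong architecture or a mismatched copy) rather than in Paddle itself.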

Version & Environment Information


Paddle version: 0.0.0
Paddle With CUDA: True

OS: kylin v10
GCC version: (Ubuntu 9.3.0-10kylin2) 9.3.0
Clang version: 10.0.0-4kylin1
CMake version: version 3.26.0-rc2
Libc version: glibc 2.17
Python version: 3.7.12

CUDA version: 11.0.194 Build cuda_11.0_bu.TC445_37.28540450_0
cuDNN version: N/A
Nvidia driver version: 460.84
Nvidia driver List: GPU 0: GeForce GTX 1060 6GB


paddle-bot[bot] commented 1 year ago

Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version information, and error messages. You may also check the official API documentation, the FAQ, past GitHub Issues, and the AI community for an answer. Have a nice day!

xman1991 commented 1 year ago

This project is urgent; I hope to get a reply as soon as possible.

xman1991 commented 1 year ago

This project is urgent; I hope I can get a reply as soon as possible.

LiuChiachi commented 1 year ago

Could it be that cuDNN still isn't installed correctly?

xman1991 commented 1 year ago

> Could it be that cuDNN still isn't installed correctly?

It is installed, but your script does not detect it.

LiuChiachi commented 1 year ago

Could you check whether it is an environment variable setting?
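One way to check the environment variable suggestion is to list which CUDA and cuDNN libraries are visible in the directories named by `LD_LIBRARY_PATH`. This is a minimal sketch. The helper name `find_libs` is hypothetical, and it inspects only `LD_LIBRARY_PATH`, not the `ldconfig` cache or default system directories.

```python
import glob
import os


def find_libs(pattern, search_path=None):
    """Return files matching pattern in each directory of search_path.

    search_path is a path list separated by os.pathsep; it defaults
    to the current LD_LIBRARY_PATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory:  # skip empty entries
            hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


if __name__ == "__main__":
    for pattern in ("libcudart.so*", "libcudnn.so*"):
        matches = find_libs(pattern)
        print(pattern, "->", matches if matches else "nothing found")
```

Several different versions showing up for the same library, or none at all, would both be worth reporting in the thread.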

xman1991 commented 1 year ago

[image: GITHUB_01]

xman1991 commented 1 year ago

I modified your script, and it can now read the cuDNN version. [image: GITHUB_02]
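The thread does not show what was changed in the script. One possible explanation, offered only as an assumption, is that from cuDNN 8 onward the version macros are defined in `cudnn_version.h` rather than `cudnn.h`, so a script that reads only `cudnn.h` reports N/A. A minimal sketch of reading the version from header text follows; the function name `parse_cudnn_version` and the header path in the usage section are hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header.

    Returns None if any of the three macros is missing.
    """
    values = []
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+CUDNN_" + key + r"\s+(\d+)",
            header_text,
            re.MULTILINE,
        )
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


if __name__ == "__main__":
    # Hypothetical location; the actual path depends on the installation.
    path = "/usr/local/cuda/include/cudnn_version.h"
    try:
        with open(path) as handle:
            print(parse_cudnn_version(handle.read()))
    except OSError:
        print("header not found at", path)
```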

xman1991 commented 1 year ago

> Could you check whether it is an environment variable setting?

Hello, could you please reply?

xman1991 commented 1 year ago

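> Could you check whether it is an environment variable setting?

Hello, this project is urgent; I hope to get a reply as soon as possible. Thanks.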

LiuChiachi commented 1 year ago

So does the build still report any errors now?

xman1991 commented 1 year ago

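> So does the build still report any errors now?

The build completes without errors and produces the Python and C++ libraries. After installing the Python library with pip, running import paddle and then paddle.utils.run_check() produces the error reported in this issue.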

xman1991 commented 1 year ago

> So does the build still report any errors now?

Hello?

xman1991 commented 1 year ago

> So does the build still report any errors now?

Could we connect on WeChat? Communicating this way is a bit slow.