PaddlePaddle / PaddleX

Low-code development tool based on PaddlePaddle (飞桨)
Apache License 2.0

PaddleX 3.0-beta does not support the Ascend 300I DUO card #1838

Open tomjimi2019 opened 1 month ago

tomjimi2019 commented 1 month ago

If I install paddlepaddle==3.0.0b0 and paddlecustomdevice==3.0.0b0, the verification check fails:

[root@localhost dist]# python3 -c "import paddle; paddle.utils.run_check()"
I0722 12:18:07.275084 1318483 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/python310/lib/python3.10/site-packages/paddle_custom_device
I0722 12:18:07.275131 1318483 init.cc:145] Try loading custom device libs from: [/usr/local/python310/lib/python3.10/site-packages/paddle_custom_device]
I0722 12:18:07.746173 1318483 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/python310/lib/python3.10/site-packages/paddle_custom_device/libpaddle-custom-npu.so
I0722 12:18:07.749110 1318483 custom_kernel.cc:63] Succeed in loading 355 custom kernel(s) from loaded lib(s), will be used like native ones.
I0722 12:18:07.749258 1318483 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/python310/lib/python3.10/site-packages/paddle_custom_device]
I0722 12:18:07.749292 1318483 init.cc:242] CustomDevice: npu, visible devices count: 4
Running verify PaddlePaddle program ...
I0722 12:18:08.663375 1318483 program_interpreter.cc:243] New Executor is Running.

C++ Traceback (most recent call last):
0 paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)
1 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string> > const&, bool, bool, bool, bool)
2 paddle::framework::ProgramInterpreter::Run(std::vector<std::string, std::allocator<std::string> > const&, bool, bool, bool, bool)
3 paddle::framework::ProgramInterpreter::Build(std::vector<std::string, std::allocator<std::string> > const&, std::vector<paddle::framework::OpFuncNode, std::allocator<paddle::framework::OpFuncNode> >*, bool)
4 paddle::framework::interpreter::BuildOpFuncList(phi::Place const&, paddle::framework::BlockDesc const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&, std::vector<paddle::framework::OpFuncNode, std::allocator<paddle::framework::OpFuncNode> >*, paddle::framework::VariableScope*, paddle::framework::interpreter::ExecutionConfig const&, std::vector<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)>, std::allocator<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)> > > const&, std::vector<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)>, std::allocator<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)> > > const&, bool, bool)
5 void custom_kernel::MatmulKernel<float, phi::CustomContext>(phi::CustomContext const&, phi::DenseTensor const&, phi::DenseTensor const&, bool, bool, phi::DenseTensor*)
6 aclnnMatmul
7 InitL2Phase2Context(char*, aclOpExecutor*)
8 GetOpExecCacheFromExecutor(aclOpExecutor*)

Error Message Summary:
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1721621889 (unix time) try "date -d @1721621889" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x141e53) received by PID 1318483 (TID 0xfffef55e5980) from PID 1318483 ]
Segmentation fault (core dumped)
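For readers without GNU date, the Unix timestamp in the TimeInfo line can also be decoded in Python. This is only a small sketch. The UTC+8 offset is an assumption about the reporter's machine, not something the log states, although under that assumption the result lines up with the `I0722 12:18` prefixes in the log.

```python
from datetime import datetime, timedelta, timezone

ABORT_TS = 1721621889  # value copied from the TimeInfo line above

# Interpret the timestamp as UTC first, so the result does not depend on the local zone.
utc_time = datetime.fromtimestamp(ABORT_TS, tz=timezone.utc)

# UTC+8 is an assumption about the reporter's machine, not a fact from the log.
local_time = utc_time.astimezone(timezone(timedelta(hours=8)))

print(utc_time.isoformat())    # 2024-07-22T04:18:09+00:00
print(local_time.isoformat())  # 2024-07-22T12:18:09+08:00
```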

If I instead install paddlepaddle==2.6.1 and paddlecustomdevice==2.6.1, PaddleX cannot be used at all: its PaddlePaddle version check fails with an error saying the version must be 3.0.
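Both cases depend on which package versions are actually installed, so it can help to print them before running anything else. Below is a minimal sketch that uses only the standard library. The distribution names in `PACKAGES` are assumptions (the report writes `paddlecustomdevice`; adjust to whatever `pip list` shows), and the "major version must be 3" rule only mirrors the error described above. It is not PaddleX's actual check.

```python
from importlib import metadata

# Distribution names are assumptions; adjust them to what `pip list` shows.
PACKAGES = ("paddlepaddle", "paddle-custom-npu")


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the leading major number from a string such as '3.0.0b0'."""
    return int(version.split(".")[0])


def check_pair(framework, plugin, required_major=3):
    """Return a list of problems for a framework/plugin version pair."""
    problems = []
    if framework is None or plugin is None:
        problems.append("one of the packages is not installed")
        return problems
    if framework != plugin:
        problems.append(f"version mismatch: {framework} vs {plugin}")
    if major_version(framework) != required_major:
        problems.append(f"framework major version is not {required_major}")
    return problems


if __name__ == "__main__":
    versions = [installed_version(name) for name in PACKAGES]
    print(dict(zip(PACKAGES, versions)))
    print(check_pair(*versions) or "versions look consistent")
```

A matching 3.0.0b0 pair passes this check, which fits the report: in that case the failure is the segmentation fault inside `aclnnMatmul`, not a version problem.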

Environment

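  1. Please provide the PaddlePaddle and PaddleX versions you are using: PaddlePaddle 2.6.1, PaddleX 3.0.0-beta
  2. Please provide your operating system (e.g. Linux/Windows/MacOS): Linux
  3. Which Python version are you using? 3.10
  4. Which CUDA/cuDNN version are you using? Ascend platform

nepeplwu commented 1 month ago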

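@tomjimi2019 Thanks for the feedback. The 300I DUO is an inference card, so training on it is not supported. Support for inference deployment on this card is currently being evaluated internally.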