PaddlePaddle / PaddleCustomDevice

PaddlePaddle custom device implementation. (Custom hardware integration for PaddlePaddle, 『飞桨』)
Apache License 2.0

Ascend 300I DUO card is not supported #1356

Open tomjimi2019 opened 3 months ago

tomjimi2019 commented 3 months ago

[root@localhost dist]# python3 -c "import paddle; paddle.utils.run_check()"
I0722 12:18:07.275084 1318483 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/python310/lib/python3.10/site-packages/paddle_custom_device
I0722 12:18:07.275131 1318483 init.cc:145] Try loading custom device libs from: [/usr/local/python310/lib/python3.10/site-packages/paddle_custom_device]
I0722 12:18:07.746173 1318483 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/python310/lib/python3.10/site-packages/paddle_custom_device/libpaddle-custom-npu.so
I0722 12:18:07.749110 1318483 custom_kernel.cc:63] Succeed in loading 355 custom kernel(s) from loaded lib(s), will be used like native ones.
I0722 12:18:07.749258 1318483 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/python310/lib/python3.10/site-packages/paddle_custom_device]
I0722 12:18:07.749292 1318483 init.cc:242] CustomDevice: npu, visible devices count: 4
Running verify PaddlePaddle program ...
I0722 12:18:08.663375 1318483 program_interpreter.cc:243] New Executor is Running.


C++ Traceback (most recent call last):

0 paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator > const&, bool)
1 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator > const&, bool, bool, bool, bool)
2 paddle::framework::ProgramInterpreter::Run(std::vector<std::string, std::allocator > const&, bool, bool, bool, bool)
3 paddle::framework::ProgramInterpreter::Build(std::vector<std::string, std::allocator > const&, std::vector<paddle::framework::OpFuncNode, std::allocator >, bool)
4 paddle::framework::interpreter::BuildOpFuncList(phi::Place const&, paddle::framework::BlockDesc const&, std::set<std::string, std::less, std::allocator > const&, std::vector<paddle::framework::OpFuncNode, std::allocator >, paddle::framework::VariableScope, paddle::framework::interpreter::ExecutionConfig const&, std::vector<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)>, std::allocator<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)> > > const&, std::vector<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)>, std::allocator<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)> > > const&, bool, bool)
5 void custom_kernel::MatmulKernel<float, phi::CustomContext>(phi::CustomContext const&, phi::DenseTensor const&, phi::DenseTensor const&, bool, bool, phi::DenseTensor)
6 aclnnMatmul
7 InitL2Phase2Context(char, aclOpExecutor)
8 GetOpExecCacheFromExecutor(aclOpExecutor*)


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: Aborted at 1721621889 (unix time) try "date -d @1721621889" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x141e53) received by PID 1318483 (TID 0xfffef55e5980) from PID 1318483 ]

Segmentation fault (core dumped)
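
The TimeInfo field in the error summary gives the abort time as a Unix timestamp and suggests GNU `date` for decoding it. For readers without GNU `date`, a minimal Python sketch of the same conversion follows. The helper name `describe_abort_time` is hypothetical, and the UTC+8 offset is an assumption about the reporter's machine, chosen because it lines up with the `12:18:08` glog timestamps in the log.

```python
from datetime import datetime, timedelta, timezone


def describe_abort_time(unix_seconds, utc_offset_hours=0):
    """Render a Unix timestamp as an ISO 8601 string at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_seconds, tz=tz).isoformat()


# Timestamp copied from the TimeInfo line of the report.
print(describe_abort_time(1721621889))     # UTC
print(describe_abort_time(1721621889, 8))  # assumed local offset, UTC+8
```

Under the UTC+8 assumption the abort lands about one second after the "New Executor is Running" line, which matches the crash occurring in the first kernel the check program builds.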

cleansely commented 3 months ago

Same question: can inference be run on the 300I DUO? Is installing the matching CANN and kernels packages enough?

tomjimi2019 commented 3 months ago

I tried downgrading PaddlePaddle and PaddleCustomDevice to 2.6.0, and the error goes away, but PaddleX 3.0.0-beta only supports Paddle 3.0.
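
When comparing setups like the ones described in this thread, it may help to print which versions are actually installed. The following is a rough sketch, not an official Paddle tool. The distribution names `paddlepaddle`, `paddle-custom-npu`, and `paddlex` are assumptions about how the wheels are named on a given machine, and the helper functions are hypothetical.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Extract (major, minor) from a version string such as '3.0.0b1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_release_line(a, b):
    """True when two version strings share the same major.minor pair."""
    return major_minor(a) == major_minor(b)


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to the local install.
    for name in ("paddlepaddle", "paddle-custom-npu", "paddlex"):
        print(name, installed_version(name) or "not installed")
```

For example, `same_release_line("2.6.0", "3.0.0b1")` returns `False`, which reflects the mismatch reported above between the working 2.6.0 downgrade and what PaddleX 3.0.0-beta requires.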