PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle (PaddlePaddle low-code development tool)
Apache License 2.0

Running inference with PaddleX on an Ascend development board #2311

Open knoka812 opened 16 hours ago

knoka812 commented 16 hours ago

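Problem description

I am trying to run model inference with PaddleX on an Ascend development board, but the installation verification step fails. Tutorial followed: https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta/docs/tutorials/INSTALL_OTHER_DEVICES.md#11-%E7%8E%AF%E5%A2%83%E5%87%86%E5%A4%87 The first two steps complete normally, but the final installation step has a problem. The final verification prints:

[ERROR] DRV(88,python):2024-10-23-20:33:04.011.089 [hdc_core.c:396][hdc] [hdcPcieInit 396] HDC init failed, driver may not load.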

[ERROR] DRV(88,python):2024-10-23-20:49:44.693.329 [hdc_core.c:379][hdc] [hdc_phandle_get 379] Open pcie device failed. (strerror="No such file or directory") [EVENT] PROFILING(88,python):2024-10-23-20:49:46.127.239 [msprof_callback_impl.cpp:336] >>> (tid:88) Started to register profiling ctrl callback. [EVENT] PROFILING(88,python):2024-10-23-20:49:46.130.073 [msprof_callback_impl.cpp:343] >>> (tid:88) Started to register profiling hash id callback. [INFO] PROFILING(88,python):2024-10-23-20:49:46.130.312 [prof_atls_plugin.cpp:83] >>> (tid:88) RegisterProfileCallback, callback type is 7 [EVENT] PROFILING(88,python):2024-10-23-20:49:46.130.474 [msprof_callback_impl.cpp:350] >>> (tid:88) Started to register profiling enable host freq callback. [INFO] PROFILING(88,python):2024-10-23-20:49:46.130.618 [prof_atls_plugin.cpp:83] >>> (tid:88) RegisterProfileCallback, callback type is 8 [INFO] PROFILING(88,python):2024-10-23-20:49:46.475.566 [prof_atls_plugin.cpp:160] >>> (tid:88) Module[0] register callback of ctrl handle. [INFO] PROFILING(88,python):2024-10-23-20:49:46.539.724 [prof_atls_plugin.cpp:160] >>> (tid:88) Module[48] register callback of ctrl handle. [INFO] PROFILING(88,python):2024-10-23-20:49:46.540.174 [prof_atls_plugin.cpp:160] >>> (tid:88) Module[45] register callback of ctrl handle. [INFO] GE(88,python):2024-10-23-20:49:47.328.835 [op_tiling_manager.cc:102][EVENT]88 ~FuncPerfScope:[GEPERFTRACE] The time cost of OpTilingManager::LoadSo is [788467] micro second. [INFO] PROFILING(88,python):2024-10-23-20:49:47.862.469 [prof_atls_plugin.cpp:160] >>> (tid:88) Module[6] register callback of ctrl handle. 
[ERROR] DVPP(88,python):2024-10-23-20:49:47.897.976 [vdec_wrapper_dev.cpp:34][WRAP] [LoadFunctions:34] [tid:88] dlopen fail: [libmpi_dvpp_adapter.so: cannot open shared object file: No such file or directory] [ERROR] DVPP(88,python):2024-10-23-20:49:47.906.770 [venc_wrapper_dev.cpp:42][WRAP] [LoadFunctions:42] [tid:88] dlopen fail: [libmpi_dvpp_adapter.so: cannot open shared object file: No such file or directory] [ERROR] DVPP(88,python):2024-10-23-20:49:47.908.876 [pngd_wrapper_dev.cpp:36][WRAP] [LoadFunctions:36] [tid:88] dlopen fail: [libmpi_dvpp_adapter.so: cannot open shared object file: No such file or directory] [ERROR] DVPP(88,python):2024-10-23-20:49:47.909.448 [sys_wrapper_dev.cpp:37][WRAP] [LoadFunctions:37] [tid:88] dlopen fail: [libmpi_dvpp_adapter.so: cannot open shared object file: No such file or directory] [ERROR] DVPP(88,python):2024-10-23-20:49:47.909.963 [region_wrapper_dev.cpp:46][WRAP] [LoadFunctions:46] [tid:88] dlopen fail: [libmpi_dvpp_adapter.so: cannot open shared object file: No such file or directory] [INFO] RUNTIME(88,python):2024-10-23-20:49:49.873.332 [runtime.cc:5211] 88 GetVisibleDevices: real deviceCnt:1 userDeviceCnt:1 isSetVisibleDev:1 ASCEND_RT_VISIBLE_DEVICES:0 [INFO] PROFILING(88,python):2024-10-23-20:49:49.952.341 [prof_atls_plugin.cpp:160] >>> (tid:88) Module[7] register callback of ctrl handle. [EVENT] PROFILING(88,python):2024-10-23-20:49:49.958.361 [msprof_callback_impl.cpp:89] >>> (tid:88) MsprofCtrlCallback called, type: 255 [EVENT] PROFILING(88,python):2024-10-23-20:49:49.959.026 [ai_drv_dev_api.cpp:333] >>> (tid:88) Succeeded to DrvGetApiVersion version: 0x71f0d I1023 20:49:49.965087 88 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so I1023 20:49:49.976312 88 custom_kernel.cc:63] Succeed in loading 355 custom kernel(s) from loaded lib(s), will be used like native ones. 
I1023 20:49:49.976837 88 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device] I1023 20:49:49.976926 88 init.cc:242] CustomDevice: npu, visible devices count: 1 Running verify PaddlePaddle program ... [INFO] TDT(88,python):2024-10-23-20:49:52.071.199 [client_manager.cpp:426][GetClientRunMode][tid:88] runningMode:0 [INFO] TDT(88,python):2024-10-23-20:49:52.071.457 [client_manager.cpp:104][GetInstance][tid:88] curmode:2 [INFO] TDT(88,python):2024-10-23-20:49:52.071.630 [thread_mode_manager.cpp:69][Open][tid:88] [ThreadModeManager] enter into open process deviceId[0] rankSize[0] [INFO] TDT(88,python):2024-10-23-20:49:52.073.165 [thread_mode_manager.cpp:280][HandleAICPUPackage][tid:88] begin load aicpu package dstPath[/root/], srcpath[/usr/local/Ascend/ascend-toolkit/latest/opp/Ascend/aicpu/] file[Ascend-aicpu_syskernels.tar.gz] [INFO] TDT(88,python):2024-10-23-20:49:52.073.471 [package_worker.cpp:346][LoadAICPUPackageForThreadMode][tid:88] Package checkcode is [52523362] [INFO] TDT(88,python):2024-10-23-20:49:52.073.881 [package_worker.cpp:359][LoadAICPUPackageForThreadMode][tid:88] aicpu_package_install.info create success verifileFile[/root/aicpu_package_install.info] [INFO] TDT(88,python):2024-10-23-20:50:02.786.287 [package_worker.cpp:754][GetPackageName][tid:88] PackageInnerFile is [/root/aicpu_kernels/0/aicpu_kernels_device/version.info], srcPackageName is [Ascend-aicpu_syskernels.tar.gz] [INFO] TDT(88,python):2024-10-23-20:50:02.787.084 [package_worker.cpp:737][MoveSoToSandBox][tid:88] Rename file [/root/aicpu_kernels/0/aicpu_kernels_device/libtensorflow.so] to [/root/aicpu_kernels/0/aicpu_kernels_device/sand_box/libtensorflow.so] success. 
[INFO] TDT(88,python):2024-10-23-20:50:02.812.365 [package_worker.cpp:407][RemoveFile][tid:88] Remove file: [/root/Ascend-aicpu_syskernels.tar.gz] success [INFO] TDT(88,python):2024-10-23-20:50:02.813.116 [thread_mode_manager.cpp:280][HandleAICPUPackage][tid:88] begin load aicpu package dstPath[/root/], srcpath[/usr/local/Ascend/ascend-toolkit/latest/opp/Ascend/aicpu/] file[Ascend-aicpu_extend_syskernels.tar.gz] [INFO] TDT(88,python):2024-10-23-20:50:02.813.447 [package_worker.cpp:346][LoadAICPUPackageForThreadMode][tid:88] Package checkcode is [1324690] [INFO] TDT(88,python):2024-10-23-20:50:02.813.835 [package_worker.cpp:359][LoadAICPUPackageForThreadMode][tid:88] aicpu_package_install.info create success verifileFile[/root/extend_aicpu_package_install.info] [INFO] TDT(88,python):2024-10-23-20:50:03.756.769 [package_worker.cpp:1472][CpyExtendSoToCommonSoPath][tid:88] cmd:mkdir -p /root/aicpu_kernels/0/aicpu_kernels_device/ && cp /root/aicpu_kernels/0/aicpu_extend_syskernels/libaicpu_extend_kernels.so /root/aicpu_kernels/0/aicpu_kernels_device/ && rm -rf /root/aicpu_kernels/0/aicpu_extend_syskernels/ excute success [INFO] TDT(88,python):2024-10-23-20:50:03.758.017 [package_worker.cpp:407][RemoveFile][tid:88] Remove file: [/root/Ascend-aicpu_extend_syskernels.tar.gz] success [WARNING] TDT(88,python):2024-10-23-20:50:03.868.399 [thread_mode_manager.cpp:114][StartCallAICPU][tid:88] [ThreadModeManager] Can not open libaicpu_scheduler.so, deviceId[0], reason[/usr/lib64/libaicpu_scheduler.so: cannot open shared object file: No such file or directory] [ERROR] TDT(88,python):2024-10-23-20:50:03.869.408 [thread_mode_manager.cpp:120][StartCallAICPU][tid:88] [ThreadModeManager] failed open libaicpu_scheduler.so, deviceId[0], reason[libaicpu_scheduler.so: cannot open shared object file: No such file or directory] [ERROR] TDT(88,python):2024-10-23-20:50:03.869.624 [thread_mode_manager.cpp:80][Open][tid:88] [ThreadModeManager] failed call aicpu. 
[ERROR] TDT(88,python):2024-10-23-20:50:03.869.777 [tsd_client.cpp:33][TsdOpen][tid:88] TsdOpen failed, deviceId[0]. [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.869.945 [runtime.cc:3008]88 PrintfTsdError:report error module_type=0, module_name=E39999 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.870.092 [runtime.cc:3008]88 PrintfTsdError:TsdOpen failed. devId=0, tdt error=31 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.870.359 [runtime.cc:3718]88 DeviceRetain:report error module_type=0, module_name=EE9999 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.870.515 [runtime.cc:3718]88 DeviceRetain:Start aicpu executor failed, retCode=0x7020009 devId=0 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.870.715 [runtime.cc:3504]88 PrimaryContextRetain:report error module_type=0, module_name=EE9999 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.870.868 [runtime.cc:3504]88 PrimaryContextRetain:Check param failed, dev can not be NULL! [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.871.046 [runtime.cc:3531]88 PrimaryContextRetain:report error module_type=0, module_name=EE9999 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.871.196 [runtime.cc:3531]88 PrimaryContextRetain:Check param failed, ctx can not be NULL! [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.871.374 [api_impl.cc:2257]88 NewDevice:report error module_type=0, module_name=EE9999 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.871.520 [api_impl.cc:2257]88 NewDevice:Check param failed, context can not be null. [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.871.705 [api_impl.cc:2279]88 SetDevice:report error module_type=0, module_name=EE9999 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.871.857 [api_impl.cc:2279]88 SetDevice:New device failed, retCode=0x7010006 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.872.057 [logger.cc:833]88 SetDevice:Set device failed, device_id=0, deviceMode=0. 
[ERROR] RUNTIME(88,python):2024-10-23-20:50:03.872.239 [api_c_device.cc:52]88 rtSetDevice:ErrCode=507033, desc=[device retain error], InnerCode=0x7010006 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.872.384 [error_message_manage.cc:53]88 FuncErrorReason:report error module_type=3, module_name=EE8888 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.872.531 [error_message_manage.cc:53]88 FuncErrorReason:rtSetDevice execute failed, reason=[device retain error] [ERROR] ASCENDCL(88,python):2024-10-23-20:50:03.872.747 [device.cpp:175]88 aclrtSetDevice: open device 0 failed, runtime result = 507033. Call aclrtSetDevice(device->id) failed : 507033 at file /paddle/backends/npu/runtime/runtime.cc line 430 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.873.237 [api_impl.cc:4870]88 GetDevErrMsg:report error module_type=3, module_name=EE8888 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.873.415 [api_impl.cc:4870]88 GetDevErrMsg:ctx is NULL! [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.873.612 [api_impl.cc:4926]88 GetDevMsg:Failed to GetDeviceErrMsg, retCode=0x7070001. [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.873.780 [logger.cc:1560]88 GetDevMsg:GetDeviceMsg failed, getMsgType=0. [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.873.935 [api_c_device.cc:423]88 rtGetDevMsg:ErrCode=107002, desc=[context pointer null], InnerCode=0x7070001 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.874.077 [error_message_manage.cc:48]88 FuncErrorReason:report error module_name=EE1001 [ERROR] RUNTIME(88,python):2024-10-23-20:50:03.874.219 [error_message_manage.cc:48]88 FuncErrorReason:rtGetDevMsg execute failed, reason=[context pointer null] EE1001: 2024-10-23-20:50:03.874.399 The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null] Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship. TraceBack (most recent call last): TsdOpen failed. 
devId=0, tdt error=31[FUNC:PrintfTsdError][FILE:runtime.cc][LINE:3008] Start aicpu executor failed, retCode=0x7020009 devId=0[FUNC:DeviceRetain][FILE:runtime.cc][LINE:3718] Check param failed, dev can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3504] Check param failed, ctx can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3531] Check param failed, context can not be null.[FUNC:NewDevice][FILE:api_impl.cc][LINE:2257] New device failed, retCode=0x7010006[FUNC:SetDevice][FILE:api_impl.cc][LINE:2279] rtSetDevice execute failed, reason=[device retain error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] open device 0 failed, runtime result = 507033.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4870] The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

[EVENT] PROFILING(88,python):2024-10-23-20:50:03.935.478 [msprof_callback_impl.cpp:89] >>> (tid:88) MsprofCtrlCallback called, type: 3 [EVENT] PROFILING(88,python):2024-10-23-20:50:03.936.192 [ai_drv_dev_api.cpp:333] >>> (tid:88) Succeeded to DrvGetApiVersion version: 0x71f0d [INFO] GE(88,python):2024-10-23-20:50:03.936.655 [execution_runtime.cc:74][EVENT]88 FinalizeExecutionRuntime:Execution runtime finalize begin. [INFO] GE(88,python):2024-10-23-20:50:03.936.861 [execution_runtime.cc:86][EVENT]88 FinalizeExecutionRuntime:Execution runtime finalized. [INFO] IDEDD(88,python):2024-10-23-20:50:04.142.339 [adx_server_manager.cpp:49][tid:88]>>> start to deconstruct adx server manager [INFO] IDEDD(88,python):2024-10-23-20:50:04.189.998 [adx_server_manager.cpp:49][tid:88]>>> start to deconstruct adx server manager [INFO] RUNTIME(88,python):2024-10-23-20:50:04.234.355 [runtime.cc:1873] 88 ~Runtime: deconstruct runtime [INFO] RUNTIME(88,python):2024-10-23-20:50:04.250.198 [runtime.cc:1880] 88 ~Runtime: wait monitor success, use=0.
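Because the pasted log is flattened into long runs, the failing entries are hard to spot by eye. The sketch below is one way to split such text back into entries and list the shared objects the loader reports as missing. It assumes each entry starts with a tag of the form `[LEVEL] MODULE(pid,name):`, as in the output above; the function names `split_entries` and `missing_libraries` are illustrative and not part of PaddleX or CANN.

```python
import re

# Assumed entry header, e.g. "[ERROR] DVPP(88,python):", as seen in the log above.
ENTRY_START = re.compile(r"\[(ERROR|WARNING|INFO|EVENT)\] ([A-Z]+)\(\d+,\w+\):")
# Shared objects that the dynamic loader reports it could not open.
MISSING_SO = re.compile(r"([\w./+-]+\.so[\w.]*): cannot open shared object file")


def split_entries(raw):
    """Split flattened log text into (level, module, text) tuples."""
    marks = list(ENTRY_START.finditer(raw))
    entries = []
    for i, m in enumerate(marks):
        # An entry runs until the next header, or to the end of the text.
        end = marks[i + 1].start() if i + 1 < len(marks) else len(raw)
        entries.append((m.group(1), m.group(2), raw[m.end():end].strip()))
    return entries


def missing_libraries(raw):
    """Return base names of missing shared objects, in first-seen order."""
    seen = []
    for path in MISSING_SO.findall(raw):
        name = path.rsplit("/", 1)[-1]
        if name not in seen:
            seen.append(name)
    return seen
```

Applied to the log in this issue, `missing_libraries` would report `libmpi_dvpp_adapter.so` and `libaicpu_scheduler.so`, which are the two libraries named in the `cannot open shared object file` messages.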

Question

Is the Ascend development board (310B card environment) currently supported?

Environment

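  1. PaddlePaddle and PaddleX versions: the versions used in the tutorial
  2. Operating system (Linux/Windows/MacOS): the board image includes the OS, NPU driver and firmware, CANN, MindX SDK, graphical desktop, remote desktop, and USB Audio. OS version: Ubuntu 22.04 LTS Arm64. Firmware and driver version: 23.0.RC3. CANN version: 7.0.RC1 64bit aarch64
  3. Python version: 3.9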
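The host-side part of the environment details above can be collected with a short script. This is a minimal sketch using only the Python standard library; `environment_report` is a hypothetical helper name, and it does not report the PaddlePaddle, PaddleX, CANN, or driver versions, which have to be read from their own tools.

```python
import platform


def environment_report():
    """Collect basic host details using only the standard library."""
    return {
        "os": platform.system(),             # e.g. "Linux"
        "os_release": platform.release(),    # kernel release string
        "machine": platform.machine(),       # CPU architecture, e.g. "aarch64"
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```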
Bobholamovic commented 3 hours ago

Hello, we do not currently support the Ascend 310B. The relevant functionality is under development, so please stay tuned!