PaddlePaddle / Paddle

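PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)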
http://www.paddlepaddle.org/
Apache License 2.0

After building and installing the aarch64 paddle package on Ascend 910B, the check program fails with error code 507033 #68865

Open boss-yang702 opened 6 hours ago

boss-yang702 commented 6 hours ago

Issue Description

λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle; paddle.utils.run_check()"
I1022 11:56:13.547047  6141 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 11:56:13.547103  6141 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 11:56:14.248243  6141 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 11:56:14.256450  6141 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 11:56:14.256623  6141 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 11:56:14.256670  6141 init.cc:242] CustomDevice: npu, visible devices count: 1
Running verify PaddlePaddle program ... 
Call aclrtSetDevice(device->id) failed : 507033 at file /work/PaddleCustomDevice/backends/npu/runtime/runtime.cc line 430
EL0002: 2024-10-22-11:56:14.885.664 The device ID is invalid.
        TraceBack (most recent call last):
        Failed to open device, retCode=0x7020013, deviceId=0.[FUNC:InitRawDriver][FILE:device.cc][LINE:278]
        Failed to init RawDriver, device_id=0, retCode=0x7020013[FUNC:Init][FILE:device.cc][LINE:322]
        Check param failed, dev can not be NULL![FUNC:DeviceRetain][FILE:runtime.cc][LINE:3709]
        Check param failed, dev can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3447]
        Check param failed, ctx can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3474]
        Check param failed, context can not be null.[FUNC:NewDevice][FILE:api_impl.cc][LINE:2179]
        New device failed, retCode=0x7010006[FUNC:SetDevice][FILE:api_impl.cc][LINE:2201]
        rtSetDevice execute failed, reason=[device retain error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 0 failed, runtime result = 507033.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4692]
        The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Version & Environment Information

λ master-1 /work/PaddleCustomDevice/backends/npu {develop} pip install build/dist/paddle_custom_npu*.whl
Processing ./build/dist/paddle_custom_npu-0.0.0-cp39-cp39-linux_aarch64.whl

I followed the steps at https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/hardware_support/npu/install_cn.html, but the final health check fails.

ronny1996 commented 5 hours ago

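Hello, the log suggests that no card is available. Run npu-smi info to confirm that an NPU is present, and also check whether the environment variable ASCEND_RT_VISIBLE_DEVICES is set.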
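
A minimal sketch of the two checks suggested above, written in Python so it can run next to the Paddle one-liners. The helper names `parse_visible_devices` and `describe_environment` are hypothetical and not part of Paddle or CANN; the sketch only reads the environment variable and looks for `npu-smi` on `PATH`, it does not talk to the driver.

```python
import os
import shutil


def parse_visible_devices(value):
    """Turn a string such as "0,1,2" into [0, 1, 2]; None or blank gives []."""
    if value is None or not value.strip():
        return []
    return [int(part) for part in value.split(",") if part.strip()]


def describe_environment(env=None):
    """Report whether the variable is set and where npu-smi is found."""
    env = os.environ if env is None else env
    raw = env.get("ASCEND_RT_VISIBLE_DEVICES")
    return {
        "variable_set": raw is not None,
        "visible_ids": parse_visible_devices(raw),
        # shutil.which returns None when npu-smi is not on PATH.
        "npu_smi_path": shutil.which("npu-smi"),
    }


if __name__ == "__main__":
    print(describe_environment())
```

If `variable_set` is false or `npu_smi_path` is `None`, the container was probably started without the corresponding `-e` or `-v` option.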

boss-yang702 commented 4 hours ago

docker pull registry.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-aarch64-gcc84-py39 # ARM architecture

# Start the container with a command like the following; ASCEND_RT_VISIBLE_DEVICES specifies which NPU card IDs are visible
docker run -it --name paddle-npu-dev -v $(pwd):/work \
    --privileged --network=host --shm-size=128G -w=/work \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    registry.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-$(uname -m)-gcc84-py39 /bin/bash
This is how I entered the container. Running npu-smi info inside it shows the NPU information:

λ master-1 /work npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2                 Version: 24.1.rc2                                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B1               | OK            | 91.8        37                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3351 / 65536         |
+===========================+===============+====================================================+
| 1     910B1               | OK            | 89.3        38                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 2     910B1               | OK            | 97.5        37                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 3     910B1               | OK            | 91.9        39                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 4     910B1               | OK            | 90.9        40                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3336 / 65536         |
+===========================+===============+====================================================+
| 5     910B1               | OK            | 89.4        40                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          19857/ 65536         |
+===========================+===============+====================================================+
| 6     910B1               | OK            | 91.0        38                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3332 / 65536         |
+===========================+===============+====================================================+
| 7     910B1               | OK            | 92.3        38                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3332 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| 5       0                 | 172908        |                          | 16443                   |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+
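
For scripted checks, the table above can be reduced to a list of NPU IDs and health states. This is a rough sketch that assumes the exact row layout pasted in this thread (npu-smi 24.1.rc2); other npu-smi versions may print a different layout, and `list_npus` is a hypothetical helper name.

```python
import re

# Matches the first row of each NPU entry, for example
# "| 0     910B1               | OK            | 91.8 ...".
# Requiring an alphabetic health value skips the chip rows and process rows.
NPU_ROW = re.compile(r"^\|\s*(\d+)\s+(\S+)\s*\|\s*([A-Za-z]+)\s*\|")


def list_npus(npu_smi_text):
    """Return (npu_id, name, health) tuples found in `npu-smi info` output."""
    found = []
    for line in npu_smi_text.splitlines():
        match = NPU_ROW.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(2), match.group(3)))
    return found
```

Applied to the output above, this would list eight 910B1 entries, all reporting OK, which is why the card itself looks healthy from inside the container.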

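After running the following commands:

# Download the PaddleCustomDevice source code
git clone https://github.com/PaddlePaddle/PaddleCustomDevice

# Enter the hardware backend (Ascend NPU) directory
cd PaddleCustomDevice/backends/npu

# First install the PaddlePaddle CPU package
pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu

# Run the build script; submodules are downloaded on demand during the build
bash tools/compile.sh

# The PaddlePaddle NPU plugin package is under build/dist; install it with pip
pip install build/dist/paddle_custom_npu*.whl

and then running the basic PaddlePaddle health check:
python -c "import paddle; paddle.utils.run_check()"
the 507033 error appears.
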
boss-yang702 commented 4 hours ago

λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
I1022 13:56:16.133432 1028 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 13:56:16.133492 1028 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 13:56:16.795500 1028 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 13:56:16.803839 1028 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 13:56:16.804047 1028 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 13:56:16.804088 1028 init.cc:242] CustomDevice: npu, visible devices count: 8
version: 0.0.0
commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
custom_op commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
cann: 8.0.RC1

ronny1996 commented 4 hours ago

Hello, I looked it up: 507033 is a driver error. [image] You could downgrade the driver to 23.0.3, or upgrade CANN.

boss-yang702 commented 4 hours ago

Should the driver downgrade or the CANN upgrade be done on the physical host, outside the container?

CANN version on the physical host:
[root@master-1 arm64-linux]# cat /usr/local/Ascend/ascend-toolkit/latest/arm64-linux/ascend_toolkit_install.info
package_name=Ascend-cann-toolkit
version=8.0.RC2
innerversion=V100R001C18SPC001B254
compatible_version=[V100R001C15,V100R001C18],[V100R001C30],[V100R001C13],[V100R003C11],[V100R001C29],[V100R001C10]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.RC2/aarch64-linux

Driver version on the physical host:
[root@master-1 arm64-linux]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2                 Version: 24.1.rc2                                             |

ronny1996 commented 4 hours ago

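The driver downgrade has to be done on the physical host; the CANN upgrade can be done inside the container. See the official NPU documentation: https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC3alpha003/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=Ubuntu&Software=cannToolKit After upgrading CANN, the NPU plugin needs to be rebuilt.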
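
To confirm which CANN toolkit is installed in a given environment and whether the plugin was rebuilt against it, the `ascend_toolkit_install.info` file shown earlier in the thread can be parsed as plain key=value text. A small sketch follows; both function names are hypothetical, and the prefix comparison is only a heuristic based on the version strings that appear in this thread.

```python
def parse_install_info(text):
    """Parse key=value lines as seen in ascend_toolkit_install.info."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip()
    return info


def plugin_matches_toolkit(toolkit_version, plugin_cann):
    """Heuristic: the plugin's dotted version is a prefix of the toolkit's.

    For example "8.0.RC3" is treated as matching "8.0.RC3.alpha003".
    """
    toolkit_parts = toolkit_version.lower().split(".")
    plugin_parts = plugin_cann.lower().split(".")
    return toolkit_parts[: len(plugin_parts)] == plugin_parts
```

The `plugin_cann` value here would be the `cann:` line printed by `paddle_custom_device.npu.version()`; a mismatch suggests the plugin still needs to be rebuilt.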

boss-yang702 commented 1 hour ago

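Upgrading CANN to the latest version inside the container does not help either. Does the Paddle framework fail to recognize the latest driver version? Perhaps Paddle does not recognize this NPU driver; the driver version is 24.1.rc2.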

ronny1996 commented 1 hour ago

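The Paddle framework has no direct dependency on the NPU driver; it only calls the APIs provided by CANN. If the NPU plugin rebuilds successfully after the upgrade, that part should be fine. Only CANN interacts with the NPU driver. You can check the NPU logs: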

rm -rf ~/ascend/log/debug/plog/
python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
grep -r ERROR ~/ascend/log/debug/plog/
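
The `grep` step can also be done from Python, which helps when the results need to be filtered further. A minimal sketch, assuming the plog directory used in the commands above; `collect_error_lines` is a hypothetical helper, and a missing directory simply gives an empty result rather than an error.

```python
from pathlib import Path


def collect_error_lines(log_dir, needle="ERROR"):
    """Return (file name, line) pairs for every line containing `needle`.

    Roughly mirrors `grep -r ERROR <dir>`.
    """
    root = Path(log_dir).expanduser()
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with path.open(errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append((path.name, line.rstrip("\n")))
    return hits
```

A call such as `collect_error_lines("~/ascend/log/debug/plog")` after reproducing the failure would return the same lines that the `grep` command prints.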
boss-yang702 commented 1 hour ago

λ master-1 /work rm -rf ~/ascend/log/debug/plog/
λ master-1 /work python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
I1022 17:26:20.919286 200903 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 17:26:20.919353 200903 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:26:22.105868 200903 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 17:26:22.117818 200903 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 17:26:22.118059 200903 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:26:22.118109 200903 init.cc:242] CustomDevice: npu, visible devices count: 8
version: 0.0.0
commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
custom_op commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
cann: 8.0.RC1
λ master-1 /work grep -r ERROR ~/ascend/log/debug/plog/
grep: /root/ascend/log/debug/plog/: No such file or directory

This is the output after running the commands.

Nothing was found under the log directory.

λ master-1 /var/log ls
alternatives.log  apt/  ascend_seclog/  bootstrap.log  btmp  dpkg.log  faillog  journal/  lastlog  mindie_log/  nputools_LOG_ERR.log  nputools_LOG_INFO.log  ntpstats/  private/  unattended-upgrades/  wtmp

Entering ascend_seclog/:

λ master-1 /var/log/ascend_seclog ls
ascend_install.log  ascend_kernels_910b_install.log  ascend_toolkit_install.log  operation.log

Contents of ascend_toolkit_install.log:

[Toolkit] [2024-10-22 16:43:09] [INFO]: rm soft link /usr/local/Ascend/ascend-toolkit/8.0
[Toolkit] [2024-10-22 16:43:09] [INFO]: rm soft link /usr/local/Ascend/ascend-toolkit/8.0
[Toolkit] [2024-10-22 16:43:09] [INFO]:  delete file /usr/local/Ascend/ascend-toolkit/8.0 successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]:  delete file /usr/local/Ascend/ascend-toolkit/8.0 successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]:  delete operation, the directory /usr/local/Ascend/ascend-toolkit is not empty.
[Toolkit] [2024-10-22 16:43:09] [INFO]:  delete operation, the directory /usr/local/Ascend/ascend-toolkit is not empty.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link /usr/local/Ascend/ascend-toolkit/8.0 is created successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link /usr/local/Ascend/ascend-toolkit/8.0 is created successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link is updated successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link is updated successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: toolkit uninstall success
[Toolkit] [2024-10-22 16:43:09] [INFO]: toolkit uninstall success
[Toolkit] [20241022-16:43:09] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/arm64-linux is created successfully.
[Toolkit] [20241022-16:43:14] [INFO]  delete directory /usr/local/Ascend/ascend-toolkit_recover successfully.
[Toolkit] [20241022-16:43:14] [INFO] mkdir /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/script
[Toolkit] [20241022-16:43:14] [INFO] mkdir /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/combo_script
[Toolkit] [20241022-16:43:14] [INFO] touch /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/install.conf
[Toolkit] [20241022-16:43:14] [INFO] touch /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/ascend_toolkit_install.info
[Toolkit] [20241022-16:43:14] [INFO] Write Toolkit_InstallPath to /etc/Ascend/ascend_cann_install.info.
[Toolkit] [20241022-16:43:14] [WARNING]  delete operation, the file /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/script/set_env.sh is not exist.
[Toolkit] [20241022-16:43:14] [INFO]  delete file /usr/local/Ascend/ascend-toolkit/set_env.sh successfully.
[Toolkit] [20241022-16:43:14] [INFO]  delete file /usr/local/Ascend/ascend-toolkit/latest/arm64-linux successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/arm64-linux is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/runtime is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/install.conf is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/script is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/runtime is created successfully.
[Toolkit] [20241022-16:43:14] [INFO]  delete file /usr/local/Ascend/ascend-toolkit/8.0 successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/8.0 is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The permission on the file and folder is granted successfully.
[Toolkit] [20241022-16:43:14] [INFO] process end
boss-yang702 commented 51 minutes ago

After upgrading CANN and then rebuilding and reinstalling the plugin, the same error as before is still reported.

λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
I1022 17:51:24.640590 205203 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 17:51:24.640648 205203 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:25.504863 205203 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 17:51:25.512694 205203 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 17:51:25.512885 205203 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:25.512928 205203 init.cc:242] CustomDevice: npu, visible devices count: 8
version: 0.0.0
commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
custom_op commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
cann: 8.0.RC3
λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle; paddle.utils.run_check()"
I1022 17:51:33.433328 205664 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 17:51:33.433386 205664 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:34.248180 205664 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 17:51:34.255592 205664 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 17:51:34.255789 205664 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:34.255828 205664 init.cc:242] CustomDevice: npu, visible devices count: 8
Running verify PaddlePaddle program ... 
Call aclrtSetDevice(device->id) failed : 507033 at file /work/PaddleCustomDevice/backends/npu/runtime/runtime.cc line 430
EL0002: [PID: 205664] 2024-10-22-17:51:34.972.389 The device ID is invalid.
        TraceBack (most recent call last):
        Failed to open device, retCode=0x7020013, deviceId=0.[FUNC:InitRawDriver][FILE:device.cc][LINE:281]
        Failed to init RawDriver, device_id=0, retCode=0x7020013[FUNC:Init][FILE:device.cc][LINE:324]
        Check param failed, dev can not be NULL![FUNC:DeviceRetain][FILE:runtime.cc][LINE:4040]
        Check param failed, dev can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3736]
        Check param failed, ctx can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3763]
        Check param failed, context can not be null.[FUNC:NewDevice][FILE:api_impl.cc][LINE:2463]
        New device failed, retCode=0x7010006[FUNC:SetDevice][FILE:api_impl.cc][LINE:2486]
        rtSetDevice execute failed, reason=[device retain error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 0 failed, runtime result = 507033.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:5326]
        The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Logs:
λ master-1 /work/PaddleCustomDevice/backends/npu {develop} grep -r ERROR ~/ascend/log/debug/plog/
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] DRV(205664,python):2024-10-22-17:51:34.972.314 [ascend][curpid: 205664, 205664][drv][tsdrv][share_log_read_in_single_module 634]Unsupported command. (devid=0; cmd=12) Unsupported command. (devid=0; cmd_nr=12)
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] DRV(205664,python):2024-10-22-17:51:34.972.333 [ascend][curpid: 205664, 205664][drv][tsdrv][trsDevInit 168]Sqcq init failed. (devid=0; ret=17)
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.631 [npu_driver.cc:2459]205664 DeviceOpen:[drv api] drvDeviceOpen failed: device_id=0, drvRetCode=2!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.653 [device.cc:281]205664 InitRawDriver:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.657 [device.cc:281]205664 InitRawDriver:Failed to open device, retCode=0x7020013, deviceId=0.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.698 [device.cc:324]205664 Init:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.702 [device.cc:324]205664 Init:Failed to init RawDriver, device_id=0, retCode=0x7020013
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.718 [runtime.cc:4008]205664 DeviceRetain:Failed to init device.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.187 [runtime.cc:4040]205664 DeviceRetain:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.220 [runtime.cc:4040]205664 DeviceRetain:Check param failed, dev can not be NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.319 [runtime.cc:3736]205664 PrimaryContextRetain:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.323 [runtime.cc:3736]205664 PrimaryContextRetain:Check param failed, dev can not be NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.350 [runtime.cc:3763]205664 PrimaryContextRetain:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.353 [runtime.cc:3763]205664 PrimaryContextRetain:Check param failed, ctx can not be NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.389 [api_impl.cc:2463]205664 NewDevice:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.393 [api_impl.cc:2463]205664 NewDevice:Check param failed, context can not be null.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.406 [api_impl.cc:2486]205664 SetDevice:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.409 [api_impl.cc:2486]205664 SetDevice:New device failed, retCode=0x7010006
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.436 [api_error.cc:1716]205664 SetDevice:Set device failed, device_id=0, deviceMode=0.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.492 [api_c_device.cc:58]205664 rtSetDevice:ErrCode=507033, desc=[device retain error], InnerCode=0x7010006
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.502 [error_message_manage.cc:53]205664 FuncErrorReason:report error module_type=3, module_name=EE8888
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.507 [error_message_manage.cc:53]205664 FuncErrorReason:rtSetDevice execute failed, reason=[device retain error]
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] ASCENDCL(205664,python):2024-10-22-17:51:37.933.544 [device.cpp:147]205664 aclrtSetDevice: open device 0 failed, runtime result = 507033.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.676 [api_impl.cc:5326]205664 GetDevErrMsg:report error module_type=3, module_name=EE8888
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.680 [api_impl.cc:5326]205664 GetDevErrMsg:ctx is NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.696 [api_impl.cc:5383]205664 GetDevMsg:Failed to GetDeviceErrMsg, retCode=0x7070001.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.699 [api_error.cc:3207]205664 GetDevMsg:GetDeviceMsg failed, getMsgType=0.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.706 [api_c_device.cc:429]205664 rtGetDevMsg:ErrCode=107002, desc=[context pointer null], InnerCode=0x7070001
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.709 [error_message_manage.cc:48]205664 FuncErrorReason:report error module_name=EE1001
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.713 [error_message_manage.cc:48]205664 FuncErrorReason:rtGetDevMsg execute failed, reason=[context pointer null]
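
With this many ERROR lines, it helps to see which layer failed first. The sketch below parses the plog line format as it appears in the output above and keeps the earliest entry per module; the regular expression is inferred from these lines only, so it is an assumption rather than a documented format, and `first_error_per_module` is a hypothetical helper.

```python
import re

# Inferred from lines such as:
# "...log:[ERROR] DRV(205664,python):2024-10-22-17:51:34.972.314 [ascend]..."
PLOG_ERROR = re.compile(
    r"\[ERROR\]\s+(?P<module>\w+)\((?P<pid>\d+),[^)]*\):"
    r"(?P<time>\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}\.\d{3})\s+"
    r"(?P<message>.*)"
)


def first_error_per_module(lines):
    """Map each module (DRV, RUNTIME, ...) to its earliest (time, message)."""
    earliest = {}
    for line in lines:
        match = PLOG_ERROR.search(line)
        if not match:
            continue
        module, stamp = match.group("module"), match.group("time")
        # Fixed-width timestamps compare correctly as plain strings.
        if module not in earliest or stamp < earliest[module][0]:
            earliest[module] = (stamp, match.group("message"))
    return earliest
```

In the lines pasted above, the earliest entries come from DRV (17:51:34.972.314, "Unsupported command"), followed by RUNTIME and then ASCENDCL, which is consistent with the failure starting at the driver layer rather than in Paddle.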