Open boss-yang702 opened 6 hours ago
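Hello. Judging by the log, no card was detected. Please run npu-smi info to confirm that an NPU is present, and also check whether the environment variable ASCEND_RT_VISIBLE_DEVICES is set.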
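The two checks suggested above can be sketched in a few lines of Python. This is only an illustration, assuming that ASCEND_RT_VISIBLE_DEVICES holds a comma-separated list of card IDs (as in the docker command in this thread); the helper names `parse_visible_devices` and `describe_npu_environment` are hypothetical and not part of Paddle or CANN.

```python
import os
import shutil


def parse_visible_devices(value):
    """Split a comma-separated device list such as "0,1,2" into integers."""
    if value is None or not value.strip():
        return []
    return [int(part) for part in value.split(",") if part.strip()]


def describe_npu_environment(env=None):
    """Report whether npu-smi is on PATH and which device IDs are exposed.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to test.
    """
    env = os.environ if env is None else env
    return {
        "npu_smi_path": shutil.which("npu-smi", path=env.get("PATH")),
        "visible_devices": parse_visible_devices(
            env.get("ASCEND_RT_VISIBLE_DEVICES")
        ),
    }


if __name__ == "__main__":
    print(describe_npu_environment())
```

An empty `visible_devices` list or a `npu_smi_path` of `None` would point to a container setup problem rather than a Paddle problem.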
docker pull registry.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-aarch64-gcc84-py39 # ARM architecture
# Start the container with a command like the following; ASCEND_RT_VISIBLE_DEVICES specifies which NPU card IDs are visible
docker run -it --name paddle-npu-dev -v $(pwd):/work \
--privileged --network=host --shm-size=128G -w=/work \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
registry.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-$(uname -m)-gcc84-py39 /bin/bash
This is how I entered the container. Running npu-smi info inside it shows the NPU information:
λ master-1 /work npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B1 | OK | 91.8 37 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 3351 / 65536 |
+===========================+===============+====================================================+
| 1 910B1 | OK | 89.3 38 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 3334 / 65536 |
+===========================+===============+====================================================+
| 2 910B1 | OK | 97.5 37 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 3334 / 65536 |
+===========================+===============+====================================================+
| 3 910B1 | OK | 91.9 39 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 3334 / 65536 |
+===========================+===============+====================================================+
| 4 910B1 | OK | 90.9 40 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3336 / 65536 |
+===========================+===============+====================================================+
| 5 910B1 | OK | 89.4 40 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 19857/ 65536 |
+===========================+===============+====================================================+
| 6 910B1 | OK | 91.0 38 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 3332 / 65536 |
+===========================+===============+====================================================+
| 7 910B1 | OK | 92.3 38 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 3332 / 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| No running processes found in NPU 4 |
+===========================+===============+====================================================+
| 5 0 | 172908 | | 16443 |
+===========================+===============+====================================================+
| No running processes found in NPU 6 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+
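For anyone who wants to check this table programmatically, the following is a rough sketch of a parser for it. It assumes the two-row-per-NPU layout shown in the output above (a name/health row followed by a chip row ending in the HBM usage); other npu-smi versions may print a different layout. `parse_npu_smi` is a hypothetical helper, not an npu-smi or Paddle API.

```python
import re

# Name row, e.g. "| 0 910B1 | OK | 91.8 37 0 / 0 |"
NAME_ROW = re.compile(r"^\|\s*(\d+)\s+(\S+)\s*\|\s*([A-Za-z]+)\s*\|")
# Chip row, e.g. "| 0 | 0000:C1:00.0 | 0 0 / 0 3351 / 65536 |"
CHIP_ROW = re.compile(
    r"^\|\s*\d+\s*\|\s*"
    r"([0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])\s*\|"
    r".*?(\d+)\s*/\s*(\d+)\s*\|\s*$"
)


def parse_npu_smi(text):
    """Return one dict per NPU with health and HBM usage (in MB)."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        name = NAME_ROW.match(line)
        if name:
            current = {
                "npu": int(name[1]),
                "name": name[2],
                "health": name[3],
            }
            continue
        chip = CHIP_ROW.match(line)
        if chip and current is not None:
            current.update(
                bus_id=chip[1],
                hbm_used_mb=int(chip[2]),
                hbm_total_mb=int(chip[3]),
            )
            devices.append(current)
            current = None
    return devices


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["npu-smi", "info"], capture_output=True, text=True, check=True
    ).stdout
    for device in parse_npu_smi(output):
        print(device)
```

Rows from the process table are skipped because their second column is numeric rather than a health string such as OK.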
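After running the following commands:
# Download the PaddleCustomDevice source code
git clone https://github.com/PaddlePaddle/PaddleCustomDevice
# Enter the hardware backend (Ascend NPU) directory
cd PaddleCustomDevice/backends/npu
# First install the PaddlePaddle CPU package
pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
# Run the build script; submodules are downloaded on demand during the build
bash tools/compile.sh
# The PaddlePaddle NPU plugin package is under build/dist; install it with pip
pip install build/dist/paddle_custom_npu*.whl
and then running the basic PaddlePaddle health check:
python -c "import paddle; paddle.utils.run_check()"
I get error 507033.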
λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
I1022 13:56:16.133432 1028 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 13:56:16.133492 1028 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 13:56:16.795500 1028 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 13:56:16.803839 1028 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 13:56:16.804047 1028 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 13:56:16.804088 1028 init.cc:242] CustomDevice: npu, visible devices count: 8
version: 0.0.0
commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
custom_op commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
cann: 8.0.RC1
Hello. I looked it up: 507033 is a driver error. You can either downgrade the driver to 23.0.3 or upgrade CANN.
Should the driver downgrade or the CANN upgrade be done on the physical host, outside the container?
Physical host, CANN version:
[root@master-1 arm64-linux]# cat /usr/local/Ascend/ascend-toolkit/latest/arm64-linux/ascend_toolkit_install.info
package_name=Ascend-cann-toolkit
version=8.0.RC2
innerversion=V100R001C18SPC001B254
compatible_version=[V100R001C15,V100R001C18],[V100R001C30],[V100R001C13],[V100R003C11],[V100R001C29],[V100R001C10]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.RC2/aarch64-linux
Physical host, driver version:
[root@master-1 arm64-linux]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
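The driver has to be downgraded on the physical host, whereas CANN can be upgraded inside the container. See the official NPU documentation: https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC3alpha003/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=Ubuntu&Software=cannToolKit After upgrading CANN, the NPU plugin must be rebuilt.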
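To confirm which CANN toolkit version is actually installed before and after an upgrade, the `ascend_toolkit_install.info` file quoted earlier in this thread can be read directly. The sketch below assumes the file contains one `key=value` pair per line and uses the path from the `cat` command in the thread, which may differ on other installations; `parse_install_info` and `installed_cann_version` are hypothetical helper names.

```python
from pathlib import Path

# Path taken from the `cat` command in this thread; adjust it if CANN
# is installed elsewhere.
DEFAULT_INFO = Path(
    "/usr/local/Ascend/ascend-toolkit/latest/arm64-linux/"
    "ascend_toolkit_install.info"
)


def parse_install_info(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def installed_cann_version(path=DEFAULT_INFO):
    """Return the 'version' field, or None if the file does not exist."""
    if not path.is_file():
        return None
    return parse_install_info(path.read_text()).get("version")


if __name__ == "__main__":
    print("CANN toolkit version:", installed_cann_version())
```

The result can then be compared with the `cann:` line printed by `paddle_custom_device.npu.version()` to see whether the plugin was rebuilt against the upgraded toolkit.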
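Upgrading CANN to the latest version inside the container did not help either. Could it be that the Paddle framework does not recognize the newest driver version? Perhaps Paddle is not detecting this NPU driver; the driver version is 24.1.rc2.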
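The Paddle framework has no direct relationship with the NPU driver; it only calls the APIs provided by CANN. If the NPU plugin rebuilds successfully after the upgrade, the Paddle side should be fine. Only CANN interacts with the NPU driver, so please check the NPU logs: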
rm -rf ~/ascend/log/debug/plog/
python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
grep -r ERROR ~/ascend/log/debug/plog/
λ master-1 /work rm -rf ~/ascend/log/debug/plog/
λ master-1 /work python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
I1022 17:26:20.919286 200903 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 17:26:20.919353 200903 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:26:22.105868 200903 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 17:26:22.117818 200903 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 17:26:22.118059 200903 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:26:22.118109 200903 init.cc:242] CustomDevice: npu, visible devices count: 8
version: 0.0.0
commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
custom_op commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
cann: 8.0.RC1
λ master-1 /work grep -r ERROR ~/ascend/log/debug/plog/
grep: /root/ascend/log/debug/plog/: No such file or directory
This is the output after running the commands.
Nothing was found under the log folder either:
λ master-1 /var/log ls
alternatives.log apt/ ascend_seclog/ bootstrap.log btmp dpkg.log faillog journal/ lastlog mindie_log/ nputools_LOG_ERR.log nputools_LOG_INFO.log ntpstats/ private/ unattended-upgrades/ wtmp
Going into ascend_seclog/:
λ master-1 /var/log/ascend_seclog ls
ascend_install.log ascend_kernels_910b_install.log ascend_toolkit_install.log operation.log
Contents of ascend_toolkit_install.log:
[Toolkit] [2024-10-22 16:43:09] [INFO]: rm soft link /usr/local/Ascend/ascend-toolkit/8.0
[Toolkit] [2024-10-22 16:43:09] [INFO]: rm soft link /usr/local/Ascend/ascend-toolkit/8.0
[Toolkit] [2024-10-22 16:43:09] [INFO]: delete file /usr/local/Ascend/ascend-toolkit/8.0 successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: delete file /usr/local/Ascend/ascend-toolkit/8.0 successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: delete operation, the directory /usr/local/Ascend/ascend-toolkit is not empty.
[Toolkit] [2024-10-22 16:43:09] [INFO]: delete operation, the directory /usr/local/Ascend/ascend-toolkit is not empty.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link /usr/local/Ascend/ascend-toolkit/8.0 is created successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link /usr/local/Ascend/ascend-toolkit/8.0 is created successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link is updated successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: The soft link is updated successfully.
[Toolkit] [2024-10-22 16:43:09] [INFO]: toolkit uninstall success
[Toolkit] [2024-10-22 16:43:09] [INFO]: toolkit uninstall success
[Toolkit] [20241022-16:43:09] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/arm64-linux is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] delete directory /usr/local/Ascend/ascend-toolkit_recover successfully.
[Toolkit] [20241022-16:43:14] [INFO] mkdir /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/script
[Toolkit] [20241022-16:43:14] [INFO] mkdir /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/combo_script
[Toolkit] [20241022-16:43:14] [INFO] touch /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/install.conf
[Toolkit] [20241022-16:43:14] [INFO] touch /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/ascend_toolkit_install.info
[Toolkit] [20241022-16:43:14] [INFO] Write Toolkit_InstallPath to /etc/Ascend/ascend_cann_install.info.
[Toolkit] [20241022-16:43:14] [WARNING] delete operation, the file /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/script/set_env.sh is not exist.
[Toolkit] [20241022-16:43:14] [INFO] delete file /usr/local/Ascend/ascend-toolkit/set_env.sh successfully.
[Toolkit] [20241022-16:43:14] [INFO] delete file /usr/local/Ascend/ascend-toolkit/latest/arm64-linux successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/arm64-linux is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/8.0.RC3.alpha003/aarch64-linux/runtime is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/install.conf is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/script is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/runtime is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] delete file /usr/local/Ascend/ascend-toolkit/8.0 successfully.
[Toolkit] [20241022-16:43:14] [INFO] The soft link /usr/local/Ascend/ascend-toolkit/8.0 is created successfully.
[Toolkit] [20241022-16:43:14] [INFO] The permission on the file and folder is granted successfully.
[Toolkit] [20241022-16:43:14] [INFO] process end
After upgrading CANN and then rebuilding and reinstalling the plugin, I still get the same error as before:
λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
I1022 17:51:24.640590 205203 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 17:51:24.640648 205203 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:25.504863 205203 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 17:51:25.512694 205203 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 17:51:25.512885 205203 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:25.512928 205203 init.cc:242] CustomDevice: npu, visible devices count: 8
version: 0.0.0
commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
custom_op commit: 77090b642bfecc527ea0dc8ede72e9f0eec60a10
cann: 8.0.RC3
λ master-1 /work/PaddleCustomDevice/backends/npu {develop} python -c "import paddle; paddle.utils.run_check()"
I1022 17:51:33.433328 205664 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1022 17:51:33.433386 205664 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:34.248180 205664 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1022 17:51:34.255592 205664 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1022 17:51:34.255789 205664 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1022 17:51:34.255828 205664 init.cc:242] CustomDevice: npu, visible devices count: 8
Running verify PaddlePaddle program ...
Call aclrtSetDevice(device->id) failed : 507033 at file /work/PaddleCustomDevice/backends/npu/runtime/runtime.cc line 430
EL0002: [PID: 205664] 2024-10-22-17:51:34.972.389 The device ID is invalid.
TraceBack (most recent call last):
Failed to open device, retCode=0x7020013, deviceId=0.[FUNC:InitRawDriver][FILE:device.cc][LINE:281]
Failed to init RawDriver, device_id=0, retCode=0x7020013[FUNC:Init][FILE:device.cc][LINE:324]
Check param failed, dev can not be NULL![FUNC:DeviceRetain][FILE:runtime.cc][LINE:4040]
Check param failed, dev can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3736]
Check param failed, ctx can not be NULL![FUNC:PrimaryContextRetain][FILE:runtime.cc][LINE:3763]
Check param failed, context can not be null.[FUNC:NewDevice][FILE:api_impl.cc][LINE:2463]
New device failed, retCode=0x7010006[FUNC:SetDevice][FILE:api_impl.cc][LINE:2486]
rtSetDevice execute failed, reason=[device retain error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
open device 0 failed, runtime result = 507033.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:5326]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Logs:
λ master-1 /work/PaddleCustomDevice/backends/npu {develop} grep -r ERROR ~/ascend/log/debug/plog/
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] DRV(205664,python):2024-10-22-17:51:34.972.314 [ascend][curpid: 205664, 205664][drv][tsdrv][share_log_read_in_single_module 634]Unsupported command. (devid=0; cmd=12) Unsupported command. (devid=0; cmd_nr=12)
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] DRV(205664,python):2024-10-22-17:51:34.972.333 [ascend][curpid: 205664, 205664][drv][tsdrv][trsDevInit 168]Sqcq init failed. (devid=0; ret=17)
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.631 [npu_driver.cc:2459]205664 DeviceOpen:[drv api] drvDeviceOpen failed: device_id=0, drvRetCode=2!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.653 [device.cc:281]205664 InitRawDriver:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.657 [device.cc:281]205664 InitRawDriver:Failed to open device, retCode=0x7020013, deviceId=0.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.698 [device.cc:324]205664 Init:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.702 [device.cc:324]205664 Init:Failed to init RawDriver, device_id=0, retCode=0x7020013
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:34.972.718 [runtime.cc:4008]205664 DeviceRetain:Failed to init device.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.187 [runtime.cc:4040]205664 DeviceRetain:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.220 [runtime.cc:4040]205664 DeviceRetain:Check param failed, dev can not be NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.319 [runtime.cc:3736]205664 PrimaryContextRetain:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.323 [runtime.cc:3736]205664 PrimaryContextRetain:Check param failed, dev can not be NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.350 [runtime.cc:3763]205664 PrimaryContextRetain:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.353 [runtime.cc:3763]205664 PrimaryContextRetain:Check param failed, ctx can not be NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.389 [api_impl.cc:2463]205664 NewDevice:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.393 [api_impl.cc:2463]205664 NewDevice:Check param failed, context can not be null.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.406 [api_impl.cc:2486]205664 SetDevice:report error module_type=0, module_name=EE9999
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.409 [api_impl.cc:2486]205664 SetDevice:New device failed, retCode=0x7010006
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.436 [api_error.cc:1716]205664 SetDevice:Set device failed, device_id=0, deviceMode=0.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.492 [api_c_device.cc:58]205664 rtSetDevice:ErrCode=507033, desc=[device retain error], InnerCode=0x7010006
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.502 [error_message_manage.cc:53]205664 FuncErrorReason:report error module_type=3, module_name=EE8888
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.507 [error_message_manage.cc:53]205664 FuncErrorReason:rtSetDevice execute failed, reason=[device retain error]
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] ASCENDCL(205664,python):2024-10-22-17:51:37.933.544 [device.cpp:147]205664 aclrtSetDevice: open device 0 failed, runtime result = 507033.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.676 [api_impl.cc:5326]205664 GetDevErrMsg:report error module_type=3, module_name=EE8888
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.680 [api_impl.cc:5326]205664 GetDevErrMsg:ctx is NULL!
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.696 [api_impl.cc:5383]205664 GetDevMsg:Failed to GetDeviceErrMsg, retCode=0x7070001.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.699 [api_error.cc:3207]205664 GetDevMsg:GetDeviceMsg failed, getMsgType=0.
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.706 [api_c_device.cc:429]205664 rtGetDevMsg:ErrCode=107002, desc=[context pointer null], InnerCode=0x7070001
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.709 [error_message_manage.cc:48]205664 FuncErrorReason:report error module_name=EE1001
/root/ascend/log/debug/plog/plog-205664_20241022175135120.log:[ERROR] RUNTIME(205664,python):2024-10-22-17:51:37.933.713 [error_message_manage.cc:48]205664 FuncErrorReason:rtGetDevMsg execute failed, reason=[context pointer null]
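With this many ERROR lines, it can help to group them by component and find the earliest one. The sketch below assumes the plog line prefix seen in the output above, `[ERROR] COMPONENT(pid,process):timestamp message`; the function name `summarize_errors` is hypothetical, and the log directory is the one used in this thread.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the prefix of lines such as:
# [ERROR] DRV(205664,python):2024-10-22-17:51:34.972.314 [ascend]...
ERROR_LINE = re.compile(
    r"\[ERROR\]\s+(?P<component>\w+)\((?P<pid>\d+),[^)]*\):"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}\.\d{3})"
    r"\s+(?P<message>.*)"
)


def summarize_errors(lines):
    """Return (error count per component, earliest error as a dict)."""
    counts = Counter()
    earliest = None
    for line in lines:
        match = ERROR_LINE.search(line)
        if match is None:
            continue
        counts[match["component"]] += 1
        # Timestamps are fixed-width, so string order equals time order.
        if earliest is None or match["timestamp"] < earliest["timestamp"]:
            earliest = match.groupdict()
    return counts, earliest


if __name__ == "__main__":
    root = Path.home() / "ascend" / "log" / "debug" / "plog"
    lines = []
    if root.is_dir():
        for log_file in root.rglob("*.log"):
            lines.extend(log_file.read_text(errors="replace").splitlines())
    counts, first = summarize_errors(lines)
    print("errors per component:", dict(counts))
    print("earliest error:", first)
```

In the grep output above, the earliest ERROR lines come from the DRV component (`Unsupported command`, then `Sqcq init failed`), and the RUNTIME and ASCENDCL errors follow them, which is consistent with a failure on the driver side.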
Issue Description
Version & Environment Information