alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
http://www.mnn.zone/

MNN hangs on the Intel HD 630 GPU #2557

Closed feixuedudiao closed 9 months ago

feixuedudiao commented 1 year ago

Platform (include target platform as well if cross-compiling):

Running MNN on the GPU of an integrated Intel HD 630 graphics adapter under Windows 10.

Compiling Method:


Paste the cmake arguments or the path of the build script used, as well as the full log of the cmake process, here or on pastebin:
echo "building  the win x64"
CALL "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
powershell ./schema/generate.ps1

echo "building Static_Release64"
mkdir Static_CL_Release64 && cd Static_CL_Release64
cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DMNN_USE_SYSTEM_LIB=OFF -DMNN_OPNECL_SVM_ENABLE=OFF  -DMNN_SEP_BUILD=OFF -DMNN_BUILD_SHARED_LIBS=OFF -DMNN_BUILD_TOOLS=OFF -DMNN_OPENCL=ON -DMNN_SUPPORT_BF16=OFF -DMNN_AVX512=ON ..
ninja

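Problem description:
MNN configuration: inference uses ScheConfig.type == MNN_FORWARD_OPENCL, and the OpenCL cache acceleration parameter is tScheConfig.mode = MNN_GPU_TUNING_WIDE | MNN_GPU_MEMORY_IMAGE. The model cache file is generated successfully, but when the cache file is loaded multiple times, the interpreter hangs inside createSession. The stack trace at that point is: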
  0  Id: 2440.154c Suspend: 0 Teb: 00000009`317cc000 Unfrozen
 # Child-SP          RetAddr               Call Site
00 00000009`315cd628 00007ffd`138c2708     ntdll!NtYieldExecution+0x14
01 00000009`315cd630 00007ffc`9c51f93f     KERNELBASE!SwitchToThread+0x28
02 00000009`315cd660 00007ffc`9c4e026c     igdrcl64!clReleaseGlSharedEventINTEL+0x4d54f
03 00000009`315cd6b0 00007ffc`9c37c331     igdrcl64!clReleaseGlSharedEventINTEL+0xde7c
04 00000009`315cd740 00007ffc`9c3dd639     igdrcl64!GTPin_Init+0x3af51
05 00000009`315cd7b0 00007ffc`9c4b3bb1     igdrcl64!GTPin_Init+0x9c259
06 00000009`315cd7f0 00007ffc`9c37a5f5     igdrcl64!GTPin_Init+0x1727d1
07 00000009`315cd9e0 00007ffc`9c33220d     igdrcl64!GTPin_Init+0x39215
08 00000009`315cdb00 00007ffc`d4fe339f     igdrcl64+0x2220d

09 00000009`315cdd10 00007ffc`9d21b1f2     OpenCL!clEnqueueMapBuffer+0xaf
0a 00000009`315cdd70 00007ffc`9d21bedb     img_bg!MNN::OpenCL::ConvExecution::ConvExecution+0x10a2
0b 00000009`315ce0e0 00007ffc`9d25fa5b     img_bg!MNN::OpenCL::ConvolutionCreator::onCreate+0x32b
0c 00000009`315ce160 00007ffc`9d3a98c0     img_bg!MNN::OpenCL::OpenCLBackend::onCreate+0x68b
0d 00000009`315ce270 00007ffc`9d3ab1fe     img_bg!MNN::Pipeline::_copyInputs+0x230
0e 00000009`315ce340 00007ffc`9d39da20     img_bg!MNN::Pipeline::allocMemory+0x76e
0f 00000009`315ce440 00007ffc`9d3b11ff     img_bg!MNN::Session::resize+0xb0
10 00000009`315ce470 00007ffc`9d3b184b     img_bg!MNN::Interpreter::createMultiPathSession+0x2ef
11 00000009`315ce5e0 00007ffc`9d1d0abb     img_bg!MNN::Interpreter::createSession+0x1bb
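Based on the saved information and tracing through the MNN source code, the problem appears to occur after the cache file is loaded, during the write from CPU to GPU. Could the MNN developers please take a look? Thanks.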
Qxinyu commented 1 year ago

On Intel platforms, buffer mode is recommended because it performs better. Try changing tScheConfig.mode = MNN_GPU_TUNING_WIDE | MNN_GPU_MEMORY_IMAGE to tScheConfig.mode = MNN_GPU_TUNING_WIDE | MNN_GPU_MEMORY_BUFFER.
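
The suggested change can be sketched as a small bitmask edit: keep the tuning flag and swap the memory flag from image to buffer. This is only an illustration. `useBufferMemory` is a hypothetical helper, not part of MNN, and the fallback enum values are an assumption about what `MNNForwardType.h` defines; they are used only when the real header is not on the include path.

```cpp
#include <cassert>

#if __has_include(<MNN/MNNForwardType.h>)
#include <MNN/MNNForwardType.h>  // real definitions when MNN is installed
#else
// Stand-in values (assumption): intended to mirror MNNGpuMode in MNN's header.
enum MNNGpuMode {
    MNN_GPU_TUNING_WIDE   = 1 << 2,
    MNN_GPU_MEMORY_BUFFER = 1 << 6,
    MNN_GPU_MEMORY_IMAGE  = 1 << 7,
};
#endif

// Hypothetical helper: preserve the tuning bits, force buffer memory.
inline int useBufferMemory(int mode) {
    mode &= ~static_cast<int>(MNN_GPU_MEMORY_IMAGE);   // clear the image flag
    mode |= static_cast<int>(MNN_GPU_MEMORY_BUFFER);   // set the buffer flag
    assert((mode & MNN_GPU_MEMORY_IMAGE) == 0);
    return mode;
}

// The setting from the report, converted to the recommended one.
inline int recommendedIntelMode() {
    const int reported = MNN_GPU_TUNING_WIDE | MNN_GPU_MEMORY_IMAGE;
    return useBufferMemory(reported);
}
```

The result would then be assigned to tScheConfig.mode before calling createSession. A cache file produced in image mode may not match the new memory mode, so regenerating it after the switch is probably the safer choice.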

github-actions[bot] commented 9 months ago

Marking as stale. No activity in 60 days.