alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
http://www.mnn.zone/

OpenCL error when running on NVIDIA T4 #2939

Open tzhang2014 opened 1 week ago

tzhang2014 commented 1 week ago

Platform (include target platform as well if cross-compiling):

CentOS 7.6

GitHub version:

MNN-2.9.1

If downloading the source as a zip, provide the download date or, better yet, the git revision from the comment section of the zip (obtainable by running 7z l PATH/TO/ZIP and searching for Comment in the output, e.g. Comment = bc80b11110cd440aacdabbf59658d630527a7f2b). If using git clone, provide the commit id from the first line of the git commit output.

Compiling method:

gcc 9.3.0

Paste cmake arguments or path of the build script used here, as well as the full log of the cmake process, here or on pastebin.

if [ -d "./build" ]; then
    rm -rf ./build
fi

mkdir -p build
cd build
cmake .. \
    -DMNN_BUILD_TEST=ON \
    -DMNN_OPENCL=ON \
    -DMNN_BUILD_QUANTOOLS=ON \
    -DMNN_BUILD_DEMO=ON \
    -DMNN_BUILD_CONVERTER=ON \
    -DMNN_BUILD_BENCHMARK=ON \
    -DMNN_BUILD_LLM=ON \
    -DMNN_SEP_BUILD=OFF \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON
# The original script used -j${nproc}, which expands to nothing unless a
# variable named nproc is set; $(nproc) runs the nproc command instead.
make -j$(nproc)

Build log:

Paste log here or pastebin

Runtime error:

model path is ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn
model_type is qwen1.5_0.5b
model name : Qwen2_0.5b
config.numThread = 68
The device support i8sdot:0, support fp16:0, support i8mm: 0
precision, memory = 0, 0
load tokenizer
load tokenizer Done
disk embedding is 1
load ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn ... Done!
main, 197, cost time: 5993.038086 ms
Prepare for resize opt Begin
Program build log: ptxas error : Entry function 'tile_trans_4d_buf' uses too much shared data (0x10010 bytes, 0xc000 max)
Build program failed, err:-11 ! programName.c_str()=s loop_buf in buildKernelWithCache, 683
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -48, info:broadcast_binary_buf 3D lws null res broadcast_binary_buf
CL ERROR CODE : -58, info:clEvent
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -48, info:broadcast_binary_buf 3D lws null res broadcast_binary_buf
CL ERROR CODE : -58, info:clEvent
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
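For anyone reading the log above: the ptxas message gives the shared-memory sizes in hexadecimal, and the lines after it carry numeric OpenCL status codes. The sketch below converts the sizes to decimal and maps the codes to their names from the standard OpenCL header (CL/cl.h). The helper names are hypothetical, and the reading that one failed program build causes the later kernel errors is an interpretation of the log, not something confirmed by the MNN maintainers.

```python
import re

# Status codes as defined in the Khronos OpenCL header (CL/cl.h).
CL_ERROR_NAMES = {
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -48: "CL_INVALID_KERNEL",
    -58: "CL_INVALID_EVENT",
}

# The ptxas line copied from the log in this issue.
PTXAS_LINE = (
    "ptxas error : Entry function 'tile_trans_4d_buf' uses too much "
    "shared data (0x10010 bytes, 0xc000 max)"
)


def shared_data_usage(line):
    """Return (kernel, used_bytes, max_bytes) parsed from a ptxas message."""
    match = re.search(
        r"Entry function '(\w+)' uses too much shared data "
        r"\((0x[0-9a-fA-F]+) bytes, (0x[0-9a-fA-F]+) max\)",
        line,
    )
    if match is None:
        raise ValueError("not a ptxas shared-data error")
    kernel, used, limit = match.groups()
    return kernel, int(used, 16), int(limit, 16)


def error_codes(log):
    """Collect the distinct codes from 'err:-11' and 'CL ERROR CODE : -45' text."""
    codes = re.findall(r"(?:err:|CL ERROR CODE : )(-\d+)", log)
    return sorted({int(c) for c in codes}, reverse=True)


kernel, used, limit = shared_data_usage(PTXAS_LINE)
print(f"{kernel}: {used} bytes requested, {limit} bytes allowed, {used - limit} over")

# A shortened excerpt of the log, enough to cover every distinct code.
sample = (
    "Build program failed, err:-11 ! "
    "CL ERROR CODE : -45, info:getKernel "
    "CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution "
    "CL ERROR CODE : -58, info:clEvent"
)
for code in error_codes(sample):
    print(code, CL_ERROR_NAMES.get(code, "unknown"))
```

With the values from the log, the kernel requests 65552 bytes (64 KiB plus 16 bytes) of shared data against a 49152-byte (48 KiB) limit. The -45 and -48 codes that follow are consistent with kernels being requested from a program that never built successfully.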

Qxinyu commented 5 days ago

This should be fixed in the next release; please wait for that update.

tzhang2014 commented 4 days ago

@Qxinyu Got it. I had previously switched to image mode (num_thread=132) and it runs, but it is very slow:

model path is ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn
model_type is qwen1.5_0.5b
model name : Qwen2_0.5b
config.numThread = 132
The device support i8sdot:0, support fp16:0, support i8mm: 0
precision, memory = 2, 2
load tokenizer
load tokenizer Done
disk embedding is 1
load ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn ... Done!
main, 197, cost time: 28078.185547 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1095 - Total: 1171, rate = 0.935098
main, 201, cost time: 4380.371094 ms
prompt file is promot.txt
四大名著是《西游记》、《水浒传》、《三国演义》和《红楼梦》。 [The Four Great Classical Novels are Journey to the West, Water Margin, Romance of the Three Kingdoms, and Dream of the Red Chamber.]

#################################
prompt tokens num = 12
decode tokens num = 27
prefill time = 0.70 s
decode time = 11.59 s
prefill speed = 17.23 tok/s
decode speed = 2.33 tok/s
##################################
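Two details in these logs can be checked with a little arithmetic. First, the numThread values 68 and 132 look like bit masks rather than thread counts. Assuming the MNNGpuMode flag values in MNN's MNNForwardType.h are as listed in the code below (an assumption; verify against your own checkout), 68 would mean buffer memory with wide tuning and 132 image memory with wide tuning. Second, the reported speeds can be recomputed from the token counts and times in the summary. The function names are hypothetical.

```python
# Assumed bit values of MNN's MNNGpuMode enum (MNNForwardType.h); verify
# against the MNN version in use before relying on them.
GPU_MODE_FLAGS = {
    1 << 0: "MNN_GPU_TUNING_NONE",
    1 << 1: "MNN_GPU_TUNING_HEAVY",
    1 << 2: "MNN_GPU_TUNING_WIDE",
    1 << 3: "MNN_GPU_TUNING_NORMAL",
    1 << 4: "MNN_GPU_TUNING_FAST",
    1 << 6: "MNN_GPU_MEMORY_BUFFER",
    1 << 7: "MNN_GPU_MEMORY_IMAGE",
}


def decode_gpu_mode(num_thread):
    """List the assumed flag names whose bits are set in num_thread."""
    return [name for bit, name in sorted(GPU_MODE_FLAGS.items()) if num_thread & bit]


def tokens_per_second(tokens, seconds):
    """Throughput in tokens per second."""
    return tokens / seconds


# numThread values from the two runs in this issue.
print(68, decode_gpu_mode(68))
print(132, decode_gpu_mode(132))

# Token counts and times from the summary block of the image-mode run.
print(f"prefill: {tokens_per_second(12, 0.70):.2f} tok/s")
print(f"decode:  {tokens_per_second(27, 11.59):.2f} tok/s")
```

The decode figure reproduces the logged 2.33 tok/s. The prefill figure comes out at 17.14 tok/s rather than the logged 17.23, presumably because the log prints the prefill time rounded to two decimals.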