alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

OpenCL error when running on NVIDIA T4 #2939

Open tzhang2014 opened 1 week ago

tzhang2014 commented 1 week ago


Platform (include target platform as well if cross-compiling):

CentOS 7.6


Github Version:

MNN-2.9.1

Provide the download date (or better yet, the git revision from the comment section of the zip, obtainable by running 7z l PATH/TO/ZIP and searching for Comment in the output, e.g. Comment = bc80b11110cd440aacdabbf59658d630527a7f2b) if you downloaded the source as a zip; otherwise provide the commit id from the first line of the git commit output.


Compiling Method

gcc 9.3.0

Paste cmake arguments or path of the build script used here as well as the full log of the cmake process here or pastebin

if [ -d "./build" ]; then rm -rf ./build; fi



Build Log:

Paste log here or pastebin


model path is ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn model_type is qwen1.5_0.5b

model name : Qwen2_0.5b

config.numThread = 68 The device support i8sdot:0, support fp16:0, support i8mm: 0

precision, memory = 0, 0

load tokenizer load tokenizer Done

disk embedding is 1

load ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn ... Done!
main, 197, cost time: 5993.038086 ms
Prepare for resize opt Begin
Program build log: ptxas error : Entry function 'tile_trans_4d_buf' uses too much shared data (0x10010 bytes, 0xc000 max)

Build program failed, err:-11 !
programName.c_str()=s loop_buf in buildKernelWithCache, 683
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -48, info:broadcast_binary_buf
3D lws null res broadcast_binary_buf
CL ERROR CODE : -58, info:clEvent
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -48, info:broadcast_binary_buf
3D lws null res broadcast_binary_buf
CL ERROR CODE : -58, info:clEvent
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution
CL ERROR CODE : -45, info:getKernel
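For readers following the log, the numeric values in it can be decoded with a short Python sketch. The symbolic names are the standard OpenCL status codes from the Khronos cl.h header; the names `CL_ERROR_NAMES` and `summarize_log` are hypothetical helpers used only for illustration and are not part of MNN.

```python
import re
from collections import Counter

# Subset of OpenCL status codes, as defined in the Khronos cl.h header.
CL_ERROR_NAMES = {
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -48: "CL_INVALID_KERNEL",
    -58: "CL_INVALID_EVENT",
}

def summarize_log(log_text):
    """Count each 'CL ERROR CODE : N' entry and key it by symbolic name."""
    codes = [int(c) for c in re.findall(r"CL ERROR CODE : (-?\d+)", log_text)]
    return {
        CL_ERROR_NAMES.get(code, f"UNKNOWN({code})"): count
        for code, count in Counter(codes).items()
    }

# Shared-memory figures from the ptxas message, converted to decimal bytes.
requested = int("0x10010", 16)  # what the kernel tile_trans_4d_buf asks for
limit = int("0xc000", 16)       # maximum reported by ptxas
print(requested, limit, requested - limit)

# A short excerpt of the log, used here as sample input.
sample = (
    "CL ERROR CODE : -45, info:getKernel\n"
    "CL ERROR CODE : -48, info:setArg LoopBinaryBufExecution\n"
    "CL ERROR CODE : -58, info:clEvent\n"
)
print(summarize_log(sample))
```

Read this way, the kernel requests 65552 bytes of shared memory (64 KiB plus 16 bytes) against a 49152-byte (48 KiB) limit, which is why the program build fails with err:-11. The later getKernel and setArg errors most likely follow from that failed build, since no valid program executable or kernel exists afterwards.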

Qxinyu commented 5 days ago


tzhang2014 commented 4 days ago

@Qxinyu Oh, I see. I switched to image mode earlier and it does run, with num_thread=132, but it is very slow:

model path is ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn model_type is qwen1.5_0.5b

model name : Qwen2_0.5b

config.numThread = 132 The device support i8sdot:0, support fp16:0, support i8mm: 0

precision, memory = 2, 2

load tokenizer load tokenizer Done

disk embedding is 1

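load ../MNN-2.9.1/mnn-models/Qwen1_5-0_5B-Chat/Qwen1_5-0_5B-Chat.mnn ... Done!
main, 197, cost time: 28078.185547 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1095 - Total: 1171, rate = 0.935098
main, 201, cost time: 4380.371094 ms
prompt file is promot.txt
四大名著是《西游记》、《水浒传》、《三国演义》和《红楼梦》。
(Model output, in English: "The Four Great Classical Novels are Journey to the West, Water Margin, Romance of the Three Kingdoms, and Dream of the Red Chamber.")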

#################################
prompt tokens num = 12
decode tokens num = 27
prefill time = 0.70 s
decode time = 11.59 s
prefill speed = 17.23 tok/s
decode speed = 2.33 tok/s
##################################
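As a sanity check on the figures above, the throughput can be recomputed from the token counts and times in the log. This is a minimal sketch; `tokens_per_second` is a hypothetical helper, not an MNN function.

```python
def tokens_per_second(num_tokens, seconds):
    """Throughput as tokens divided by elapsed wall-clock seconds."""
    return num_tokens / seconds

# Values copied from the statistics block in the log.
prefill = tokens_per_second(12, 0.70)
decode = tokens_per_second(27, 11.59)
print(f"prefill ~ {prefill:.2f} tok/s, decode ~ {decode:.2f} tok/s")
```

The recomputed prefill speed (about 17.14 tok/s) differs slightly from the logged 17.23 tok/s, presumably because the log prints the time rounded to two decimals. The decode speed matches the logged 2.33 tok/s.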