kevinmgyu opened 2 years ago
May I ask whether my configuration file above is written incorrectly? It is for the input and output of a transformer translation scenario.
The GPU has plenty of memory.
The model itself is under 1 GB.
After switching to a CUDA 11.6 build and upgrading the CUDA driver to 11.6, the server starts normally, but this sub-1 GB model somehow occupies a huge amount of GPU memory.
Sending a request with the triton client produces the error below.
max_step and max_batch_size will influence GPU memory usage.
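To see why these two knobs matter, here is a back-of-envelope sketch in Python. The formula is purely an illustrative assumption (two key/value buffers per decoder layer, fp32), not LightSeq's actual allocator; the point is only how memory scales with max_batch_size and max_step:

    # Rough estimate of decoder key/value buffer size. The scaling below
    # (2 buffers per decoder layer, batch x beam x max_step x hidden, fp32)
    # is an assumption for illustration, not LightSeq's exact formula.
    def kv_cache_bytes(max_batch_size, beam_size, max_step,
                       hidden_size, decoder_layers, dtype_bytes=4):
        return (2 * decoder_layers * max_batch_size * beam_size
                * max_step * hidden_size * dtype_bytes)

    # Values taken from the configs quoted in this thread:
    # max_batch_size=1024 (config.pbtxt), beam size 4, max step 1024,
    # hidden size 1024, 6 decoder layers.
    gib = kv_cache_bytes(1024, 4, 1024, 1024, 6) / 2**30
    print(f"~{gib:.0f} GiB")  # ~192 GiB -- far beyond any single GPU

Even if the real allocator is more frugal than this sketch, the product of max_batch_size (1024 in the config.pbtxt below) and max step (1024 in the generator config) dominates, which is consistent with the cudaMalloc failure in the log.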
May I ask what is causing this problem? The input data is just a simple test; the configuration is as follows.
You can build with debug mode to check.
The lightseq build above was already compiled in debug mode, but it produced no further diagnostic information.
Check here for how to use lightseq debug mode: https://github.com/bytedance/lightseq/blob/9a617306fa/docs/inference/build.md
model config:
  encoder layers: 6
  decoder layers: 6
  hidden size: 1024
  inner size: 4096
  head number: 16
  dim per head: 64
  src vocab size: 40480
  trg vocab size: 42720
  is_post_ln: 0
  no_scale_embedding: 0
  use_gelu: 0
  start_id: 2
  end_id: 6
  padding_id: 2
  multilg_type: 0
generator config:
  beam size: 4
  max step: 1024
  extra decode length (max decode length - src input length): 50
  length penalty: 0.6
  diverse lambda: 0
  sampling method: beam_search
  topk: 1
  topp: 0.75

Server log:
  unable to allocate memory in function AllocateCudaBuffers: out of memory
  E0331 16:32:37.715184 45 dynamic_batch_scheduler.cc:162] Initialization failed for dynamic-batch scheduler thread 3: initialize error for 'transformer_server': (12) cudaMalloc failed
  I0331 16:32:43.967144 45 server.cc:400] Polling model repository
  I0331 16:32:58.967572 45 server.cc:400] Polling model repository
libtransformer_server.so was compiled against CUDA 10.1 and runs on a GPU with CUDA 10.1.
The config.pbtxt file is as follows:

  name: "transformer_server"
  platform: "custom"
  max_batch_size: 1024
  default_model_filename: "libtransformer_server.so"
  input [
    {
      name: "src_ids:0"
      data_type: TYPE_INT32
      dims: [ -1 ]
    }
  ]
  output [
    {
      name: "trg_ids:0"
      data_type: TYPE_INT32
      dims: [ -1, -1, -1 ]
    }
  ]
  instance_group [
    {
      count: 1
    }
  ]
How can this be solved?
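Following the maintainer's note that max_step and max_batch_size drive GPU memory usage, one plausible mitigation is to shrink max_batch_size in config.pbtxt (and, if the model is re-exported, max step as well). The value 32 below is only an illustrative assumption, not advice given in this thread:

  name: "transformer_server"
  platform: "custom"
  max_batch_size: 32   # reduced from 1024; illustrative value, tune to fit GPU memory
  default_model_filename: "libtransformer_server.so"
  # ... input, output, and instance_group unchanged from the config above

This matches the log above, where AllocateCudaBuffers fails during dynamic-batch scheduler initialization: the buffers sized by these limits are allocated when the model loads, so an oversized limit fails even before any request is served.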