Open wenshuai-xiaomi opened 1 month ago
By the way, on trtllm version 0.9, with the patch from https://github.com/Franc-Z/QWen1.5_TensorRT-LLM and some small changes, the engine file can be generated and works well on g4t4 with 16G memory.
Feel free to reopen it if you have further questions.
Why was the issue closed?
I only meant that it works on version 0.9; it still fails on the newest code.
[07/17/2024-01:56:09] [TRT] [E] Error Code: 4: Internal error: plugin node QWenForCausalLM/transformer/layers/0/attention/wrapper/gpt_attention/PLUGIN_V2_GPTAttention_0 requires 26927499520 bytes of scratch space, but only 15642329088 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[07/17/2024-01:56:09] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (Internal error: plugin node QWenForCausalLM/transformer/layers/0/attention/wrapper/gpt_attention/PLUGIN_V2_GPTAttention_0 requires 26927499520 bytes of scratch space, but only 15642329088 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). )
[07/17/2024-01:56:09] [TRT-LLM] [E] Engine building failed, please check the error log.
The error occurs on g4t4 with 16G memory. The same build works on a V100 with 32G memory. How can I fix it on this GPU with 16G memory?
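
For scale, here is a minimal sketch that converts the two byte counts from the error message into GiB. It uses only the numbers in the log above; the variable and function names are just for illustration and are not part of TensorRT or TensorRT-LLM.

```python
# Byte counts copied verbatim from the TensorRT error message above.
GIB = 1024 ** 3

required_bytes = 26927499520   # scratch space requested by the GPTAttention plugin
available_bytes = 15642329088  # scratch space the builder reported as available


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / GIB


shortfall_bytes = required_bytes - available_bytes

print(f"required : {to_gib(required_bytes):.2f} GiB")
print(f"available: {to_gib(available_bytes):.2f} GiB")
print(f"shortfall: {to_gib(shortfall_bytes):.2f} GiB")
```

So the plugin asks for roughly 25 GiB of scratch space, which is more than the whole 16G card, while it would fit within the 32G of the V100.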