NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

trtllm-build qwen2 0.5B failed #1967

Open wenshuai-xiaomi opened 1 month ago

wenshuai-xiaomi commented 1 month ago

[07/17/2024-01:56:09] [TRT] [E] Error Code: 4: Internal error: plugin node QWenForCausalLM/transformer/layers/0/attention/wrapper/gpt_attention/PLUGIN_V2_GPTAttention_0 requires 26927499520 bytes of scratch space, but only 15642329088 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().

[07/17/2024-01:56:09] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (Internal error: plugin node QWenForCausalLM/transformer/layers/0/attention/wrapper/gpt_attention/PLUGIN_V2_GPTAttention_0 requires 26927499520 bytes of scratch space, but only 15642329088 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). ) [07/17/2024-01:56:09] [TRT-LLM] [E] Engine building failed, please check the error log.
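For context, the two byte counts in the log can be compared directly; a minimal sketch (plain Python, numbers copied verbatim from the error above):

```python
# Scratch space requested by the GPT attention plugin vs. what the builder
# had available, using the byte counts from the error log above.
required_bytes = 26_927_499_520   # from the error log
available_bytes = 15_642_329_088  # from the error log

required_gib = required_bytes / 2**30
available_gib = available_bytes / 2**30
shortfall_gib = (required_bytes - available_bytes) / 2**30

print(f"required:  {required_gib:.1f} GiB")   # ~25.1 GiB
print(f"available: {available_gib:.1f} GiB")  # ~14.6 GiB
print(f"shortfall: {shortfall_gib:.1f} GiB")  # ~10.5 GiB
```

Since the ~25 GiB scratch request already exceeds the card's total 16 GB of VRAM, raising the workspace limit alone cannot satisfy it on this GPU.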

The issue occurs on a g4t4 with 16 GB of memory; it works on a V100 with 32 GB. How can I fix it on this GPU with 16 GB of memory?
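For reference, the limit the error message mentions is set on the TensorRT builder config; a minimal sketch of the equivalent Python call (network definition and engine build omitted; the 14 GiB value is purely illustrative, not a recommendation):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Cap the builder's scratch ("workspace") memory pool; this is the Python
# counterpart of the IBuilderConfig::setMemoryPoolLimit() call named in
# the error log. 14 GiB here is an assumed example value.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 14 << 30)
```

Note that this limit cannot exceed physical VRAM, so on a 16 GB card the ~25 GiB request in the log above would still fail; reducing the size of the build (for example, smaller max batch size or sequence lengths passed to trtllm-build) is the more likely route.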

wenshuai-xiaomi commented 1 month ago

By the way, on TensorRT-LLM version 0.9, with the patch from https://github.com/Franc-Z/QWen1.5_TensorRT-LLM and some small changes, the engine file can be generated and works well on a g4t4 with 16 GB of memory.

QiJune commented 1 month ago

Feel free to reopen it if you have further questions.

wenshuai-xiaomi commented 1 month ago

Why close the issue?

I just meant that it works on the 0.9 version but fails with the newest code.