Franc-Z / QWen1.5_TensorRT-LLM

Optimize QWen1.5 models with TensorRT-LLM
Apache License 2.0

Can this support the 32B model? #3

Open Linzecong opened 4 months ago

Linzecong commented 4 months ago

AssertionError: QWen uses MHA.

The 32B model uses MHA. Is it not supported?
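
For anyone hitting the same assertion, here is a minimal sketch of checking which attention layout a checkpoint declares. It assumes a Hugging Face-style `config.json` with `num_attention_heads` and `num_key_value_heads` fields. The helper name `attention_kind` and the head counts in the usage lines are hypothetical, chosen only for illustration; this is not the repository's own check.

```python
def attention_kind(config: dict) -> str:
    """Classify the attention layout declared by a Hugging Face-style config dict.

    Assumption: the config uses the `num_attention_heads` and
    `num_key_value_heads` keys. When the KV-head key is missing,
    the query head count is used, which corresponds to plain MHA.
    """
    n_heads = config["num_attention_heads"]
    n_kv_heads = config.get("num_key_value_heads", n_heads)
    if n_kv_heads == n_heads:
        return "MHA"  # every query head has its own key/value head
    if n_kv_heads == 1:
        return "MQA"  # all query heads share a single key/value head
    return "GQA"      # query heads share key/value heads in groups


# Illustrative values only, not taken from any real checkpoint.
print(attention_kind({"num_attention_heads": 32, "num_key_value_heads": 32}))
print(attention_kind({"num_attention_heads": 32, "num_key_value_heads": 8}))
```

To run this against a real model, load its `config.json` with `json.load` and pass the resulting dict to the function. If the result is not `MHA`, that may explain why an assertion that expects MHA fails.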