bytedance / ABQ-LLM

An acceleration library that supports arbitrary bit-width combinatorial quantization operations
Apache License 2.0

Is there a plan to support the Qwen2 model? #9

Open gloritygithub11 opened 1 month ago

zengchao0424 commented 3 weeks ago

Hello, Qwen2 computes attention with grouped-query attention (GQA). We have already added GQA support, so our LLaMA implementation can run GQA models such as LLaMA-3. Because Qwen2's architecture is close to LLaMA's, you can add Qwen2 support by extending our LLaMA implementation.
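For anyone attempting that extension, here is a minimal NumPy sketch of what GQA means at the attention level. It is not ABQ-LLM code; the function names `repeat_kv` and `gqa_attention` are illustrative, and it assumes a single unbatched sequence with RoPE already applied to `q` and `k`. The key idea is that several query heads share one key/value head: each KV head is repeated `num_heads // num_kv_heads` times, so query head `h` reads KV head `h // n_rep`, the same grouping LLaMA-style GQA uses. One difference worth checking when porting (as far as the Hugging Face Qwen2 definition goes) is that Qwen2's q/k/v projections carry a bias term, while LLaMA's do not.

```python
import numpy as np


def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (num_kv_heads, seq, head_dim) to (num_kv_heads * n_rep, seq, head_dim).

    Each KV head is repeated n_rep times consecutively ([a, a, b, b] for n_rep=2),
    so query head h ends up paired with KV head h // n_rep.
    """
    return np.repeat(x, n_rep, axis=0)


def gqa_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal grouped-query attention for one sequence (illustrative sketch).

    q:    (num_heads, seq, head_dim)
    k, v: (num_kv_heads, seq, head_dim), where num_heads % num_kv_heads == 0
    Returns (num_heads, seq, head_dim).
    """
    num_heads, seq, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert num_heads % num_kv_heads == 0, "query heads must be a multiple of KV heads"
    n_rep = num_heads // num_kv_heads

    # Share each KV head across its group of query heads.
    k = repeat_kv(k, n_rep)
    v = repeat_kv(v, n_rep)

    # Scaled dot-product scores: (num_heads, seq, seq).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)

    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    # Toy shapes (hypothetical): 8 query heads sharing 2 KV heads.
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 5, 4))
    k = rng.standard_normal((2, 5, 4))
    v = rng.standard_normal((2, 5, 4))
    print(gqa_attention(q, k, v).shape)
```

When `num_kv_heads == num_heads` this reduces to ordinary multi-head attention, which is why a kernel that already handles GQA for LLaMA-3 should, in principle, cover Qwen2's attention shape as well.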