Hello. Qwen2 computes attention with grouped-query attention (GQA). Our implementation already supports GQA: the LLaMA implementation can run GQA models such as LLaMA-3. Because Qwen2's architecture is similar to LLaMA's, you can add Qwen2 support by extending our LLaMA implementation.
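For anyone unfamiliar with GQA, here is a minimal sketch of the idea in plain Python: several query heads share one key/value head. This is not code from our LLaMA implementation or from Qwen2; the names `kv_head_index` and `gqa_attention` are hypothetical, and the sketch leaves out the causal mask, rotary embeddings, and projections to stay short.

```python
import math


def kv_head_index(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the key/value head shared by query head `q_head` under GQA."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attention(q, k, v):
    """Unmasked grouped-query attention on nested lists (illustration only).

    q:    [num_q_heads][seq_len][head_dim]
    k, v: [num_kv_heads][seq_len][head_dim]
    Returns a list shaped like q.
    """
    num_q_heads, num_kv_heads = len(q), len(k)
    head_dim = len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(num_q_heads):
        # Every query head in a group reads the same K/V head.
        kv = kv_head_index(h, num_q_heads, num_kv_heads)
        head_out = []
        for q_vec in q[h]:
            scores = [scale * sum(a * b for a, b in zip(q_vec, k_vec))
                      for k_vec in k[kv]]
            weights = softmax(scores)
            head_out.append([
                sum(w * v_vec[d] for w, v_vec in zip(weights, v[kv]))
                for d in range(head_dim)
            ])
        out.append(head_out)
    return out
```

With `num_kv_heads == num_q_heads` this reduces to ordinary multi-head attention, and with `num_kv_heads == 1` it becomes multi-query attention. Since the head mapping is the only GQA-specific part, a LLaMA-style attention layer that already handles it should carry over to other GQA models with a similar architecture.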