flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Enable GQA group size = 6 #201

Closed — vinx13 closed this 4 months ago

vinx13 commented 4 months ago

This enables compilation for group size = 6, as needed for Mixtral 8x22B. I haven't looked into performance yet, though.
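For context, the GQA group size here is the ratio of query heads to KV heads; a minimal sketch of that derivation, assuming Mixtral 8x22B's commonly reported head counts (48 query heads, 8 KV heads — not stated in this thread):

```python
def gqa_group_size(num_qo_heads: int, num_kv_heads: int) -> int:
    """Group size for grouped-query attention: query heads per KV head.

    Assumes the usual GQA definition, where the query-head count must be
    an exact multiple of the KV-head count.
    """
    if num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be divisible by num_kv_heads")
    return num_qo_heads // num_kv_heads


# Mixtral 8x22B reportedly uses 48 query heads and 8 KV heads (assumption):
print(gqa_group_size(48, 8))  # -> 6
```

Since kernels are typically specialized per group size at compile time, each new ratio (like 6 here) needs to be added to the set of compiled configurations.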

yz-tang commented 4 months ago

@yzh119 I followed this method and changed group_size to 7, but the results were wrong. Are odd group sizes like this supported?