flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

feat: add group gemm operators #282

Closed yzh119 closed 4 months ago

yzh119 commented 4 months ago

First step towards #199.

Group GEMM should also be helpful for MoE (mixture-of-experts) layers.

yzh119 commented 4 months ago

Tests passed, so I'll merge this PR first. For the next steps, we need to compile more shapes (for LoRA shrink and expand) and integrate punica's bgmv and sgmv implementations for extreme shapes (vectors, etc.).
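
For readers new to the term, here is a minimal pure-Python sketch of what a group (segment) GEMM computes: a batch of rows is split into contiguous segments, and each segment is multiplied by its own weight matrix. This is only a reference for the semantics, not the kernel added in this PR. The names `group_gemm_reference` and `seg_indptr` and the segment layout are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply a (m x k) by b (k x n), both given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def group_gemm_reference(x, weights, seg_indptr):
    """Hypothetical reference: rows seg_indptr[i]:seg_indptr[i+1] of x use weights[i]."""
    if len(seg_indptr) != len(weights) + 1:
        raise ValueError("seg_indptr must have len(weights) + 1 entries")
    out = []
    for i, w in enumerate(weights):
        start, end = seg_indptr[i], seg_indptr[i + 1]
        # Each segment gets its own weight matrix (e.g. one LoRA adapter or one expert).
        out.extend(matmul(x[start:end], w))
    return out


x = [[1, 2], [3, 4], [5, 6]]
weights = [
    [[1, 0], [0, 1]],  # group 0: identity
    [[2, 0], [0, 2]],  # group 1: scale by 2
]
seg_indptr = [0, 2, 3]  # rows 0-1 use group 0, row 2 uses group 1
y = group_gemm_reference(x, weights, seg_indptr)
print(y)
```

With LoRA, each segment would correspond to the requests sharing one adapter; with MoE, to the tokens routed to one expert. When a segment shrinks to a single row, the product is a matrix-vector multiply, which is the "extreme shape" case mentioned above.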