dvlab-research / Q-LLM

This is the official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".
https://arxiv.org/abs/2406.07528