flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0
768 stars 64 forks

feat: support cuda graph for batched multi-query(prefill/append) attention #277

Closed yzh119 closed 1 month ago

yzh119 commented 1 month ago

#275 is not complete; this PR pushes the remaining changes.