flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

feat: support cuda graph for batched multi-query (prefill/append) attention #275

Closed by yzh119 4 months ago

yzh119 commented 4 months ago

Follow-up of #187 and #256