flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0
1.22k stars 115 forks source link

bugfix: Fix the behavior of decode cuda graph wrapper #291

Closed yzh119 closed 3 months ago

yzh119 commented 3 months ago

In this PR, we fixed the grid size of the kernels launched by a decode cuda graph wrapper: