flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

bugfix: fix wrong `padded_batch_size_` #296

Closed yzh119 closed 3 weeks ago

yzh119 commented 3 weeks ago

In #294, we set `padded_batch_size_` to `num_kv_heads * batch_size` when not splitting KV; it should be `batch_size`.