flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

bugfix: fix wrong `padded_batch_size_` #296

Closed yzh119 closed 3 weeks ago

yzh119 commented 3 weeks ago

In #294, we set `padded_batch_size_` to `num_kv_heads * batch_size` when not splitting KV; it should be `batch_size`.