flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

hotfix: fix the decode kernel with logits cap #350

Closed yzh119 closed 2 days ago

yzh119 commented 2 days ago

The logits soft cap should be applied before masking.

Thanks @LiuXiaoxuanPKU for spotting this bug.
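
For illustration, here is a minimal pure-Python sketch of why the order matters. It is not the FlashInfer kernel code. It assumes the soft cap takes the common tanh form `cap * tanh(logit / cap)`, and the names `soft_cap`, `softmax`, and `attention_weights` are hypothetical helpers written for this example. If the cap is applied after masking, a masked logit of `-inf` is squashed to the finite value `-cap`, so the masked position can still receive a small nonzero softmax weight.

```python
import math


def soft_cap(logit: float, cap: float) -> float:
    # Assumed tanh-based soft cap: squashes the logit into [-cap, cap].
    return cap * math.tanh(logit / cap)


def softmax(xs):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(logits, mask, cap, cap_before_mask=True):
    # mask[i] is True when position i may be attended to.
    if cap_before_mask:
        # Intended order: cap first, then mask with -inf.
        capped = [soft_cap(x, cap) for x in logits]
        masked = [x if keep else -math.inf for x, keep in zip(capped, mask)]
        return softmax(masked)
    # Buggy order: mask first, then cap. tanh(-inf) is -1, so the
    # masked logit becomes -cap instead of staying at -inf.
    masked = [x if keep else -math.inf for x, keep in zip(logits, mask)]
    capped = [soft_cap(x, cap) for x in masked]
    return softmax(capped)


logits = [2.0, 1.0, 3.0]
mask = [True, True, False]  # the last position is masked out

correct = attention_weights(logits, mask, cap=30.0, cap_before_mask=True)
buggy = attention_weights(logits, mask, cap=30.0, cap_before_mask=False)
print(correct[2], buggy[2])
```

With the cap applied first, the masked position gets exactly zero weight. With the order reversed, it gets a tiny but nonzero weight, and the effect is larger for smaller cap values.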