flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0
1.21k stars 111 forks

feat: customize `logits_soft_cap` value #339

Closed yzh119 closed 3 months ago

yzh119 commented 3 months ago

This PR adds support for customizing the `logits_soft_cap` value. Different models use different soft cap values (e.g. Grok-1 uses 30 and Gemma-2 uses 50).
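For context, here is a minimal sketch of what a logits soft cap does, assuming the common formulation `cap * tanh(logits / cap)`. The function name `soft_cap` and the sample scores are hypothetical and are not part of the FlashInfer API. The kernel applies the cap on the GPU; this pure-Python version only illustrates the arithmetic.

```python
import math


def soft_cap(logits, cap):
    """Squash each logit into the open interval (-cap, cap).

    Assumes the common formulation cap * tanh(x / cap); this is an
    illustrative helper, not a FlashInfer function.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in logits]


# Hypothetical pre-softmax attention scores.
scores = [-200.0, -1.0, 0.0, 1.0, 200.0]

capped_30 = soft_cap(scores, 30.0)  # cap value the PR cites for Grok-1
capped_50 = soft_cap(scores, 50.0)  # cap value the PR cites for Gemma-2
```

Small logits pass through almost unchanged, while large ones saturate near the cap, which is why the cap has to be configurable per model rather than hard-coded.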

zhyncs commented 3 months ago

> Gemma-2 uses 50

Great work!