flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

perm: optimize sampling performance #212

Closed: yzh119 closed this 5 months ago

yzh119 commented 5 months ago

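Skip the scan operation on a chunk when we know the chunk does not overlap with the uniform needle, i.e. when the sampled uniform value cannot fall within that chunk's range of cumulative probabilities.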
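To illustrate the idea, here is a minimal serial sketch. It assumes inverse-transform sampling, where a needle u is drawn uniformly from [0, sum(probs)) and the sample is the first index whose cumulative sum exceeds u. The function name `sample_chunked` and the `chunk_size` parameter are hypothetical. This is not FlashInfer's actual CUDA kernel. Each chunk first gets a plain sum. If the running total plus that sum is still at or below u, the needle lies past the chunk, so the per-element prefix sum is skipped.

```python
from itertools import accumulate


def sample_chunked(probs, u, chunk_size):
    """Inverse-transform sampling over `probs`, processed chunk by chunk.

    Hypothetical sketch. Returns (index, chunks_skipped). `u` is the uniform
    "needle", assumed to lie in [0, sum(probs)). A chunk is scanned element
    by element only if the needle falls inside its range of cumulative sums;
    otherwise only its total is added and the scan is skipped.
    """
    running = 0  # cumulative sum of all chunks processed so far
    skipped = 0
    for start in range(0, len(probs), chunk_size):
        chunk = probs[start:start + chunk_size]
        chunk_sum = sum(chunk)  # a reduction, no per-element prefix sums
        if running + chunk_sum <= u:
            # The needle lies past this chunk: skip the scan entirely.
            running += chunk_sum
            skipped += 1
            continue
        # The needle falls inside this chunk: run the inclusive scan and
        # return the first element whose cumulative sum exceeds u.
        for offset, partial in enumerate(accumulate(chunk)):
            if running + partial > u:
                return start + offset, skipped
    raise ValueError("u must be smaller than sum(probs)")
```

For example, `sample_chunked([1, 2, 3, 4], 6.5, 2)` returns `(3, 1)`. The first chunk `[1, 2]` totals 3, which is at most 6.5, so it is skipped. The needle then lands in the second chunk, where the cumulative sums 6 and 10 place it at index 3. The saving comes from replacing a scan with a reduction for every chunk before the one that contains the needle. On a GPU, a block-wide reduction is typically cheaper than a block-wide scan.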