flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

sampling: support parallel top-p sampling #213

Closed yzh119 closed 2 months ago

yzh119 commented 2 months ago

Add a new API, ParallelTopPSamplingFromProb, which enables sampling from the same distribution multiple times and allows users to specify a batch-specific top_p.
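For readers unfamiliar with the idea, here is a minimal pure-Python sketch of the semantics described above. It is not the FlashInfer kernel or its signature: the function names, the `row_indices` argument (which maps each output sample to a row of `probs`, so one distribution can be sampled several times), and the sort-based nucleus selection are all assumptions made for illustration. The actual CUDA implementation may use a different algorithm.

```python
from typing import List, Sequence


def top_p_sample(probs: Sequence[float], top_p: float, u: float) -> int:
    """Sample one token id from the top-p (nucleus) subset of `probs`.

    `u` is a uniform random number in [0, 1) supplied by the caller.
    """
    # Sort token ids by descending probability (ties broken by lower id).
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Inverse-CDF sampling over the kept set, rescaled by its total mass.
    threshold = u * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if threshold < acc:
            return i
    return kept[-1]  # guard against floating-point round-off


def parallel_top_p_sample(
    probs: Sequence[Sequence[float]],
    row_indices: Sequence[int],  # hypothetical: which row each sample draws from
    top_p: Sequence[float],      # one threshold per output sample
    uniforms: Sequence[float],   # one uniform random number per output sample
) -> List[int]:
    """Draw len(row_indices) samples; rows of `probs` may be reused."""
    return [
        top_p_sample(probs[r], p, u)
        for r, p, u in zip(row_indices, top_p, uniforms)
    ]


if __name__ == "__main__":
    probs = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
    # Row 0 is sampled twice with different top_p values; row 1 once.
    print(parallel_top_p_sample(probs, [0, 0, 1], [0.5, 1.0, 0.7], [0.99, 0.6, 0.3]))
```

In this sketch the first two samples share row 0 but use different thresholds, which is the combination the description refers to: repeated sampling from one distribution with a per-sample top_p.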

cc @MasterJH5574