HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Random sampler warmup #506

Closed mfylcek closed 5 days ago

mfylcek commented 1 week ago

Execute the model with the random sampler once per batch size.
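A minimal sketch of the "once per batch size" idea. The helper names (`create_dummy_batch`, `execute_model`) and the sampling parameters are placeholders, not the fork's actual API:

```python
import torch

# Hypothetical sampling parameters that force the random (non-greedy) path:
# temperature > 0 with top-p/top-k enabled.
RANDOM_SAMPLING_KWARGS = dict(temperature=1.0, top_p=0.9, top_k=50)

def warmup_random_sampler(model_runner, batch_sizes):
    """Run the model once per batch size with random sampling enabled.

    `model_runner.create_dummy_batch` and `model_runner.execute_model` stand in
    for the fork's real objects; this only illustrates one pass per batch size.
    """
    for bs in batch_sizes:
        dummy_batch = model_runner.create_dummy_batch(
            batch_size=bs, **RANDOM_SAMPLING_KWARGS)
        with torch.no_grad():
            model_runner.execute_model(dummy_batch)
```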

mfylcek commented 1 week ago

The random sampler warmup is done in the HPU graph capturing phase to minimize its impact on warmup time. The samplers are executed in lazy mode, without HPU graphs. The greedy sampler is warmed up in warmup_all_buckets.
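A hedged sketch of how the two warmup paths could be ordered in the overall warmup flow. Only `warmup_all_buckets` is named in the comment above; `capture_graph`, `create_dummy_batch`, and `execute_model` are hypothetical placeholders:

```python
class WarmupFlowSketch:
    """Illustrative ordering only; not the fork's real warmup code."""

    def __init__(self, model_runner, bucket_batch_sizes):
        self.model_runner = model_runner
        self.bucket_batch_sizes = bucket_batch_sizes

    def warmup_all_buckets(self):
        # The greedy sampling path (temperature == 0) is exercised here,
        # once per bucket.
        for bs in self.bucket_batch_sizes:
            batch = self.model_runner.create_dummy_batch(bs, temperature=0.0)
            self.model_runner.execute_model(batch)

    def capture_hpu_graphs(self):
        # The random sampling path is exercised during graph capture so it
        # adds little extra warmup time; the sampler itself stays in lazy
        # mode and is not captured in an HPU graph.
        for bs in self.bucket_batch_sizes:
            self.model_runner.capture_graph(bs)  # hypothetical capture call
            batch = self.model_runner.create_dummy_batch(
                bs, temperature=1.0, top_p=0.9)
            self.model_runner.execute_model(batch)

    def warmup(self):
        self.warmup_all_buckets()
        self.capture_hpu_graphs()
```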