deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

[fix][ci] specify guaranteed_no_evict batch_scheduler_policy to get t… #2558

Closed siddvenk closed 1 week ago

siddvenk commented 1 week ago

…5 working with in flight batching

Description

The default batch_scheduler_policy, max_utilization, does not work with enc_dec models that use in-flight batching together with streaming, whereas the guaranteed_no_evict policy does. This change sets the policy to guaranteed_no_evict for CI. The limitation also needs to be documented in the release notes.
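
To make the rule above concrete, here is a minimal sketch of the policy selection this description implies. The helper names are hypothetical and not part of djl-serving, and the `option.batch_scheduler_policy` key is an assumption about how the setting is spelled in `serving.properties`. Only the two policy names and the enc_dec condition come from this PR.

```python
# Hypothetical helper for illustration only; not djl-serving code.

DEFAULT_POLICY = "max_utilization"      # current default, chosen for throughput
FALLBACK_POLICY = "guaranteed_no_evict"  # policy that works for the failing case


def scheduler_policy_for(model_kind: str, in_flight_batching: bool, streaming: bool) -> str:
    """Pick a batch scheduler policy following the rule described in this PR.

    enc_dec models that combine in-flight batching with streaming need
    guaranteed_no_evict; every other combination keeps the default.
    """
    if model_kind == "enc_dec" and in_flight_batching and streaming:
        return FALLBACK_POLICY
    return DEFAULT_POLICY


def render_property(policy: str) -> str:
    # Assumed property key; check the djl-serving configuration docs for the exact name.
    return f"option.batch_scheduler_policy={policy}"


if __name__ == "__main__":
    print(render_property(scheduler_policy_for("enc_dec", True, True)))
```

Keeping the default for every other combination reflects that the failure has only been observed for enc_dec models so far.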

We still don't know why only enc_dec models are affected. We should investigate that further, and also explore how the policy choice affects performance for other model types. max_utilization is the default because it maximizes throughput, but guaranteed_no_evict appears to be better for latency.