HPA scales up only when a metric rises above its target, so it cannot scale up as a metric decreases. Our jetstream_slots_available_percentage metric therefore has to be changed to jetstream_slots_used_percentage. This also aligns with other inference server metrics such as tgi_batch_current_size and nv_trt_llm_inflight_batcher_metrics.
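
To illustrate why the direction of the metric matters, here is a minimal sketch of the scaling rule documented for the Kubernetes HPA, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The replica count, the percentages, and the 50% target are hypothetical numbers chosen for illustration. The sketch ignores tolerance, min/max replica bounds, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Simplified HPA rule: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current_replicas * current_value / target_value)


# Hypothetical scenario: 4 replicas under heavy load, so only 10% of slots are free.

# Expressed as "available" with a 50% target, the ratio is below 1,
# so the rule asks for fewer replicas even though the servers are saturated.
available_case = desired_replicas(4, 10.0, 50.0)

# Expressed as "used" (90% used, 50% target), the ratio is above 1,
# so the rule asks for more replicas, which is the intended behavior.
used_case = desired_replicas(4, 90.0, 50.0)

print(available_case)  # 1
print(used_case)       # 8
```

With the "available" form the autoscaler would shrink the deployment exactly when it is busiest, which is why the metric is inverted to a "used" percentage.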