AI-Hypercomputer / JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Apache License 2.0

Change `jetstream_slots_available_percentage` to `jetstream_slots_used_percentage` #102

Closed: Bslabe123 closed this issue 3 months ago

Bslabe123 commented 3 months ago

The Kubernetes HPA doesn't allow scaling up as a metric decreases, so our `jetstream_slots_available_percentage` metric has to be changed to `jetstream_slots_used_percentage`. This also brings it in line with other inference server metrics such as `tgi_batch_current_size` and `nv_trt_llm_inflight_batcher_metrics`.
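
The inversion is a simple complement. A minimal sketch, assuming a pure helper for illustration (the metric names come from this issue, but the function names and signatures are hypothetical, not JetStream's actual API):

```python
def slots_used_percentage(used_slots: int, total_slots: int) -> float:
    # used% rises as the server fills up, so an HPA can scale up
    # on an increasing metric value.
    return 100.0 * used_slots / total_slots

def slots_available_percentage(used_slots: int, total_slots: int) -> float:
    # The old metric: used% = 100 - available%, so it falls as load
    # rises, which the HPA cannot use as a scale-up signal.
    return 100.0 - slots_used_percentage(used_slots, total_slots)
```

With 3 of 4 slots occupied, the new metric reports 75.0 where the old one reported 25.0; same information, but the direction now matches what the HPA expects.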