AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kubernetes Engine.
Apache License 2.0
Metrics support for Average Time To First Token #650
This is a quick CL that adds support for capturing the time-to-first-token (TTFT) data emitted during a benchmarking test and reporting it as an average. Currently, only JetStream over gRPC supports TTFT. In a follow-up, this metric will be captured as a histogram instead of being aggregated into an average.
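To illustrate the difference between the two aggregations described above, here is a minimal Python sketch. It is not the code from this CL: the function names, the bucket boundaries, and the sample values are all hypothetical, and it assumes TTFT is available as a list of per-request durations in seconds.

```python
from bisect import bisect_left
from statistics import mean

# Hypothetical bucket upper bounds in seconds; not taken from this CL.
BUCKET_BOUNDS = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5]


def average_ttft(samples):
    """Return the mean TTFT in seconds, or None if no samples were recorded."""
    return mean(samples) if samples else None


def ttft_histogram(samples, bounds=BUCKET_BOUNDS):
    """Count samples per bucket; the final slot holds values above every bound."""
    counts = [0] * (len(bounds) + 1)
    for value in samples:
        # bisect_left puts a value equal to a bound into that bound's bucket,
        # so each bucket means "less than or equal to its upper bound".
        counts[bisect_left(bounds, value)] += 1
    return counts


if __name__ == "__main__":
    samples = [0.04, 0.12, 0.12, 0.9, 3.0]  # made-up values for illustration
    print("average:", average_ttft(samples))
    print("histogram:", ttft_histogram(samples))
```

In this made-up sample, a single slow request pulls the average well above what most requests experienced, while the histogram keeps that outlier visible in its own bucket. That is the kind of detail an average alone hides.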