GoogleCloudPlatform / ai-on-gke

AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kubernetes Engine.
Apache License 2.0

Metrics support for Average Time To First Token #650

Closed kfswain closed 2 months ago

kfswain commented 2 months ago

This is a quick CL to capture the time-to-first-token (TTFT) data emitted during a benchmarking test and report its average. Currently, only JetStream over gRPC supports TTFT. Additionally, this metric will later be expanded to be captured as a histogram instead of being aggregated as an average.
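
To illustrate the difference between the two aggregations mentioned above, here is a minimal sketch in Python. It is not the code from this PR: the function names `average_ttft` and `ttft_histogram`, the bucket bounds, and the sample values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from statistics import mean


def average_ttft(ttfts_s):
    """Return the mean TTFT in seconds, or None if no samples were recorded."""
    return mean(ttfts_s) if ttfts_s else None


def ttft_histogram(ttfts_s, upper_bounds_s):
    """Count TTFT samples per bucket.

    upper_bounds_s must be sorted ascending. Bucket i holds samples that are
    greater than the previous bound and at most upper_bounds_s[i]; the final
    extra bucket holds samples above the last bound.
    """
    counts = [0] * (len(upper_bounds_s) + 1)
    for value in ttfts_s:
        counts[bisect_left(upper_bounds_s, value)] += 1
    return counts


# Hypothetical per-request TTFT samples, in seconds (not real benchmark data).
samples = [0.12, 0.30, 0.45, 1.80]
# Hypothetical bucket upper bounds, in seconds.
bounds = [0.25, 0.5, 1.0]

print(average_ttft(samples))
print(ttft_histogram(samples, bounds))
```

With these made-up samples, the average folds the single slow request into one number, while the histogram keeps it visible in the overflow bucket. That is the kind of detail a histogram preserves and an average loses.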