fixie-ai / thefastest.ai

Website with current metrics on the fastest AI models.
MIT License

Add input vs output token latency #17

Open juberti opened 4 months ago

juberti commented 4 months ago

(screenshot attached)

We already have the data to compute this. Input TPS should be roughly (96 * output TPS), I think.
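
For reference, a minimal sketch of how input TPS could be derived from the data already collected. This isn't the site's actual code; the function names are hypothetical, the TTFT-based approach assumes we record time-to-first-token and prompt token counts, and the 96x multiplier is just the rough estimate suggested above:

```python
def input_tps_from_ttft(num_input_tokens: int, ttft_seconds: float) -> float:
    """Estimate input (prefill) tokens/sec from time-to-first-token."""
    return num_input_tokens / ttft_seconds


def input_tps_from_output_tps(output_tps: float, multiplier: float = 96.0) -> float:
    """Rough inference per the suggestion above: input TPS ~= 96 * output TPS."""
    return multiplier * output_tps


# Example: a 1000-token prompt with a 0.25 s TTFT -> 4000 input tokens/sec.
print(input_tps_from_ttft(1000, 0.25))
```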

juberti commented 1 month ago

Now collecting these measurements for Groq. We may be able to infer them for other providers based on hardware batch-size limits (e.g., the suggestion above). An H100 should yield ~23 ktokens/sec on llama-3-8b at batch size 1, and an A100 ~10 ktokens/sec.
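
A hedged sketch of what that inference could look like, using the batch-size-1 prefill rates quoted above (the rates are the assumed figures from this comment, and the helper name is illustrative):

```python
# Assumed prefill rates for llama-3-8b at batch size 1, per the figures above.
PREFILL_TOKENS_PER_SEC = {
    "h100": 23_000,
    "a100": 10_000,
}


def estimated_prefill_seconds(num_input_tokens: int, gpu: str) -> float:
    """Estimate prefill (input processing) time for a prompt on the given GPU."""
    return num_input_tokens / PREFILL_TOKENS_PER_SEC[gpu]


# Example: a 2000-token prompt on an H100 -> ~0.087 s of prefill.
print(f"{estimated_prefill_seconds(2000, 'h100'):.3f} s")
```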