Open juberti opened 4 months ago
Now collecting these measurements for Groq. We may be able to infer them for other providers based on hardware batch size limits (e.g. the suggestion above). For llama-3-8b at batch size 1, an H100 should yield 23 ktoken/sec and an A100 10 ktoken/sec.
Already have the data to compute this. Input TPS should be (96 * output TPS), I think.
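For reference, a rough sketch of that rule of thumb in Python. The function name `estimate_input_tps` and the sample value of 250 output tokens/sec are made up for illustration; the factor of 96 is only the guess from the comment above, not a measured constant.

```python
def estimate_input_tps(output_tps: float, ratio: float = 96.0) -> float:
    """Estimate input (prefill) tokens/sec from output (decode) tokens/sec.

    `ratio` defaults to the 96x guess from this thread; treat it as an
    assumption to be checked against measured data, not a fixed constant.
    """
    if output_tps < 0:
        raise ValueError("output_tps must be non-negative")
    return ratio * output_tps


if __name__ == "__main__":
    # Hypothetical measurement, not a real benchmark result.
    measured_output_tps = 250.0
    print(estimate_input_tps(measured_output_tps))
```

Keeping the ratio as a parameter makes it easy to swap in a per-provider value once the measurements are in.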