Prefill return first token

AI-Hypercomputer / JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

Apache License 2.0


Closed jwyang-google closed 3 months ago

jwyang-google commented 3 months ago

Modify JetStream so that prefill returns the first generated token. Pending testing with MLPerf loadgen.
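
For readers unfamiliar with the idea, here is a minimal sketch of what "prefill returns the first token" can look like. It is not JetStream's actual code or API: `PrefillResult`, `prefill`, `greedy`, and `toy_forward` are hypothetical names, the "cache" is a plain list standing in for a real KV cache, and greedy selection is assumed only for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration only; not JetStream's actual API.
Logits = List[float]


@dataclass
class PrefillResult:
    cache: List[int]   # stand-in for the KV cache built from the prompt
    first_token: int   # token chosen from the logits after the last prompt position


def greedy(logits: Logits) -> int:
    """Return the index of the largest logit (greedy selection)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def prefill(prompt: List[int], forward: Callable[[List[int]], Logits]) -> PrefillResult:
    """Process the whole prompt once and also return the first generated token."""
    logits = forward(prompt)  # logits for the position right after the prompt
    return PrefillResult(cache=list(prompt), first_token=greedy(logits))


def toy_forward(tokens: List[int], vocab_size: int = 8) -> Logits:
    """Toy model: puts the highest score on (last token + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


result = prefill([3, 5, 6], toy_forward)
print(result.first_token)
```

With this shape, a caller can emit the first token as soon as prefill finishes instead of waiting for a separate generate step.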