JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs to come in the future -- PRs welcome).
Prerequisite work for supporting disaggregation: #68
Status: Closed (zhihaoshan-google closed this issue 5 months ago)