JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
202 stars, 26 forks
Standalone JetStream removes pinned deps #129
Closed
JoeZijunZhou closed 1 month ago