AI-Hypercomputer / JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Support I/O with text and token ids #79

Closed · JoeZijunZhou closed this 4 months ago

JoeZijunZhou commented 5 months ago
kiratp commented 4 months ago

Just chiming in here that the customer quoted is us :). The main challenge is that we have clients in multiple languages that don't always have tokenizer implementations readily available. Every other prominent model server performs detokenization server-side, hence the request.

It also doesn't hurt that the TPU VMs have plenty of CPU cores that sit mostly idle during inference anyway.
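For context, here is a minimal sketch of what server-side (de)tokenization can look like, assuming a SentencePiece tokenizer model is available on the server. The model path and function names are illustrative, not JetStream's actual API:

```python
# Illustrative sketch only -- not JetStream's implementation.
# The idea: clients send and receive plain text, and the server maps
# to/from token ids on its mostly idle CPU cores using SentencePiece.
import sentencepiece as spm

# Placeholder path to a SentencePiece model file on the server.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

def encode_request(text: str) -> list[int]:
    """Tokenize client-supplied text into token ids for the model."""
    return sp.encode(text)

def decode_response(token_ids: list[int]) -> str:
    """Detokenize generated token ids back into text for the client."""
    return sp.decode(token_ids)
```

With something like this in place, clients never need a language-specific tokenizer of their own.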

Thanks @JoeZijunZhou!

JoeZijunZhou commented 4 months ago

Resolved this issue in #78. It's available in main.