AI-Hypercomputer / JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs in the future; PRs welcome).
Align Tokenizer in JetStream #40
Closed · JoeZijunZhou closed 5 months ago
JoeZijunZhou commented 5 months ago
- Return token ids instead of a custom decode-operation result from JetStream; let the client perform tokenizer decoding
- Update the benchmark script and requester tool to decode with JetStream's seqio tokenizer library
- Use the correct method to decode the whole output token id list in one call
- Update unit tests for the changes above
- Enforce Python type checking in the benchmark script
- Update the README unit test section
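Decoding the whole output token id list in one call matters because a single character can span a token boundary; decoding token-by-token then corrupts it. A minimal sketch of the failure mode, using hypothetical byte-level tokens rather than JetStream's actual seqio tokenizer API:

```python
# Hypothetical byte-level tokens for the string "café": the two-byte UTF-8
# sequence for "é" (0xC3 0xA9) is split across two tokens.
token_bytes = [b"caf", b"\xc3", b"\xa9"]

# Wrong: decoding each token separately replaces each partial UTF-8
# sequence with U+FFFD instead of combining the bytes.
per_token = "".join(t.decode("utf-8", errors="replace") for t in token_bytes)

# Right: accumulate the full output first, then decode once.
whole = b"".join(token_bytes).decode("utf-8")

print(per_token)  # "caf" followed by two replacement characters
print(whole)      # "café"
```

The same reasoning applies to subword vocabularies: returning raw token ids and letting the client decode the accumulated list avoids committing to a lossy per-token decode on the server side.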