JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Add jax_padding support driver and server lib #54
Closed
FanhaiLu1 closed 5 months ago
This PR adds jax_padding support to the driver and server lib, so each engine implementation can decide whether to use JAX or np padding.
We suggest that all engine implementations use np padding; jax_padding will be removed once all engines have migrated to np padding.
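
To illustrate the choice described above, here is a minimal sketch of what a padding switch could look like. The helper name `pad_tokens`, its parameters, and the truncation behavior are assumptions made for illustration and are not taken from the JetStream code; only the `jax_padding` flag name comes from this PR.

```python
import numpy as np


def pad_tokens(tokens, padded_length, pad_id=0, jax_padding=False):
    """Hypothetical helper: right-pad (or truncate) token ids to a fixed length.

    Returns the padded array and the number of real (non-pad) tokens.
    """
    tokens = list(tokens)[:padded_length]  # truncate if too long
    true_length = len(tokens)
    pad_width = (0, padded_length - true_length)  # pad only on the right
    if jax_padding:
        # Imported lazily so the np path works without JAX installed.
        import jax.numpy as jnp

        padded = jnp.pad(
            jnp.asarray(tokens, dtype=jnp.int32), pad_width, constant_values=pad_id
        )
    else:
        # np padding stays on the host; the engine decides when to move it to device.
        padded = np.pad(
            np.asarray(tokens, dtype=np.int32), pad_width, constant_values=pad_id
        )
    return padded, true_length


padded, true_length = pad_tokens([5, 6, 7], padded_length=8)
print(padded.tolist(), true_length)
```

With `jax_padding=False` the result is a plain NumPy array, which matches the path this PR recommends for engine implementations.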