Add jax_padding support to driver and server lib

AI-Hypercomputer / JetStream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Apache License 2.0

202 stars 26 forks

Closed: FanhaiLu1 closed this 5 months ago

FanhaiLu1 commented 5 months ago

This PR adds jax_padding support to the driver and server lib, so each engine implementation can decide whether to use jax or np padding.

We suggest that all engine implementations use np padding. jax_padding will be removed once all engines have migrated to np padding.
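
To illustrate the choice this flag gives an engine, here is a minimal sketch of a padding helper that returns either a NumPy array or a JAX array. The function name `pad_tokens`, its parameters, and the default `pad_id` are hypothetical and chosen for illustration only; they are not taken from the JetStream code in this PR.

```python
import numpy as np

try:
    # JAX is optional in this sketch; it is only needed when jax_padding=True.
    import jax.numpy as jnp
except ImportError:
    jnp = None


def pad_tokens(tokens, padded_length, pad_id=0, jax_padding=False):
    """Hypothetical helper: right-pad (or truncate) token ids to a fixed length.

    Returns (padded_array, true_length). The padding itself is done on the
    host with NumPy; the result is converted to a JAX array only when
    jax_padding is True.
    """
    # Truncate inputs that are longer than the target length.
    ids = np.asarray(list(tokens)[:padded_length], dtype=np.int32)
    true_length = int(ids.shape[0])

    # Fill a fixed-size buffer with pad_id, then copy the real tokens in.
    padded = np.full((padded_length,), pad_id, dtype=np.int32)
    padded[:true_length] = ids

    if jax_padding:
        if jnp is None:
            raise RuntimeError("jax_padding=True requires JAX to be installed")
        # Assumption: the engine expects a JAX array on the default device.
        return jnp.asarray(padded), true_length
    return padded, true_length


# np padding (the suggested path): the result stays a host-side NumPy array.
padded, true_length = pad_tokens([5, 6, 7], padded_length=5)
```

In this sketch, an engine that has migrated to np padding would call the helper with `jax_padding=False` and handle any device transfer itself, while an engine that has not yet migrated would pass `jax_padding=True`.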