MoFHeka opened 1 year ago
@nouiz Would you have any comments on this regarding JAX? @jeng1220 @zlsh80826 for visibility
There is an effort to add native XLA support for FP8, but it is more complex, so it will take more time. TE is the fast path to FP8 kernels.
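For context, the TE fast path looks roughly like the sketch below on the PyTorch side. This is a minimal sketch: the module and recipe names follow the Transformer Engine docs, but the specific model shape and recipe settings are illustrative, not a recommendation.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed scaling: FP8 scale factors are derived from a history of amax values.
# HYBRID = E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

model = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(32, 768, device="cuda")

# Inside this context, supported TE modules run their GEMMs in FP8.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
out.sum().backward()  # gradients flow back through the FP8 GEMMs
```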
What about the PyTorch FP8 data type? I noticed it has already been merged into the main branch.
Also, I found that TE still does not support the attention dot-product computation for causal LLM text training. Would that work better with XLA? @nouiz
Finally, I want to know whether there is a roadmap for TE FP8 LLM training. Should I choose JAX or PyTorch? Not for today, but for the next year.
The PyTorch FP8 data type may be released in version 2.1, and JAX FP8 support has already been released.
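For reference, here is a minimal sketch of what those raw FP8 storage dtypes look like from user code. The dtype names match what was upstreamed to PyTorch and to JAX (via ml_dtypes); the upcasts below reflect the assumption that most kernels do not yet accept FP8 inputs directly.

```python
import torch
import jax.numpy as jnp

# PyTorch: float8 storage dtypes on the main branch.
x = torch.randn(4, 4)
x_fp8 = x.to(torch.float8_e4m3fn)  # 1 sign, 4 exponent, 3 mantissa bits
x_back = x_fp8.to(torch.float32)   # upcast before doing math on it

# JAX: float8 dtypes exposed through jax.numpy.
y = jnp.ones((4, 4), dtype=jnp.float32)
y_fp8 = y.astype(jnp.float8_e4m3fn)

print(x_fp8.dtype, y_fp8.dtype)
```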