AI-Hypercomputer / maxtext

A simple, performant and scalable Jax LLM!
Apache License 2.0

Enable expert parallelism for dropping strategy #869

Closed. RissyRan closed this 2 weeks ago

RissyRan commented 2 weeks ago

Description

Add expert parallelism support for the dropping strategy.

Test

End-to-end run: link