AI-Hypercomputer / maxtext

A simple, performant and scalable Jax LLM!

Add configs for Attention Block Size tuning #898

Closed · Obliviour closed this 1 month ago

Obliviour commented 1 month ago

Add attention block size tuning options to the MaxText config. We see a 1-2% MFU improvement across models on Trillium when these parameters are tuned. They can be sized up to the sequence length; a sketch of the new options is shown below.
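
For illustration, a minimal sketch of how block-size options for the attention kernel might appear in a MaxText YAML config. The option names and default values here are assumptions for the sketch, not necessarily the exact keys added by this PR:

```yaml
# Illustrative sketch only: option names and defaults are assumptions,
# not necessarily the exact config keys introduced in this PR.
# Block sizes for the attention kernel; each can be raised up to the
# sequence length when tuning for a given model and chip.
sa_block_q: 512               # query block size, forward pass
sa_block_kv: 512              # key/value block size, forward pass
sa_block_kv_compute: 512      # key/value compute block size, forward pass
sa_block_q_dkv: 512           # query block size, dK/dV backward pass
sa_block_kv_dkv: 512          # key/value block size, dK/dV backward pass
sa_block_kv_dkv_compute: 512  # key/value compute block size, dK/dV backward pass
sa_block_q_dq: 512            # query block size, dQ backward pass
sa_block_kv_dq: 512           # key/value block size, dQ backward pass
```

With config options like these, individual runs could then override block sizes from the command line in the usual MaxText key=value style, for example something like `python3 MaxText/train.py MaxText/configs/base.yml sa_block_q=1024` (again, an assumed example rather than a value recommended by this PR).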