Closed Obliviour closed 1 month ago
Add attention block size tuning options to the MaxText config. Models on Trillium that tune these parameters see a 1-2% MFU improvement. Each block size can be set to any value up to the sequence length.
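As a rough illustration of the "up to sequence length" limit, the sketch below lists block sizes one might sweep for a given sequence length. The helper name `candidate_block_sizes`, the 128 starting size, the power-of-two stepping, and the requirement that a block size divide the sequence length evenly are all assumptions made for this example; they are not taken from this PR or from the MaxText config.

```python
def candidate_block_sizes(seq_len: int, min_block: int = 128) -> list[int]:
    """List block sizes to try when tuning (hypothetical helper, not a MaxText API).

    Assumptions for illustration only: candidates start at `min_block`, double
    each step, must divide `seq_len` evenly, and never exceed `seq_len`.
    """
    if seq_len <= 0 or min_block <= 0:
        raise ValueError("seq_len and min_block must be positive")

    sizes = []
    block = min_block
    while block <= seq_len:  # the upper bound is the sequence length
        if seq_len % block == 0:
            sizes.append(block)
        block *= 2
    return sizes


if __name__ == "__main__":
    for size in candidate_block_sizes(2048):
        print(size)
```

Each candidate would then be benchmarked separately, since the best block size depends on the model and hardware.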