AI-Hypercomputer / maxtext

A simple, performant and scalable Jax LLM!

remove attention type from gemma2 model configs #877

Closed: wenxindongwork closed this 2 months ago

wenxindongwork commented 2 months ago

The default Gemma2 configs have a preset attention type, which prevents users from overriding the attention type flag (e.g., to use FlashAttention instead). Hence, we are removing `attention_type` from the Gemma2 configs.
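For illustration, a minimal sketch of the kind of change involved, assuming the Gemma2 model config pinned an attention value like the commented line below; the file path, key, and value are illustrative, not copied from the PR diff:

```yaml
# MaxText/configs/models/gemma2-2b.yml (hypothetical excerpt)
# ... other model hyperparameters ...
# attention: "dot_product"   # removed: pinning this here blocked CLI overrides
```

With the preset gone, the attention kernel can be chosen at launch through MaxText's key=value overrides, e.g. (flag usage assumed from the base config conventions):

```
python3 MaxText/train.py MaxText/configs/base.yml model_name=gemma2-2b attention=flash
```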

gobbleturk commented 2 months ago

@ZhaoyueCheng can you comment? I think sliding window + flash attention works fine, so we should indeed allow any attention type?

ZhaoyueCheng commented 2 months ago

@gobbleturk yeah, this LGTM. I added the constraint back when we didn't have flash attention for sliding window; now that it's supported, I think it makes sense to remove the constraint.
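For context on the constraint being lifted: sliding-window ("local") attention restricts each query to a recent span of keys, and a flash-style kernel only needs to honor that mask. A minimal JAX sketch of such a mask, purely illustrative and not MaxText's implementation:

```python
import jax.numpy as jnp

def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    """Boolean [seq_len, seq_len] mask: query i may attend to key j iff
    j <= i (causal) and i - j < window (local sliding window)."""
    i = jnp.arange(seq_len)[:, None]
    j = jnp.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

# Example: with window=3, query position 5 attends to keys 3, 4, and 5.
mask = sliding_window_causal_mask(seq_len=8, window=3)
```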