wenxindongwork closed this 2 months ago
@ZhaoyueCheng can you comment? I think sliding window + flash attention works fine, so we should indeed allow any attention type?
@gobbleturk yeah this LGTM. I added the constraint back when we didn't have flash attention for sliding window. Now that it's supported, I think it makes sense to remove the constraint.
The default Gemma2 configs have a preset attention type. This prevents users from overriding the attention type flag (e.g. to use FlashAttention instead). Hence, we are removing `attention_type` from the Gemma2 configs.
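
To illustrate why a preset blocks the override, here is a minimal sketch. It assumes a loader that rejects command-line overrides for any key already set by the model config. The function `resolve_config`, the key `sliding_window_size`, and the values `"dot_product"` and `"flash"` are hypothetical and are not taken from the actual config loader.

```python
def resolve_config(base, model_preset, cli_overrides):
    """Hypothetical loader: keys preset by the model config cannot be overridden."""
    conflicts = sorted(set(model_preset) & set(cli_overrides))
    if conflicts:
        raise ValueError(f"keys preset by the model config cannot be overridden: {conflicts}")
    # No key appears in both model_preset and cli_overrides, so merge order
    # between those two does not matter; both take precedence over base.
    return {**base, **cli_overrides, **model_preset}


# Illustrative values only (assumptions, not the real Gemma2 settings).
base = {"attention_type": "dot_product", "sliding_window_size": 0}
cli = {"attention_type": "flash"}

# Before this change: the model config presets attention_type.
gemma2_with_preset = {"attention_type": "dot_product", "sliding_window_size": 4096}
# After this change: attention_type is left out of the model config.
gemma2_without_preset = {"sliding_window_size": 4096}

try:
    resolve_config(base, gemma2_with_preset, cli)
    override_blocked = False
except ValueError:
    override_blocked = True

resolved = resolve_config(base, gemma2_without_preset, cli)
```

Under these assumptions, the first call fails because the preset and the user flag collide, while the second call resolves with the user's attention type and keeps the model's other presets.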