Adds eager_attention to AxolotlInputConfig so that it can be set in config files.
Motivation and Context
When fine-tuning Gemma 2, a warning is raised about the attention mechanism in use, and eager attention is strongly recommended. This change lets the attention mechanism be set explicitly, so users can choose between flash attention and eager attention.
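
As a rough illustration, a config file using the new option might look like the fragment below. Only the eager_attention key comes from this PR; the model name is a placeholder, and setting flash_attention to false alongside it is an assumption about how the two options would typically be combined, not something this change enforces.

```yaml
# Hypothetical excerpt from an Axolotl config file; the model name is illustrative.
base_model: google/gemma-2-9b

# Key added by this PR: request eager attention explicitly.
eager_attention: true

# Assumption: flash attention is left disabled when eager attention is selected.
flash_attention: false
```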