Motivation
The Calico models currently set the MLP and attention biases to true, but these were hard-coded to false in the Flash and Paged Llama implementations. This change uses the config parameters introduced in https://github.com/huggingface/transformers/pull/30031 to set those values correctly.
Modifications
Added attention_bias and mlp_bias to the config for the Flash and Paged Llama implementations (both default to False)
Set the bias of the attention and MLP projection layers from these config values (see the sketch below)
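A minimal sketch of the idea, using illustrative class names rather than this repository's actual Flash/Paged Llama modules (and with projection shapes simplified): the attention and MLP layers read their bias flags from the config (attention_bias / mlp_bias) instead of hard-coding bias=False, and fall back to False so existing checkpoints behave as before.

```python
import torch.nn as nn


class FlashLlamaAttention(nn.Module):  # illustrative name, not the real class
    def __init__(self, config):
        super().__init__()
        # Read the flag from the config; default False matches the old hard-coded behavior.
        bias = getattr(config, "attention_bias", False)
        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=bias)
        self.k_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=bias)
        self.v_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=bias)
        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=bias)


class FlashLlamaMLP(nn.Module):  # illustrative name, not the real class
    def __init__(self, config):
        super().__init__()
        # Same pattern for the MLP projections, driven by mlp_bias.
        bias = getattr(config, "mlp_bias", False)
        self.gate_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=bias)
        self.up_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=bias)
        self.down_proj = nn.Linear(config.intermediate_size, config.hidden_size, bias=bias)
```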
Result
Models that contain attention and MLP biases should now load properly; models without them are unaffected since both flags default to False.
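A rough way to check whether a checkpoint declares these biases before loading it; this assumes the Hugging Face LlamaConfig (which exposes attention_bias and, since the linked PR, mlp_bias) and uses a placeholder model path:

```python
from transformers import LlamaConfig

# Placeholder path; substitute an actual Calico checkpoint directory or hub id.
config = LlamaConfig.from_pretrained("path/to/calico-checkpoint")

# With this change, the Flash/Paged Llama implementations build their attention
# and MLP projections with biases when these flags are True, so such a
# checkpoint no longer fails to load due to unexpected bias weights.
print(config.attention_bias, getattr(config, "mlp_bias", False))
```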
Related Issues
NA