Closed warner-benjamin closed 3 months ago
A handful of Attention layers did not surface the `attn_qkv_bias` and `attn_out_bias` config options to their `Linear` layers.
LGTM. Thankfully this only affects the padded versions, since `nn.Linear` defaults to `bias=True`.
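A minimal sketch of the pattern being fixed, assuming a typical attention module; the class and attribute names (`Attention`, `Wqkv`, `Wo`) and constructor signature are illustrative, not the library's actual code:

```python
import torch.nn as nn


class Attention(nn.Module):
    """Illustrative attention layer that plumbs the bias config through.

    Before the fix, the Linear layers were constructed without a bias
    argument, so they silently used nn.Linear's default of bias=True and
    ignored the attn_qkv_bias / attn_out_bias config options.
    """

    def __init__(self, hidden_size: int, num_heads: int,
                 attn_qkv_bias: bool = True, attn_out_bias: bool = True):
        super().__init__()
        self.num_heads = num_heads
        # The fix: forward the config options to the Linear layers.
        self.Wqkv = nn.Linear(hidden_size, 3 * hidden_size, bias=attn_qkv_bias)
        self.Wo = nn.Linear(hidden_size, hidden_size, bias=attn_out_bias)
```

With `attn_qkv_bias=False`, `self.Wqkv.bias` is `None` rather than a learned parameter, which is what the config option promises.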