Implements support for bidirectional attention, as mentioned in #74.
Introduces a new parameter, `mask_upper_tri`, to the `attention_pattern` and `attention_heads` functions. It toggles whether the upper triangular region of the attention pattern is masked, and defaults to true to match the existing behavior.
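To illustrate what the toggle means, here is a minimal sketch in NumPy. It is not the code in this PR: `apply_upper_tri_mask` is a hypothetical helper, and zeroing the masked entries is an assumption made for the example (the real functions may hide or skip those cells instead).

```python
import numpy as np


def apply_upper_tri_mask(attention, mask_upper_tri=True):
    """Hypothetical helper showing the effect of a mask_upper_tri flag.

    attention: square array of shape (dest_tokens, src_tokens).
    Returns a new array; the input is left unmodified.
    """
    attention = np.asarray(attention, dtype=float)
    if not mask_upper_tri:
        # Bidirectional attention: every token may attend to every other,
        # so the full matrix is kept.
        return attention.copy()
    # Causal attention: keep the diagonal and everything below it.
    # Assumption: masked entries are set to zero.
    return np.tril(attention)


# A 3x3 pattern where every entry is 0.5.
pattern = np.full((3, 3), 0.5)
causal = apply_upper_tri_mask(pattern)  # default keeps the old behavior
bidirectional = apply_upper_tri_mask(pattern, mask_upper_tri=False)
```

With the default, entries above the diagonal (a token attending to a later token) are dropped, which suits causal models. Passing `mask_upper_tri=False` keeps them, which is what a bidirectional model needs.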