janelu9 opened 2 weeks ago
I know the hidden_states are the output of the previous stage, but I don't understand how the attention_mask is passed to the next transformer block.
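For context, one common pattern in pipeline-parallel setups is for every stage to take and return a tuple `(hidden_states, attention_mask)`, so the mask travels alongside the activations even though only the hidden states change. The sketch below illustrates that idea only. It is an assumption, not necessarily how this codebase does it, and `make_block` and `run_pipeline` are hypothetical names. Pure Python lists stand in for tensors, and a toy uniform "attention" over unmasked positions stands in for a real transformer block.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: hidden states are per-token vectors,
# the mask is a list of 0/1 flags (1 = real token, 0 = padding).
Hidden = List[List[float]]
Mask = List[int]
StageIO = Tuple[Hidden, Mask]


def make_block(scale: float) -> Callable[[StageIO], StageIO]:
    """Hypothetical block: toy attention that ignores masked (padding) positions."""
    def block(inputs: StageIO) -> StageIO:
        hidden, mask = inputs
        # Only unmasked positions act as keys/values.
        kept = [vec for vec, keep in zip(hidden, mask) if keep]
        dim = len(hidden[0])
        # Every position attends uniformly to the unmasked positions.
        pooled = [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
        new_hidden = [[scale * p for p in pooled] for _ in hidden]
        # Return the mask with the output so the next stage can use it too.
        return new_hidden, mask
    return block


def run_pipeline(stages: Sequence[Callable[[StageIO], StageIO]],
                 hidden: Hidden, mask: Mask) -> StageIO:
    """Run stages in order; each stage's output tuple is the next stage's input."""
    inputs: StageIO = (hidden, mask)
    for stage in stages:
        inputs = stage(inputs)
    return inputs


stages = [make_block(1.0), make_block(2.0)]
out_hidden, out_mask = run_pipeline(stages, [[1.0, 2.0], [100.0, 100.0]], [1, 0])
print(out_hidden, out_mask)
```

Because the second stage still receives `mask`, the padding value `100.0` never leaks into the result: `out_hidden` is `[[2.0, 4.0], [2.0, 4.0]]`. In frameworks where a stage can only return one object, packing the mask into the returned tuple like this is one way to carry it across stage boundaries. Another option is to recompute it from the input IDs on each stage.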