When I read the LongformerForQuestionAnswering source code, I found that it sets global attention automatically based on the separator tokens:

```python
sep_token_indices.shape[0] == 3 * batch_size
sep_token_indices.view(batch_size, 3, 2)[:, 0, 1]
```

The `3` means the code expects three sep tokens per sequence. But with a dataset like TriviaQA, my input is just a question and a context, which gives only two sep tokens, so I changed it to:

```python
sep_token_indices.shape[0] == 2 * batch_size
sep_token_indices.view(batch_size, 2, 2)[:, 0, 1]
```

Is it right for me to do this?
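To make the change concrete, here is a minimal sketch of the indexing logic with the sep-token count as a parameter. The function name, the `num_sep` parameter, and the example ids are mine for illustration, not from the Longformer source; it assumes `sep_token_indices` comes from `(input_ids == sep_token_id).nonzero()` as in the original code:

```python
import torch

def get_question_end_index(input_ids, sep_token_id, num_sep=2):
    """Return, per batch row, the position of the FIRST sep token.

    num_sep is the expected number of sep tokens per sequence:
    3 for the default <s> question </s></s> context </s> encoding,
    2 for a single-sep question/context encoding as in this post.
    """
    # Each row of sep_token_indices is a (batch_index, position) pair.
    sep_token_indices = (input_ids == sep_token_id).nonzero()
    batch_size = input_ids.shape[0]
    assert sep_token_indices.shape[0] == num_sep * batch_size, (
        f"expected {num_sep} sep tokens per sequence"
    )
    # [:, 0, 1] selects, for each row, the column of the first sep
    # token, which marks where the question ends.
    return sep_token_indices.view(batch_size, num_sep, 2)[:, 0, 1]

# Example: batch of 1, sep_token_id = 2, two sep tokens in the row
ids = torch.tensor([[0, 5, 6, 2, 7, 8, 9, 2]])
print(get_question_end_index(ids, sep_token_id=2, num_sep=2))  # tensor([3])
```

Since only the first sep position is read out, changing `3` to `2` consistently in both the assertion and the `view` keeps the indexing valid, provided every sequence in the batch really contains exactly two sep tokens.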