Closed lorsanta closed 2 years ago
The computation of `seg_mask` relies on the fact that the sum of a segment composed of only pad token ids is equal to zero. https://github.com/coastalcph/trldc/blob/b843576875654bc887e904777ecc4a0dc3091ba5/dainlp/models/cls/hierarchical.py#L59
But with RoBERTa this is not true, since by default `pad_token_id = 1`. (source)
It would be better to compute `seg_mask` using `attention_mask` instead of `input_ids`.
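A minimal sketch of the issue, assuming the hierarchical model derives `seg_mask` by summing `input_ids` over each segment (the tensor shapes and variable names here are illustrative, not the repo's actual code):

```python
import torch

# Shape (batch, num_segments, seg_len): one real segment, one all-padding segment.
pad_token_id = 1  # RoBERTa's default; BERT uses 0
input_ids = torch.tensor([[[5, 6, 7], [pad_token_id] * 3]])
attention_mask = (input_ids != pad_token_id).long()

# Buggy check: assumes a fully padded segment sums to 0, which only
# holds when pad_token_id == 0 (e.g. BERT). With RoBERTa the padded
# segment sums to seg_len and is wrongly treated as real.
seg_mask_buggy = (input_ids.sum(dim=-1) != 0).long()

# Robust check: attention_mask is 0 on padding for every tokenizer,
# so a fully padded segment always sums to 0.
seg_mask_fixed = (attention_mask.sum(dim=-1) != 0).long()

print(seg_mask_buggy)  # tensor([[1, 1]]) — padded segment kept by mistake
print(seg_mask_fixed)  # tensor([[1, 0]])
```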
thank you for spotting this, i have fixed it