Closed: douseful closed this issue 1 year ago.
Hi douseful,
The BOS token is added manually because the GPT-2 tokeniser does not add it.
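For illustration, here is a minimal sketch (assuming the Hugging Face transformers tokeniser; the exact special-token strings and variable names used in this repository may differ) showing that GPT-2's tokeniser does not insert BOS/EOS on its own, so the markers are concatenated onto the report string before tokenising:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("distilgpt2")

report = "no acute cardiopulmonary abnormality ."

# GPT-2's tokeniser leaves the sequence untouched -- no BOS/EOS is added:
plain_ids = tokenizer(report)["input_ids"]
print(plain_ids[0] == tokenizer.bos_token_id)   # False

# So the start and end markers are added to the string manually:
marked = tokenizer.bos_token + report + tokenizer.eos_token
report_ids = tokenizer(marked)["input_ids"]
print(report_ids[0] == tokenizer.bos_token_id)  # True
```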
Notice that [:-1] corresponds to the last element being thrown away, not the first.
If we threw away the last element of the attention mask, we would be discarding padding. As we are shortening the input and output sequences by one token due to teacher forcing, we want to discard an element of the attention mask that corresponds to a token rather than to padding; hence, the first element (which corresponds to BOS) is the one to drop. In the end, this does not matter, as we are using causal attention masking.
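As a hedged sketch of the shift described above (the tensor names here are illustrative, not necessarily the exact ones used in dataset.py), with a right-padded report the slicing looks like this:

```python
import torch

# Right-padded report: [BOS] t1 t2 t3 [EOS] [PAD] [PAD]
report_ids     = torch.tensor([[50256, 11, 12, 13, 50256, 0, 0]])
attention_mask = torch.tensor([[1,     1,  1,  1,  1,     0, 0]])

# Teacher forcing: the decoder input drops the last position and the
# labels drop the first, so both are one token shorter than the report.
decoder_input_ids = report_ids[:, :-1]   # [BOS] t1 t2 t3 [EOS] [PAD]
labels            = report_ids[:, 1:]    # t1 t2 t3 [EOS] [PAD] [PAD]

# The attention mask must also shrink by one position. Dropping its last
# element would remove a 0 (padding), while dropping its first element
# removes a 1 (a real token), which matches the shortened sequence. With
# right padding and causal masking the difference is inconsequential.
decoder_attention_mask = attention_mask[:, 1:]
```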
Hope this helps, Aaron.
https://github.com/aehrc/cvt2distilgpt2/blob/48aa7fd40fd23614ecb2bf63c4c639d3b418cb0b/tools/dataset/dataset.py#L89C2-L114C28
Please could you tell me why we have to manually add the start and end marks to the report, and then, when selecting the attention mask, why the first element (corresponding to BOS) is discarded but the last element is not? And why do we need to throw away the first element of the decoder_input_ids?