aehrc / cvt2distilgpt2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting
GNU General Public License v3.0

some questions about the gpt tokenizer. #12

Closed douseful closed 1 year ago

douseful commented 1 year ago

https://github.com/aehrc/cvt2distilgpt2/blob/48aa7fd40fd23614ecb2bf63c4c639d3b418cb0b/tools/dataset/dataset.py#L89C2-L114C28


Could you please tell me why we have to manually add the start and end tokens to the report, and why, when slicing the attention mask, the first element is discarded (corresponding to the BOS) while the last element is kept? Also, why do we need to throw away the first element of the `decoder_input_ids`?

anicolson commented 1 year ago

Hi douseful,

The BOS token is added manually as the GPT2 tokeniser does not add it.
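For illustration, a minimal sketch of this (assuming the Hugging Face transformers tokeniser; the report string is made up):

```python
# The GPT2 tokeniser does not insert BOS/EOS itself, so they are
# concatenated onto the report string before tokenisation.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("distilgpt2")

report = "No acute cardiopulmonary abnormality."
print(tokenizer(report).input_ids)  # no BOS or EOS ids appear

wrapped = tokenizer.bos_token + report + tokenizer.eos_token
print(tokenizer(wrapped).input_ids)  # begins and ends with 50256 (BOS/EOS for GPT2)
```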

Notice that `[:-1]` corresponds to the last element being thrown away, not the first.

If we threw away the last element of the attention mask, we would be discarding padding. As teacher forcing shortens the input and output sequences by one token, we want to discard an element of the attention mask that corresponds to a token rather than to padding; hence, the first element is perfect. In the end, this does not matter, as we are using causal attention masking.
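Here is a minimal sketch of the slicing in question (illustrative token ids and variable names, not the exact code in dataset.py):

```python
import torch

# Toy tokenised report: [BOS, t1, t2, t3, EOS, PAD], one padding position.
input_ids      = torch.tensor([[50256, 2949, 11930, 13, 50256, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 0]])

decoder_input_ids      = input_ids[:, :-1]       # [:-1] drops the LAST element
label_ids              = input_ids[:, 1:]        # labels are the inputs shifted left by one
decoder_attention_mask = attention_mask[:, 1:]   # [1:] drops a leading 1 (a token element), keeping the padding 0

print(decoder_input_ids)       # tensor([[50256, 2949, 11930, 13, 50256]])
print(label_ids)               # tensor([[2949, 11930, 13, 50256, 0]])
print(decoder_attention_mask)  # tensor([[1, 1, 1, 1, 0]])
```

With causal attention, removing a 1 from the front rather than a 0 from the back makes no difference to the loss over the report tokens.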

Hope this helps, Aaron.