XiangLi1999 / Diffusion-LM


problem about attention_mask #19

Closed yeyn19 closed 2 years ago

yeyn19 commented 2 years ago

Hey! I'm reading the code and I see that the core model architecture (self.input_transformers) is a Hugging Face BERT encoder. It doesn't seem to pass an attention_mask to the encoder, even though the input is padded. So is it necessary to add the attention_mask, and to mask the same padded positions in the MSE loss?
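
For reference, a minimal sketch of the setup being described, with illustrative names and shapes rather than the exact Diffusion-LM code: a BERT encoder is run on padded embedding inputs with no attention_mask, so PAD positions participate in attention like any other token.

```python
import torch
from transformers import BertConfig
from transformers.models.bert.modeling_bert import BertEncoder

# Small illustrative config; the real model's sizes differ.
config = BertConfig(hidden_size=128, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=512)
encoder = BertEncoder(config)

batch, seq_len = 4, 64
# Padded word embeddings (noised latents in the diffusion setting).
emb_inputs = torch.randn(batch, seq_len, config.hidden_size)

# No attention_mask is passed, so PAD positions attend and are attended to.
hidden = encoder(emb_inputs).last_hidden_state  # (batch, seq_len, hidden)
```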

XiangLi1999 commented 2 years ago

Actually, I think it's important NOT to mask it out. You want the model to learn to end the sentence with padding, so it should learn that after the END token, all the suffix positions should be PAD.
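
To make the point concrete, here is a minimal sketch (illustrative only, not the repo's exact training code) of the unmasked MSE loss versus the masked variant proposed above. Leaving the loss unmasked means PAD positions contribute gradient, which is what teaches the model to fill the suffix after END with PAD.

```python
import torch
import torch.nn.functional as F

def unmasked_mse(model_out, target):
    # model_out, target: (batch, seq_len, hidden).
    # Every position counts, PAD included, so the model is pushed to
    # predict PAD embeddings after the END token.
    return F.mse_loss(model_out, target)

def masked_mse(model_out, target, pad_mask):
    # pad_mask: (batch, seq_len), 1 for real tokens, 0 for PAD.
    # Masking removes exactly the signal that tells the model where a
    # sentence ends, which is why it is deliberately avoided here.
    sq_err = (model_out - target) ** 2 * pad_mask.unsqueeze(-1)
    denom = (pad_mask.sum() * model_out.size(-1)).clamp(min=1)
    return sq_err.sum() / denom
```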

yeyn19 commented 2 years ago

thanks!