Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models for Local Sequence Transduction": www.aclweb.org/anthology/D19-1435.pdf (EMNLP-IJCNLP 2019)
MIT License
Attention mask for computation of replace and append operation #22
Hi, you mentioned in the paper that r_{i}^{l} is computed by attending over h_{j}^{l} for all j except i, whereas a_{i}^{l} is computed by attending over h_{j}^{l} for all j including i.
Why is there this asymmetry — that we cannot use information about the current token x_{i} when computing the replace operation, but do have access to the current token for the append operation?
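For concreteness, here is a minimal sketch of what the two self-attention masks described in the question could look like for a short sequence. The helper name `replace_append_masks` is hypothetical (it is not from the PIE codebase); the convention assumed is 1 = position j is visible, 0 = blocked:

```python
def replace_append_masks(seq_len):
    """Illustrative self-attention masks (hypothetical helper, not repo code).

    Replace: the representation r_i must not see the current token x_i,
    so the diagonal is zeroed. Append: a_i may attend to every position,
    including i, so the mask is all ones.
    """
    # Append mask: all positions visible, including the current token.
    append_mask = [[1] * seq_len for _ in range(seq_len)]
    # Replace mask: identical, except the current token is hidden from itself.
    replace_mask = [[0 if i == j else 1 for j in range(seq_len)]
                    for i in range(seq_len)]
    return replace_mask, append_mask

replace_mask, append_mask = replace_append_masks(4)
```

Intuitively, zeroing the diagonal for replace forces r_i to be predicted from context alone, so the model cannot simply copy x_i back.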