wangwang110 opened this issue 2 years ago
For the replace case, when we calculate the attention scores at position i, we don't let it consider its own token w(i).
At the first layer I think this is fine, but at the second and higher layers we still use the information of w(i) indirectly, because other positions attended to w(i) at the layer below.
Is that OK?
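To make the concern concrete, here is a minimal, hypothetical sketch (not this repo's actual code) of a per-layer attention mask that only blocks position i from attending to its own replaced token w(i); the names `seq_len`, `replace_pos`, and `attn_mask` are illustrative assumptions. The comments mark where the indirect leakage through other positions would come from.

```python
# Hypothetical sketch, not the repo's implementation: a self-attention mask
# where, for a "replace" position i, row i masks out column i so position i
# never attends to its own (replaced) token w(i).
import torch

seq_len = 5
replace_pos = 2  # assume the token at position 2 was replaced

# attn_mask[q, k] == True means query position q may attend to key position k
attn_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)
attn_mask[replace_pos, replace_pos] = False  # position i cannot see w(i) directly

# Note: positions j != i still attend to w(i) at layer 1, so their layer-1
# outputs carry information about w(i). At layer 2, position i attends to
# those positions j, so information about w(i) can flow back into position i
# indirectly -- which is the leakage this question is asking about.
print(attn_mask)
```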