e-bug / pascal

[ACL 2020] Code and data for our paper "Enhancing Machine Translation with Dependency-Aware Self-Attention"
https://www.aclweb.org/anthology/2020.acl-main.147/
MIT License

Question regarding Parent-Scaled Self-Attention #5

Closed · lovodkin93 closed 3 years ago

lovodkin93 commented 3 years ago

Hello, in your paper you state that p_t is the middle position of the t-th token's dependency parent. My question is: what if the dependency parent is segmented into an even number of subwords? For example, if the word "pretrain" is split into "pre" and "train", should the middle be "pre" or "train"? Thanks!

e-bug commented 3 years ago

Hi, you just set the mean to the corresponding ".5" position. For example, if pos(pre) = 2 and pos(train) = 3, then the middle is 2.5. You can see what this means in terms of the resulting distribution in Figure 3 of the paper (e.g., if there are only two subwords, both get the same weight).
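For concreteness, here is a minimal sketch in Python/NumPy of how this could be computed. This is not the repository's actual code; the function names `parent_middle_position` and `gaussian_weights` are made up for illustration, and sigma is simply fixed to 1 here:

```python
import numpy as np

def parent_middle_position(subword_positions):
    """Middle position p_t of a dependency parent that may span
    several subwords: the mean of its subword positions.
    E.g. pos(pre) = 2, pos(train) = 3  ->  p_t = 2.5."""
    return sum(subword_positions) / len(subword_positions)

def gaussian_weights(p_t, seq_len, sigma=1.0):
    """Normal-distribution weights over all token positions,
    centered at p_t (cf. Figure 3 in the paper)."""
    s = np.arange(seq_len)
    return np.exp(-((s - p_t) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# With p_t = 2.5, positions 2 ("pre") and 3 ("train") are equidistant
# from the mean, so both subwords receive the same weight.
w = gaussian_weights(parent_middle_position([2, 3]), seq_len=6)
print(w[2], w[3])  # identical values
```

Because the mean falls exactly between the two subword positions, neither "pre" nor "train" is preferred: the distribution assigns them equal weight, which is the behavior described above.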

lovodkin93 commented 3 years ago

Great, thanks!