e-bug / pascal

[ACL 2020] Code and data for our paper "Enhancing Machine Translation with Dependency-Aware Self-Attention"
https://www.aclweb.org/anthology/2020.acl-main.147/
MIT License

Trying to reproduce results #8

Open lovodkin93 opened 3 years ago

lovodkin93 commented 3 years ago

Hello, I am trying to reproduce your results for the PASCAL layer, but the results I am getting are worse than those of the vanilla version, and I am trying to understand whether I missed something. These are the steps I took:

  1. I parsed the WMT and newstest sentences with UDPipe.
  2. I created the parent-scaled masks from the UD parses (according to the equation described in your paper).
  3. I added the masks to the input.
  4. For each sentence, I multiplied the attention logits element-wise by the corresponding mask (see the sketch after this list).
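
For concreteness, here is a minimal sketch of steps 1, 2 and 4. It assumes the `spacy_udpipe` wrapper for the UD parse (any UDPipe interface works), a Gaussian with variance 1 as in my reading of the paper, and illustrative helper names (`parent_positions`, `parent_scaled_mask`, `pascal_attention`) that are not taken from this repo:

```python
import numpy as np
import spacy_udpipe

spacy_udpipe.download("en")   # fetch the English UDPipe model once
nlp = spacy_udpipe.load("en")

def parent_positions(sentence):
    """Step 1: index of each token's dependency head (the root keeps itself)."""
    doc = nlp(sentence)
    return [tok.i if tok.head.i == tok.i else tok.head.i for tok in doc]

def parent_scaled_mask(parents, sigma2=1.0):
    """Step 2: T x T mask whose row t is a normal pdf over positions j,
    centred on the parent position p_t."""
    T = len(parents)
    j = np.arange(T)[None, :]                        # candidate positions
    p = np.asarray(parents, dtype=float)[:, None]    # parent of each token
    return np.exp(-(j - p) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def pascal_attention(Q, K, V, mask):
    """Step 4: element-wise multiply the scaled dot-product scores by the
    mask before the row-wise softmax."""
    d_k = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d_k)                # (T, T) attention logits
    scores = scores * mask                           # parent scaling
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

In this sketch the mask multiplies the scaled dot-product scores before the softmax; whether the PASCAL heads keep the softmax at all, and how word-level parents should be mapped onto subword positions after BPE, are the two details I am least sure I got right.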

I was wondering whether you did the same, and whether you saw any decrease in performance at first; if so, what did you do to get past it? Thanks!