fajri91 / discourse_probing

Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021.

Missing details #1

Open vgaraujov opened 2 years ago

vgaraujov commented 2 years ago

Hi! Thanks for the contribution. I want to use your probing tasks (EN and ES); however, I came across some problems:

  1. Regarding EN data for tasks 4, 5, and 6: you don't include code for extracting the EN data. Do you intend to provide the final EN data? The data in the data_en folder seems to be incomplete. Could you please take a look at it?
  2. Regarding ES data for tasks 4, 5, and 6: you provide code for extracting the ES data. However, the script imports a module that is missing from the repository: from discourse_tree_utils import *. Is this file missing?

I would really appreciate it if you could help me with this.

hotzjacobb commented 1 year ago

@vgaraujov Hi Vladimir, the dataset that the authors cite in the paper is the RST Discourse Treebank, which is annotated over Penn Treebank text. This is a proprietary dataset, so unfortunately they are not allowed to share it. Hopefully you have access to it through an institution? https://catalog.ldc.upenn.edu/LDC2002T07 Feel free to message me.

As for how it's parsed, I'm not sure, but I may need to look into this myself, so I might be able to post an update here.

Cheers. (: