coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0

Double-check UCCA datasets #63

Closed by alexanderkoller 4 years ago

alexanderkoller commented 4 years ago

The UCCA MRP files currently in /proj/irtg/sempardata/mrp/LDC2019E45/2019 (modification date July 18) are identical to the UCCA MRP files of July 8, which are now in /proj/irtg/sempardata/mrp/ucca_dont_use/mrp/2019/training/ucca.

There is an older version (June 11) in /proj/irtg/sempardata/mrp/LDC2019E45/2019/training/ucca_old_dont_use, which does differ from the July versions.

@mariomgmn, could you double-check that the data in /proj/irtg/sempardata/mrp/LDC2019E45/2019 is actually the final version of the UCCA training data? And that we are not using parts of the outdated data in a training/dev split somewhere?
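For anyone who wants to repeat this kind of check, here is a minimal sketch of a directory comparison by content hash. It is not part of am-parser; the helper names `file_digests` and `compare_dirs` are made up for illustration, and it assumes both directories are passed at the same level of the tree (for example, both pointing at a `ucca` folder) so that relative file paths line up.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its contents.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_dirs(dir_a, dir_b):
    """Return (only_in_a, only_in_b, changed) as sorted lists of relative paths."""
    a = file_digests(dir_a)
    b = file_digests(dir_b)
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    # Files present on both sides whose contents differ.
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_in_a, only_in_b, changed
```

If all three returned lists are empty, the two directories contain the same files with the same contents. A non-empty `changed` list would point at files that exist in both versions but were modified, which is what one would expect when comparing against the June 11 data.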

mariomgmn commented 4 years ago

The data in /proj/irtg/sempardata/mrp/LDC2019E45/2019 is the most up-to-date version of the data and the one we are currently using. Since the training/dev split comes from that data, there isn't really anywhere we could be using an older version.