Closed gao-lex closed 2 years ago
Hello!
To compare with your work, I need the original PTB dataset [1] used in the OpenIE6 model, but this dataset can no longer be found on the Internet. Could you provide it?
It seems that the authors' CA dataset can be found here: https://zenodo.org/record/4054476
However, I have failed to find any description of the labels (one may try to guess, but that's not the right way to do research :)) or the evaluation code to reproduce the reported result.
@SaiKeshav may I ask you to share any of that or to suggest where to look? Thank you.
Hi @alexeyev, the label set being used is 'CC', 'CP_START', 'CP', 'SEP', 'OTHERS' and 'NONE' (defined in line).
NONE stands for words that don't belong to any coordination structure. CC stands for a coordinating conjunction (and, but). CP stands for a coordination phrase (Jeff Bezos, Amazon Company). CP_START stands for the start of the entire coordination structure, which is also the start of the first coordination phrase. SEP stands for separators between coordination phrases (comma). OTHERS stands for tokens in the coordination structure that don't belong to any of the above categories.
You can look at https://aclanthology.org/I17-1027.pdf (Section 2.1, Task Description) for an explanation of each of these terms, and at this function link to see how the labels are parsed into the respective coordination structures.
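For readers without access to the linked function, here is a minimal sketch of how a flat label sequence like the one described above could be grouped into coordination structures. It is not the OpenIE6 code: the `Coordination` class, the `decode_coordinations` function and the example sentence are hypothetical, and the sketch assumes a single level of labels in which a maximal run of CP_START/CP tokens forms one coordination phrase and a NONE token ends the current structure.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Coordination:
    """Hypothetical container for one decoded coordination structure."""
    conjuncts: List[Tuple[int, int]] = field(default_factory=list)  # inclusive token spans
    cc: List[int] = field(default_factory=list)    # indices of CC tokens
    seps: List[int] = field(default_factory=list)  # indices of SEP tokens


def decode_coordinations(labels: Sequence[str]) -> List[Coordination]:
    """Group a flat label sequence into coordination structures (assumed scheme)."""
    structures: List[Coordination] = []
    current = None      # structure being built, if any
    span_start = None   # start index of the open coordination phrase, if any

    # A trailing NONE sentinel flushes whatever is still open at the end.
    for i, label in enumerate(list(labels) + ["NONE"]):
        continues = label == "CP" and span_start is not None
        if span_start is not None and not continues:
            # The open phrase ended on the previous token.
            current.conjuncts.append((span_start, i - 1))
            span_start = None
        if label == "CP_START":
            if current is not None:
                structures.append(current)
            current = Coordination()
            span_start = i
        elif label == "CP":
            if current is None:  # tolerate a missing CP_START
                current = Coordination()
            if span_start is None:
                span_start = i
        elif label == "CC" and current is not None:
            current.cc.append(i)
        elif label == "SEP" and current is not None:
            current.seps.append(i)
        elif label == "NONE" and current is not None:
            structures.append(current)
            current = None
        # OTHERS tokens stay inside the structure but are not recorded here.
    return structures


# Made-up example sentence and labels, for illustration only.
tokens = "He bought apples , oranges and bananas .".split()
labels = ["NONE", "NONE", "CP_START", "SEP", "CP", "CC", "CP", "NONE"]
for coord in decode_coordinations(labels):
    print([" ".join(tokens[s:e + 1]) for s, e in coord.conjuncts])
    # prints ['apples', 'oranges', 'bananas']
```

The on-disk format of the released data and the handling of nested coordinations may differ from this sketch, so treat it only as a reading aid for the label definitions.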
Hi @SaiKeshav, thank you for the clarification!
[1] Jessica Ficler, Yoav Goldberg: Coordination Annotation Extension in the Penn Tree Bank. ACL (1) 2016