@BrownXing Thanks for your interest in our work. As described in Sec. 4.2 of our paper, we designed QA pairs only for DocGenome-test, which will be released on Hugging Face within the next two days.
Thank you very much! Additionally, I have not been able to locate the annotations for the Logical Relationships between the component units. Could you please help me find the file?
@BrownXing
Please refer to order_annotations.json for the Logical Relationships annotations. There are usually two fields in order_annotations.json, "annotations" and "orders":
"orders" is a list of triples, each representing a relationship type between two bounding boxes, identified by the ids "from" and "to".
"annotations" is also a list, containing the necessary information about each bounding box.
(example comes from "astro-ph.CO/0911.2655")
Due to time constraints, the training dataset has fewer Logical Relationships annotations than the test dataset. Specifically, the "implicit-cite" relation does not appear in the training dataset, and "explicit-cite" does not cover cross-references between text and float environments such as tables and figures.
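To check which relation types actually occur in a given split, a small counting helper can be useful. This is a minimal sketch, assuming each triple in "orders" is a dict with the keys "type", "from" and "to"; the helper name `count_relation_types` and the sample data are hypothetical and not taken from the dataset.

```python
from collections import Counter


def count_relation_types(orders):
    """Count how often each relationship type occurs in an "orders" list."""
    # Assumption: every triple is a dict that carries a "type" key.
    return Counter(rel["type"] for rel in orders)


# Hypothetical sample for illustration only, not real DocGenome data.
sample_orders = [
    {"type": "explicit-cite", "from": 0, "to": 3},
    {"type": "explicit-cite", "from": 1, "to": 3},
    {"type": "implicit-cite", "from": 2, "to": 4},
]

counts = count_relation_types(sample_orders)
print(counts["explicit-cite"], counts["implicit-cite"])
```

A `Counter` returns 0 for types that never occur, so a missing "implicit-cite" relation in a training document shows up as a zero count rather than an error.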
Besides, since this is a large project, some annotations may have been lost at an early stage. For example, some order_annotations.json files may not contain "annotations"...
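Because a field may be absent, it seems safer to read the file defensively. The following is a minimal sketch, assuming the top level of order_annotations.json is a JSON object with the optional keys "annotations" and "orders"; the function name `load_order_annotations` is hypothetical.

```python
import json
from pathlib import Path


def load_order_annotations(path):
    """Return (annotations, orders) from one order_annotations.json file."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Fall back to an empty list when a field is missing or null.
    annotations = data.get("annotations") or []
    orders = data.get("orders") or []
    return annotations, orders
```

Calling `load_order_annotations("file_name/order_annotations.json")` then always yields two lists, so downstream code does not need to special-case documents whose "annotations" were lost.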
Feel free to ask any further questions!
Thank you very much, I have found it. Thanks again for your excellent work!
Have you released the test split? I'd like to reference it as one of the benchmarks :)
Thank you for your interest. We expect to release it within the week.
Hello, we have released the test set. You can download it here
Thank you for sharing this awesome work! I have downloaded the training dataset from the Google Drive link provided in the repo, but I ran into some problems when trying to find the labels for the QA tasks and Logical Relationships. Could you please indicate where to find the test set containing the QA annotations? Besides, I failed to find the Logical Relationships in the training set. The data for a single document is organized as:
file_name
|-layout_annotations.json
|-order_annotations.json
|-page_xxx.jpg
|-quality_report.json
|-reading_annotations.json
The values of 'previous_block', 'parent_block' and 'next_block' in order_annotations.json are null. Did I overlook any key points in parsing the dataset?
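For anyone else checking a downloaded document folder against the layout above, here is a minimal sketch that reports which of the listed JSON files are absent. The file names come from the listing in this post; the helper name `missing_annotation_files` and the constant `EXPECTED_JSON` are hypothetical.

```python
from pathlib import Path

# JSON files listed in the per-document layout described in this post.
EXPECTED_JSON = (
    "layout_annotations.json",
    "order_annotations.json",
    "quality_report.json",
    "reading_annotations.json",
)


def missing_annotation_files(doc_dir):
    """Return the expected JSON file names that are not present in doc_dir."""
    doc_dir = Path(doc_dir)
    return [name for name in EXPECTED_JSON if not (doc_dir / name).is_file()]
```

An empty list means all four JSON files are present; the page images (page_xxx.jpg) are not checked here.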