google-research-datasets / vrdu

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema including diverse data types, complex templates, and diversity of layouts within a single document type.

Lv1 has mixed training templates #1

Open amitbcp opened 1 year ago

amitbcp commented 1 year ago

Hi Team

Thanks for open-sourcing the annotations and splits for the paper. While parsing the few-shot splits, I noticed that the lv1 training folders appear to contain the same templates for both forms.

For example, FARA-lv1-single_Amendment-train_10-test_300-valid_100-SD_0.json contains approximately 3 templates.
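For anyone inspecting the splits the same way, here is a minimal sketch that breaks a split file name into its parts. The field layout is an assumption inferred only from the single file name quoted above, and the function name `parse_split_name` and the group names are hypothetical, not part of the VRDU repository. The meaning of the `SD` field is not stated in this thread, so it is kept as a plain integer without interpretation.

```python
import re

# Assumed pattern, inferred from one file name quoted in this thread:
#   <dataset>-<level>-<setting>_<form>-train_<n>-test_<n>-valid_<n>-SD_<n>.json
SPLIT_NAME = re.compile(
    r"^(?P<dataset>[^-]+)-(?P<level>lv\d+)-(?P<setting>[^_]+)_(?P<form>.+?)"
    r"-train_(?P<train>\d+)-test_(?P<test>\d+)-valid_(?P<valid>\d+)"
    r"-SD_(?P<sd>\d+)\.json$"
)


def parse_split_name(name):
    """Return the parts of a split file name as a dict (hypothetical helper)."""
    match = SPLIT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized split file name: {name!r}")
    fields = match.groupdict()
    # Convert the numeric parts from strings to integers.
    for key in ("train", "test", "valid", "sd"):
        fields[key] = int(fields[key])
    return fields


info = parse_split_name(
    "FARA-lv1-single_Amendment-train_10-test_300-valid_100-SD_0.json"
)
```

Grouping the parsed names by `level` and `form` makes it easier to see which training files belong to the same form type before comparing their layouts by hand.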

Can you please verify this, or explicitly explain the definition of templates?

When we refer to Task One (STL), are we not expecting documents like those shown in Figure 4.b?

zlwang-cs commented 1 year ago

Hi @amitbcp,

Thanks for your interest in our work!

By templates, we mean documents that share the same or a similar layout structure. The relative spatial relations on the page should be similar. Figure 4.b is a good example.

As you can tell from the document names, we group the documents according to form type, e.g. amendment, dissemination report, and short form. Since documents in the same group are the same form, we believe they contain the same contents in a similar structure. There are also a few variants within the same group (as you pointed out), but we believe such minor differences will not greatly influence the final results.

Thanks!