We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that capture several challenges: a rich schema with diverse data types, complex templates, and diverse layouts within a single document type.
"annotations": [["registration_num", [["3712\n", [0, 0.46376812, 0.32893434, 0.5, 0.3447707], [[2380, 2385]]]]]
Can you clarify the format of the annotation above? My understanding so far:
"3712\n" is the value of registration_num.
[0.46376812, 0.32893434, 0.5, 0.3447707] corresponds to [x_min, y_min, x_max, y_max].
What do the remaining numbers mean?
The 0 before 0.46376812?
And [2380, 2385]?
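For reference, here is a minimal Python sketch of how I am currently unpacking one entry. It rests on two guesses that I have not confirmed: that the leading 0 is a (0-based) page index, and that [2380, 2385] is a [start, end) character span into the document's OCR text. The second guess is at least consistent with the data, since "3712\n" is 5 characters long and 2385 - 2380 = 5. The `Mention` type and `parse_annotations` helper are hypothetical names of my own, not part of the VRDU release.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mention:
    label: str
    text: str
    page: int  # ASSUMPTION: the leading 0 is a 0-based page index
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized
    spans: List[Tuple[int, int]]  # ASSUMPTION: [start, end) character offsets into the OCR text


def parse_annotations(annotations) -> List[Mention]:
    """Flatten [[label, [[text, [page, x0, y0, x1, y1], [[start, end], ...]], ...]], ...]."""
    mentions = []
    for label, values in annotations:
        for text, box, spans in values:
            page, x0, y0, x1, y1 = box
            mentions.append(
                Mention(label, text, int(page), (x0, y0, x1, y1), [(s, e) for s, e in spans])
            )
    return mentions


# The entry quoted above, with one closing bracket added so the outer list is complete.
annotations = [["registration_num", [["3712\n", [0, 0.46376812, 0.32893434, 0.5, 0.3447707], [[2380, 2385]]]]]]

for m in parse_annotations(annotations):
    for start, end in m.spans:
        # Consistency check: does the span length match the length of the value text?
        print(m.label, repr(m.text), m.page, m.bbox, (start, end), end - start == len(m.text))
```

If the span reading is right, the same check should hold for every value in the dataset, which would be an easy way to confirm or rule it out.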
Thank you so much.