doc-analysis / XFUND

XFUND: A Multilingual Form Understanding Benchmark
https://arxiv.org/abs/2104.08836

Is the annotation rule the same as FUNSD's? #1

Open tengerye opened 3 years ago

tengerye commented 3 years ago

Hi, may I ask if the annotation rule is the same as FUNSD? There are a number of strange examples.

Isydmr commented 2 years ago

Thank you for sharing the dataset @ranpox @wolfshow.

I agree with @tengerye.

I'm trying to annotate my custom dataset to train with XFUND.

I made visualizations of the existing datasets (XFUND and FUNSD) to properly understand the annotation rules. In some cases I encountered confusing labeling. Here are some remarks:

Orange, blue, green, and pink boxes correspond to header, question, answer, and other, respectively. Linking arrays are displayed on top of the boxes.
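The color-coded audit described above can be sketched in a few lines. This is a minimal sketch, assuming the common FUNSD-style JSON layout where each entity carries `text`, `label`, `box`, and `linking` (pairs of entity ids); the sample annotation below is made up for illustration.

```python
# Sketch: map each entity's label to the color scheme used in the
# screenshots and list its outgoing links, so annotations can be audited.
# Assumes FUNSD-style JSON ("form" holds the entity list); sample data is fabricated.
import json

COLORS = {"header": "orange", "question": "blue", "answer": "green", "other": "pink"}

sample = json.loads("""
{"form": [
  {"id": 0, "text": "IDENTITE",    "label": "question", "box": [10, 10, 90, 30],  "linking": [[0, 1]]},
  {"id": 1, "text": "John Doe",    "label": "answer",   "box": [100, 10, 200, 30], "linking": [[0, 1]]},
  {"id": 2, "text": "Telephone 2", "label": "other",    "box": [10, 40, 90, 60],  "linking": []}
]}
""")

def audit(entities):
    """Return (entity_id, label, color, outgoing_links) tuples for inspection."""
    rows = []
    for ent in entities:
        # An outgoing link is a pair whose first element is this entity's id.
        links = [pair for pair in ent["linking"] if pair[0] == ent["id"]]
        rows.append((ent["id"], ent["label"], COLORS[ent["label"]], links))
    return rows

for eid, label, color, links in audit(sample["form"]):
    print(eid, label, color, links)
```

Drawing the actual boxes is then just a matter of feeding `box` and the color to any plotting library.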

a) Question or Header?

FUNSD

[image]

XFUND

[image]

In XFUND, headers do not have linking arrays, whereas FUNSD contains many headers whose linking arrays point to questions.
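This difference can be verified mechanically: count how many header entities carry any linking pairs in each dataset. A sketch, assuming the FUNSD-style layout where each annotation file's `form` key holds the entity list (for XFUND the entity list sits under a different key, so pass `entity_key` accordingly):

```python
# Sketch: count header entities and how many of them have linking arrays,
# across all annotation files matching a glob pattern.
# Field names ("form", "label", "linking") follow the FUNSD release; adapt for XFUND.
import glob
import json

def headers_with_links(path_glob, entity_key="form"):
    total, linked = 0, 0
    for path in glob.glob(path_glob):
        with open(path, encoding="utf-8") as f:
            entities = json.load(f)[entity_key]
        for ent in entities:
            if ent["label"] == "header":
                total += 1
                if ent["linking"]:
                    linked += 1
    return total, linked
```

Running this over both datasets' annotation directories would make the header-linking discrepancy quantitative rather than anecdotal.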


b) Unanswered Questions

FUNSD

[image]

XFUND

[image]

XFUND documents are labeled such that unanswered questions are tagged as other (see 'Téléphone 2').

c) Labeling Inconsistencies

XFUND - 1

[image]

XFUND - 2

[image]

IDENTITE is tagged as a question, but Identité du candidat is tagged as other, in different images from XFUND.
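Inconsistencies of this kind can be surfaced automatically by collecting every label used for a given (normalized) entity text across documents and reporting texts that appear with more than one label. A sketch, again assuming FUNSD-style annotation dicts with the entity list under `form`; the two sample documents are fabricated:

```python
# Sketch: detect entity texts that are labeled differently across documents.
# Normalization here is just strip + lowercase; accented variants such as
# "Identité" vs "IDENTITE" would need extra unicode folding to match.
from collections import defaultdict

def label_conflicts(annotated_docs, entity_key="form"):
    """annotated_docs: iterable of parsed annotation dicts."""
    seen = defaultdict(set)
    for doc in annotated_docs:
        for ent in doc[entity_key]:
            seen[ent["text"].strip().lower()].add(ent["label"])
    return {text: labels for text, labels in seen.items() if len(labels) > 1}

docs = [
    {"form": [{"text": "IDENTITE", "label": "question"}]},
    {"form": [{"text": "identite", "label": "other"}]},
]
print(sorted(label_conflicts(docs)))  # → ['identite']
```

A pass like this over the full dataset would give annotators a concrete list of texts to re-check before releasing a revision.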

wolfshow commented 2 years ago

@Isydmr We are working on XFUND v1.1. Please stay tuned.

Isydmr commented 2 years ago

What is the timeline for XFUND v1.1? We have a tagging project and were planning to rely on the XFUND v1.0 annotation rules. Should we wait for XFUND v1.1?

hasansalimkanmaz commented 2 years ago

+1

Rijgersberg commented 2 years ago

Any idea on the timeline for XFUND v1.1? I'm considering annotating a Dutch dataset in XFUND style.