Thanks for your interest in our work!
We followed the procedure described in Section 4.1 of *Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts*. Basically, we solved a vanilla NER task, followed by normalization and aggregation of the multiple occurrences of a given entity in the document. For instance, dates were normalized to the `yyyy-mm-dd` format, as in the `expected.tsv` file.
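To make that post-processing concrete, here is a minimal sketch of the normalize-and-aggregate step under stated assumptions: the date formats handled and the majority-vote aggregation rule are illustrative choices, not necessarily the exact implementation from the paper.

```python
from collections import Counter
from datetime import datetime

def normalize(label: str, text: str) -> str:
    """Normalize a raw NER span to the canonical target format.

    The date formats tried here are illustrative assumptions; the real
    set of normalization rules depends on the dataset.
    """
    if label == "report_date":
        for fmt in ("%d.%m.%Y", "%d %B %Y", "%Y-%m-%d"):
            try:
                return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
    return text.strip()

def aggregate(spans: list[tuple[str, str]]) -> dict[str, str]:
    """Collapse multiple occurrences of each entity into a single value.

    Majority vote over normalized values is one plausible aggregation
    strategy, used here for illustration.
    """
    votes: dict[str, Counter] = {}
    for label, text in spans:
        votes.setdefault(label, Counter())[normalize(label, text)] += 1
    return {label: counter.most_common(1)[0][0] for label, counter in votes.items()}

# Example: three raw spans extracted by the NER model from one document.
spans = [("report_date", "31.03.2016"),
         ("report_date", "31 March 2016"),
         ("report_date", "2016-04-01")]  # one noisy prediction
print(aggregate(spans))  # {'report_date': '2016-03-31'}
```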
Great! Thanks a lot for the hints!
First of all: thank you for open-sourcing your model, and congrats on the impressive results on the various datasets!
I am attempting to reproduce your results on the SROIE (and Kleister) datasets, and I am wondering how you pre-processed the source/target data. Did you formulate the learning task as a vanilla NER task (i.e., BIO-annotate the data), as a question-answering task, or as something else?
In your kleister-charity dataset, you provide the `expected.tsv` file with entries such as `income_annually_in_british_pounds=10348000.00 report_date=2016-03-31` (see the parsing sketch below). I am not sure, though, whether that is also your final target format.
Any help is appreciated!
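For reference, each `expected.tsv` target line quoted above is a space-separated list of `key=value` pairs; here is a minimal parsing sketch, assuming that layout (the helper name `parse_expected_line` is hypothetical, and the exact file layout should be checked against the dataset):

```python
def parse_expected_line(line: str) -> dict[str, str]:
    """Parse one expected.tsv target line of space-separated key=value pairs."""
    return dict(pair.split("=", 1) for pair in line.strip().split())

targets = parse_expected_line(
    "income_annually_in_british_pounds=10348000.00 report_date=2016-03-31"
)
print(targets["report_date"])  # 2016-03-31
```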