Open siamakzd opened 2 years ago
Please refer to the code block below. https://github.com/clovaai/bros/blob/55c52d0872ed61fb7586b70618f45dcb0354f1b2/preprocess/funsd_spade/preprocess.py#L96-L116
Thank you!
For now I am interested in the token classification task. To clarify, let's say for each document I have:
Which type of preprocessing should I do? For FUNSD I see there are two types: funsd and funsd_spade.
I ran both preprocessing scripts and see that the parse field differs between the processed files. I would appreciate it if you could explain, conceptually, the reason for this difference.
Simply,
funsd: for the BIO-tagging decoder
funsd_spade: for the SPADE-style decoder
Since the BIO-tagging approach is common, I recommend using this method first.
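To illustrate what BIO tagging means for token classification, here is a minimal sketch. It is not the repository's preprocessing code and does not produce the files BROS expects; the function name to_bio_tags, the input layout (a list of word groups with one label each), and the choice of "other" as the outside label are all assumptions made for this example.

```python
from typing import List, Tuple


def to_bio_tags(
    entities: List[Tuple[List[str], str]],
    outside_label: str = "other",
) -> Tuple[List[str], List[str]]:
    """Flatten labeled word groups into parallel word and BIO-tag lists.

    Hypothetical helper for illustration only: each entity is a
    (words, label) pair, and words in groups labeled `outside_label`
    receive the tag "O".
    """
    words: List[str] = []
    tags: List[str] = []
    for entity_words, label in entities:
        for position, word in enumerate(entity_words):
            words.append(word)
            if label == outside_label:
                tags.append("O")
            else:
                # The first word of an entity gets B-, later words get I-.
                prefix = "B-" if position == 0 else "I-"
                tags.append(prefix + label.upper())
    return words, tags


# Made-up example document with FUNSD-style labels.
example_entities = [
    (["Date", ":"], "question"),
    (["12/01/98"], "answer"),
    (["Page", "1"], "other"),
]
example_words, example_tags = to_bio_tags(example_entities)
```

With this input, example_tags pairs each word with one tag, so a token classifier can be trained to predict one label per token. The SPADE-style format is organized differently and is not covered by this sketch.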
Thank you very much for sharing this great work! I was wondering if there are any instructions on how to prepare custom data for fine-tuning BROS. I understand there is preprocessing code for FUNSD, but summarized instructions would be very helpful.