I am working on a practical document-understanding use case and wondering whether I could leverage models such as StructuralLM. The goal is to extract key information from documents (in fields or tables). The catch is that I only have a few training samples (<50), and I don't think VQA would apply, as this information is very specific and not always associated with a clear question.
Here are the 2 options I have in mind:
1. Fine-tuning a model. But would 50 samples be enough? How should I deal with tables (which don't really look like tables, but rather like a list without printed rows and columns, as on many receipts)?
2. Leveraging a foundation model to perform few-shot learning (as with GPT-3). Are there text + layout foundation models out there that would work for this? Or should I do prompt engineering with GPT-3, Flan-T5, OPT or equivalent models?
I am interested in your insights for both English data and non-English (but Latin) data.
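To make the prompt-engineering variant of the second option concrete, here is a minimal sketch of how a few labeled documents could be turned into a few-shot prompt. It assumes the documents have already been OCR'd to plain text, so layout information is lost. The helper `build_few_shot_prompt`, the field names and the sample receipts are all hypothetical and only for illustration; the resulting string would still have to be sent to whichever text model is chosen.

```python
import json


def build_few_shot_prompt(examples, fields, new_document_text):
    """Assemble a text-only few-shot prompt (hypothetical helper).

    examples: list of (ocr_text, labels_dict) pairs taken from the labeled samples.
    fields: names of the fields to extract.
    new_document_text: OCR text of the document to process.
    """
    parts = [
        "Extract the following fields from the document and answer with JSON: "
        + ", ".join(fields)
        + "."
    ]
    for text, labels in examples:
        # ensure_ascii=False keeps accented characters readable in the prompt
        answer = json.dumps(labels, ensure_ascii=False)
        parts.append("Document:\n" + text + "\nAnswer:\n" + answer)
    # The prompt ends with an open "Answer:" slot for the model to complete
    parts.append("Document:\n" + new_document_text + "\nAnswer:\n")
    return "\n\n".join(parts)


# Made-up sample data, not real documents
examples = [
    (
        "ACME Store\n2 x Coffee 3.00\nTotal 6.00",
        {"vendor": "ACME Store", "total": "6.00"},
    ),
]
prompt = build_few_shot_prompt(
    examples,
    ["vendor", "total"],
    "Bakery Dupont\n1 x Croissant 1.20\nTotal 1.20",
)
print(prompt)
```

With fewer than 50 samples, only a handful of them would fit in such a prompt, so the rest could serve as a small evaluation set.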
Many thanks for your input,
Simon