NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License
9.15k stars 1.42k forks

How to handle multi-page documents with LayoutLM models? #218

Open hrishikeshpatel opened 1 year ago

hrishikeshpatel commented 1 year ago

How to handle multi-page documents with LayoutLM models?

I tried adjusting the bounding boxes, but it did not help much.

Here is the list of experiments that did not give good results:
1) Train the model on one-page documents and use it for multi-page inference (no change in the bounding box normalisation logic).
2) Train the model on one-page + multi-page documents and use it for multi-page inference (no change in the bounding box normalisation logic).
3) Readjust the bounding boxes on the 2nd page by adding the height of the first page, and likewise for later pages. The idea is to add the heights of the previous pages and normalise the boxes accordingly.
4) Combine all the pages into one vertically stretched image and then run the pipeline, treating it as one page.
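For anyone who wants to reproduce the comparison, here is a minimal sketch of the two normalisation schemes from the experiments: per-page normalisation (experiments 1 and 2) and the height-offset variant (experiment 3). It assumes pixel boxes in `(x0, y0, x1, y1)` form and the 0-1000 coordinate range that the LayoutLM models expect. The function names `normalize_box` and `normalize_stacked` are hypothetical helpers, not part of the Transformers library, and this is not necessarily the exact code used in the experiments.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]            # (x0, y0, x1, y1) in pixels
Page = Tuple[int, int, Sequence[Box]]      # (width, height, boxes on that page)


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale one pixel box to the 0-1000 range (per-page normalisation)."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def normalize_stacked(pages: Sequence[Page]) -> List[List[int]]:
    """Height-offset variant: shift each page's boxes down by the total
    height of the preceding pages, then normalise against the full
    stacked canvas (widest page x sum of all page heights)."""
    total_height = sum(height for _, height, _ in pages)
    max_width = max(width for width, _, _ in pages)
    normalized = []
    offset = 0  # cumulative height of the pages processed so far
    for _, height, boxes in pages:
        for x0, y0, x1, y1 in boxes:
            shifted = (x0, y0 + offset, x1, y1 + offset)
            normalized.append(normalize_box(shifted, max_width, total_height))
        offset += height
    return normalized


# Two 1000x1000 px pages with the same box on each page.
pages = [
    (1000, 1000, [(100, 200, 300, 400)]),
    (1000, 1000, [(100, 200, 300, 400)]),
]
per_page = [normalize_box(b, w, h) for w, h, boxes in pages for b in boxes]
stacked = normalize_stacked(pages)
```

In the stacked variant every page occupies only a fraction of the 0-1000 vertical range, so box heights shrink compared with what a model trained on single pages has seen. That may be one reason experiment 3 underperforms, though this is a guess and not something verified here.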

Can anyone shed some light on a better way to handle this case?

seppestaes commented 1 year ago

Nothing to add unfortunately, interesting case though, thx for the results. Did you find any performance differences between downstream tasks?

e.g. finding Q&A pairs (form that spans multiple pages)
e.g. finding the total of an invoice on the last page of a 5-page invoice
e.g. extracting a table that spans multiple pages