aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

[Enhancement] Add visual embeddings (for LayoutLMv2 / LayoutXLM) #6

Closed (athewsey closed this issue 1 year ago)

athewsey commented 2 years ago

More recent successors to the LayoutLM model used in this sample (e.g. LayoutLMv2 and LayoutXLM) rely more heavily on visual embeddings of the page image to boost performance. To get the most out of a possible model architecture upgrade, this sample should probably feed page image pixels to the model alongside the text and layout data it already takes from Amazon Textract.
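As a rough illustration of what "integrating the page image" could mean for the data pipeline, here is a minimal sketch (not this sample's actual code) that pairs Textract `WORD` blocks with a reference to the page image. It assumes the standard Textract response shape (`BlockType`, `Text`, `Geometry.BoundingBox` with `Left`/`Top`/`Width`/`Height` as page ratios) and the 0 to 1000 box normalization that the LayoutLM family expects. The function names and the returned dict layout are hypothetical.

```python
from typing import Any, Dict, List


def textract_box_to_layoutlm(bbox: Dict[str, float], scale: int = 1000) -> List[int]:
    """Convert a Textract BoundingBox (page ratios) to [x0, y0, x1, y1] on a 0..scale grid."""

    def to_grid(ratio: float) -> int:
        # Clamp so that slightly out-of-page boxes stay inside the grid.
        return max(0, min(scale, int(round(ratio * scale))))

    return [
        to_grid(bbox["Left"]),
        to_grid(bbox["Top"]),
        to_grid(bbox["Left"] + bbox["Width"]),
        to_grid(bbox["Top"] + bbox["Height"]),
    ]


def page_model_inputs(blocks: List[Dict[str, Any]], image_path: str) -> Dict[str, Any]:
    """Collect words, normalized boxes and the page image reference for one page.

    Hypothetical helper: the image is carried only as a path here, and loading or
    resizing it is left to whatever image processor the chosen model uses.
    """
    words: List[str] = []
    boxes: List[List[int]] = []
    for block in blocks:
        if block.get("BlockType") != "WORD":
            continue  # skip PAGE, LINE and other block types
        words.append(block["Text"])
        boxes.append(textract_box_to_layoutlm(block["Geometry"]["BoundingBox"]))
    return {"words": words, "boxes": boxes, "image_path": image_path}
```

The key difference from a text-and-layout-only pipeline is just the extra `image_path` field: each page's words and boxes need to stay associated with the rendered page image so the model can compute visual embeddings for it.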

Tentative items/components: