aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

LayoutXLM / LayoutLMv2 upgrade #16

athewsey closed 2 years ago

athewsey commented 2 years ago

Issue #, if available: #6

Since the original LayoutLM paper, there have been many interesting developments in multi-modal document AI, notably LayoutLMv2, the multi-lingual LayoutXLM, LayoutLMv3, and Amazon's own DocFormer!

A common feature of LayoutLMv2 and later models is that visual page image features are expected even for fine-tuning tasks, which requires some significant changes to this original sample.
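
To illustrate what that extra requirement means for data preparation, here is a minimal sketch (not the code in this PR) of assembling one page's inputs. It assumes word boxes arrive as Amazon Textract `BoundingBox` dicts (`Left`, `Top`, `Width`, `Height` as page-relative ratios) and converts them to the 0-1000 integer coordinates the LayoutLM family expects. The names `textract_box_to_layoutlm`, `PageExample`, `image_path` and `validate_example` are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def textract_box_to_layoutlm(bbox: Dict[str, float], scale: int = 1000) -> List[int]:
    """Convert a Textract BoundingBox (page-relative ratios) to [x0, y0, x1, y1] ints."""

    def clamp(value: float) -> int:
        # Scale the ratio and keep it inside the valid 0..scale range.
        return max(0, min(scale, int(round(value * scale))))

    return [
        clamp(bbox["Left"]),
        clamp(bbox["Top"]),
        clamp(bbox["Left"] + bbox["Width"]),
        clamp(bbox["Top"] + bbox["Height"]),
    ]


@dataclass
class PageExample:
    """One page of model input (hypothetical container for illustration)."""

    words: List[str]
    boxes: List[List[int]]
    # Text-and-layout-only fine-tuning can leave this unset; models that
    # expect visual features need a page image to go with the words.
    image_path: Optional[str] = None


def validate_example(example: PageExample, needs_image: bool) -> None:
    """Raise ValueError if the example cannot feed the chosen model type."""
    if len(example.words) != len(example.boxes):
        raise ValueError("Each word needs exactly one bounding box")
    if needs_image and not example.image_path:
        raise ValueError("This model type expects a page image for every example")


if __name__ == "__main__":
    box = textract_box_to_layoutlm(
        {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.05}
    )
    page = PageExample(words=["Invoice"], boxes=[box], image_path="page-0001.png")
    validate_example(page, needs_image=True)
    print(page)
```

The point of the `needs_image` flag is that the same text-and-box data can be valid for the original LayoutLM but incomplete for LayoutLMv2 or LayoutXLM, so page images have to be rendered and tracked alongside the Textract results.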

Description of changes:

Upgrade the sample to support LayoutLMv2 (for generally improved accuracy) and LayoutXLM (for multi-lingual use-cases).

Status and outstanding items:

Testing done:

Under active development, so expect bugs, but feedback in the thread is welcome!


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

athewsey commented 2 years ago

This change:

Therefore now merging 🥳