A common feature of LayoutLMv2 and later models is that visual page image features are expected even for fine-tuning tasks, requiring some significant changes to this original sample.
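Concretely, this means every training and inference record needs a resized page image alongside its tokens. A minimal sketch of the thumbnail step, assuming Pillow is available (the 224x224 target matches the default input resolution of LayoutLMv2's visual backbone; the function name is illustrative, not from this repo):

```python
from PIL import Image

# LayoutLMv2's default feature extractor resizes page images to 224x224
# (ignoring aspect ratio) before the visual backbone consumes them.
TARGET_SIZE = (224, 224)

def page_thumbnail(page: Image.Image, size: tuple = TARGET_SIZE) -> Image.Image:
    """Convert a rendered page to RGB and resize it to the model's input size."""
    return page.convert("RGB").resize(size)
```

The same helper can serve both the batch (dataset preparation) and online (inference-time) paths noted in the checklist below.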
Description of changes:
Upgrade the sample to support LayoutLMv2 (for generally improved accuracy) and LayoutXLM (for multilingual use cases).
Status and outstanding items:
- [x] Batch and online thumbnail image generation and integration
- [x] LayoutLMv2 and LayoutXLM can be fine-tuned and deployed into the pipeline
- [x] Retain full LayoutLMv1 support
- [x] LayoutLMv1 can be pre-trained, fine-tuned, deployed, and used in some configuration
- [x] (Bug fixed) LayoutLMv1 can be trained without setting the `dataloader_num_workers=0` hyperparameter
- [x] LayoutLMv2 and LayoutXLM support some level of pre-training
- [x] (Bug) LayoutLMv2+ supports multi-GPU training (717e036 - current native PyTorch configuration; SMDDP untested)
- [x] (Bug) Tokenizer padding and truncation settings are applied correctly - tentatively fixed as of 717e036
- [x] Notebook and doc updates to make v2/XLM the default
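For the multi-GPU item above, the two candidate launchers map to different `distribution` settings on the SageMaker Python SDK's PyTorch/HuggingFace estimators. The key names below are standard SDK options, but how this sample wires them into its estimator is an assumption:

```python
# Native PyTorch DDP launcher - the configuration the checklist says is
# currently in use (as of 717e036):
distribution_native_ddp = {"pytorchddp": {"enabled": True}}

# SageMaker Distributed Data Parallel (SMDDP) - the alternative the
# checklist flags as untested with this sample:
distribution_smddp = {"smdistributed": {"dataparallel": {"enabled": True}}}
```

Either dict would be passed as the `distribution` argument when constructing the estimator for a multi-GPU training job.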
Testing done:
Under active development, so expect bugs - but feedback in the thread is welcome!
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available: #6
Since the original LayoutLM paper, there have been many interesting developments in multi-modal document AI: Notably LayoutLMv2, multi-lingual LayoutXLM, LayoutLMv3, and Amazon's own DocFormer!