A common feature of LayoutLMv2 and later models is that visual page image features are expected even for fine-tuning tasks, requiring some significant changes to this original sample.
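Concretely, this means every training and inference record needs a resized page image alongside its tokens. A minimal sketch of the thumbnail step, assuming Pillow is available (the 224x224 target matches the default input resolution of LayoutLMv2's visual backbone; the function name is illustrative, not from this repo):

```python
from PIL import Image

# LayoutLMv2's default feature extractor resizes page images to 224x224
# (ignoring aspect ratio) before the visual backbone consumes them.
TARGET_SIZE = (224, 224)

def page_thumbnail(page: Image.Image, size: tuple = TARGET_SIZE) -> Image.Image:
    """Convert a rendered page to RGB and resize it to the model's input size."""
    return page.convert("RGB").resize(size)
```

The same helper can serve both the batch (dataset preparation) and online (inference-time) paths noted in the checklist below.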
Description of changes:
Upgrade the sample to support LayoutLMv2 (for generally improved accuracy) and LayoutXLM (for multilingual use cases).
Status and outstanding items:
- [x] Batch and online thumbnail image generation and integration
- [x] LayoutLMv2 and LayoutXLM can be fine-tuned and deployed into the pipeline
- [x] Retain full LayoutLMv1 support
- [x] LayoutLMv1 can be pre-trained, fine-tuned, deployed, and used in some configuration
- [x] (Bug fixed) LayoutLMv1 can be trained without setting the `dataloader_num_workers=0` hyperparameter
- [x] LayoutLMv2 and LayoutXLM support some level of pre-training
- [x] (Bug) LayoutLMv2+ supports multi-GPU training (717e036 - current native PyTorch configuration; SMDDP untested)
- [x] (Bug) Tokenizer padding and truncation settings are applied correctly - tentatively fixed as of 717e036
- [x] Notebook and doc updates to make v2/XLM the default
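For the multi-GPU item above, the two candidate launchers map to different `distribution` settings on the SageMaker Python SDK's PyTorch/HuggingFace estimators. The key names below are standard SDK options, but how this sample wires them into its estimator is an assumption:

```python
# Native PyTorch DDP launcher - the configuration the checklist says is
# currently in use (as of 717e036):
distribution_native_ddp = {"pytorchddp": {"enabled": True}}

# SageMaker Distributed Data Parallel (SMDDP) - the alternative the
# checklist flags as untested with this sample:
distribution_smddp = {"smdistributed": {"dataparallel": {"enabled": True}}}
```

Either dict would be passed as the `distribution` argument when constructing the estimator for a multi-GPU training job.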
Testing done:
Under active development, so expect bugs - but feedback in the thread is welcome!
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available: #6
Since the original LayoutLM paper, there have been many interesting developments in multi-modal document AI: Notably LayoutLMv2, multi-lingual LayoutXLM, LayoutLMv3, and Amazon's own DocFormer!