This sample currently demonstrates:

- Fine-tuning existing models for downstream tasks (NER), and
- Continuation pre-training with unlabelled data from an existing model checkpoint (a generic sketch of this pattern is shown below).
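As a rough illustration of the continuation pre-training pattern above, the sketch below resumes masked language modelling from an existing Hugging Face checkpoint. The checkpoint name, dataset file, and hyperparameters are placeholders rather than this sample's actual configuration, and layout-aware models such as LayoutXLM would additionally need bounding-box and image inputs that this plain-text sketch omits.

```python
# Hedged sketch: continuation pre-training (masked language modelling) from an
# existing checkpoint. Names below are illustrative assumptions, not the sample's code.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "xlm-roberta-base"  # assumption: any MLM-capable text checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Assumption: unlabelled text sits in a local JSON-lines file with a "text" field.
raw = load_dataset("json", data_files={"train": "unlabelled.jsonl"})["train"]
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=raw.column_names,
)

# Standard MLM objective: randomly mask tokens and train the model to recover them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="continued-pretraining",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```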
From-scratch pre-training is considerably more resource-intensive. For example, the LayoutXLM paper describes using 64 V100 GPUs (i.e. 8x `p3.16xlarge` or `p3dn.24xlarge` instances for several hours) over a corpus of ~30M documents.
However, some users may still be interested in from-scratch pre-training - especially for low-resource languages or specialised domains - if tested example code were available. Please drop a 👍 or a comment if this is an enhancement that you'd find useful!