This sample currently demonstrates:

- Fine-tuning existing models for downstream tasks (NER), and
- Continuation pre-training with unlabelled data from an existing model checkpoint (a generic sketch of this pattern is shown below).
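As a rough illustration of the continuation pre-training pattern above, the sketch below resumes masked language modelling from an existing Hugging Face checkpoint. The checkpoint name, dataset file, and hyperparameters are placeholders rather than this sample's actual configuration, and layout-aware models such as LayoutXLM would additionally need bounding-box and image inputs that this plain-text sketch omits.

```python
# Hedged sketch: continuation pre-training (masked language modelling) from an
# existing checkpoint. Names below are illustrative assumptions, not the sample's code.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "xlm-roberta-base"  # assumption: any MLM-capable text checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Assumption: unlabelled text sits in a local JSON-lines file with a "text" field.
raw = load_dataset("json", data_files={"train": "unlabelled.jsonl"})["train"]
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=raw.column_names,
)

# Standard MLM objective: randomly mask tokens and train the model to recover them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="continued-pretraining",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```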
From-scratch pre-training is considerably more resource-intensive. For example, the LayoutXLM paper describes using 64 V100 GPUs (i.e. 8x `p3.16xlarge` or `p3dn.24xlarge` instances for several hours) over a corpus of ~30M documents.
However, some users may still be interested in from-scratch pre-training - especially for low-resource languages or specialised domains - if tested example code were available. Please drop a 👍 or a comment if this is an enhancement that you'd find useful!