aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

LiLT #40

Open tmpuch opened 10 months ago

tmpuch commented 10 months ago

I wanted to ask whether this solution currently supports the Language-Independent Layout Transformer + RoBERTa model (LiLT)?

If not, I wanted to request that the inference code be updated to support a LiLT model.

athewsey commented 9 months ago

Hi - thanks for raising this & sorry for the delayed response!

It's likely that the sample doesn't quite support LiLT out of the box yet, but the LiltForTokenClassification.forward() interface is very similar to LayoutLM & friends, so I hope it wouldn't be too difficult to add... I'd certainly be interested in adding it if I find time, or if anybody wants to raise a PR!
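
For illustration, here's a rough, untested sketch of how close that call signature is, assuming a transformers version that ships LiLT and using made-up labels / word boxes (and assuming the boxes want the same 0-1000 scale as LayoutLM):

```python
import torch
from transformers import AutoTokenizer, LiltForTokenClassification

model_id = "SCUT-DLVCLab/lilt-roberta-en-base"
# add_prefix_space is needed because the underlying RoBERTa tokenizer is fed pre-split words:
tokenizer = AutoTokenizer.from_pretrained(model_id, add_prefix_space=True)
model = LiltForTokenClassification.from_pretrained(model_id, num_labels=5)

words = ["Invoice", "Number", "12345"]
boxes = [[70, 50, 180, 70], [190, 50, 290, 70], [300, 50, 380, 70]]  # assumed 0-1000 scale

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Map word-level boxes to token level, with dummy boxes for special tokens:
token_boxes = [
    boxes[word_id] if word_id is not None else [0, 0, 0, 0]
    for word_id in enc.word_ids()
]
enc["bbox"] = torch.tensor([token_boxes])

with torch.no_grad():
    outputs = model(**enc)  # same input_ids / bbox / attention_mask pattern as LayoutLM
print(outputs.logits.shape)  # (1, seq_len, num_labels)
```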

I'm not quite sure from the LiLT bbox doc yet whether its "normalized" (x0, y0, x1, y1) coordinates would need to be handled differently from our current 0-1000 coordinate-vocabulary handling for LayoutLM.
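
For reference, what we do for the LayoutLM family is roughly the below (the function name is illustrative, not the actual util in this repo): Textract's relative Geometry.BoundingBox values in [0, 1] get scaled onto the 0-1000 integer range the bbox embedding expects.

```python
def textract_box_to_0_1000(bbox: dict) -> list:
    """Convert a Textract BoundingBox ({Left, Top, Width, Height} in 0-1)
    to [x0, y0, x1, y1] integers on the 0-1000 scale LayoutLM expects."""
    left, top = bbox["Left"], bbox["Top"]
    right = left + bbox["Width"]
    bottom = top + bbox["Height"]
    return [
        min(1000, max(0, int(round(1000 * v))))  # clamp to stay in-vocabulary
        for v in (left, top, right, bottom)
    ]

# e.g. {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.05} -> [100, 200, 400, 250]
# If LiLT's "normalized" boxes use a different range, this scaling is what
# would need to change.
```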

We mostly use AutoTokenizer/AutoModel/etc. in the training script, but there are some departures from this to deal with LayoutLMv2 vs LayoutXLM (which share a lot of tokenizer/processor logic, but not quite everything)... So some tweaks may be needed there to correctly handle LiLT as well.
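
Roughly, the pattern is along these lines (illustrative only, not the repo's actual code), so LiLT may just work through the Auto* path or may need its own small branch:

```python
from transformers import AutoConfig, AutoModelForTokenClassification, AutoTokenizer

def load_model_and_tokenizer(model_name_or_path: str, num_labels: int):
    """Hypothetical loader showing where model-specific branches creep in."""
    config = AutoConfig.from_pretrained(model_name_or_path, num_labels=num_labels)
    model = AutoModelForTokenClassification.from_pretrained(
        model_name_or_path, config=config
    )
    if config.model_type == "layoutlmv2":
        # LayoutXLM checkpoints also report model_type "layoutlmv2", so the real
        # script has extra logic here to pick the right tokenizer/processor -
        # this is the kind of departure from pure Auto* usage mentioned above.
        pass
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
    return model, tokenizer
```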

...But as a first pass, it's probably well worth just setting the model_name_or_path hyperparameter to SCUT-DLVCLab/lilt-roberta-en-base to see how close it is to working already?
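
i.e. something along these lines when launching the training job (sketch only; the exact estimator wiring in this repo may differ, and the entry point, role, and framework versions below are placeholders):

```python
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",   # placeholder: use the pipeline's actual entry point
    source_dir="src",         # placeholder
    role="<your-sagemaker-execution-role-arn>",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.26.0",
    pytorch_version="1.13.1",
    py_version="py39",
    hyperparameters={
        # The one intentional change for this experiment:
        "model_name_or_path": "SCUT-DLVCLab/lilt-roberta-en-base",
        # ...plus whatever other hyperparameters the sample already sets.
    },
)
# estimator.fit({"train": train_s3_uri, "validation": validation_s3_uri})
```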