tmpuch opened this issue 10 months ago
Hi - thanks for raising this & sorry for the delayed response!
It's likely that the sample doesn't quite support LiLT out of the box yet, but the LiltForTokenClassification.forward()
interface is very similar to LayoutLM's, so I hope it wouldn't be too difficult to add. I'd certainly be interested in adding it if I have time, or if anybody wants to raise a PR!
I'm not quite sure from the LiLT bbox
doc yet whether the "normalized" (x0, y0, x1, y1) coordinates would need to be handled differently from our current 0-1000 coordinate handling for LayoutLM.
We mostly use AutoTokenizer/AutoModel/etc in the training script, but there are some departures from this to deal with LayoutLMv2 vs LayoutXLM (which share a lot of tokenizer/processor logic, but not quite everything). So there might be some tweaks needed here to correctly handle LiLT as well.
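To make that concrete, here's a hypothetical sketch of the kind of dispatch tweak I mean: route LayoutLMv2/LayoutXLM checkpoints through their special-cased processor logic, and let everything else (including, hopefully, LiLT) fall through to the plain Auto* classes. The function name and string matching are illustrative, not the script's actual code:

```python
# Hypothetical dispatch helper: only LayoutLMv2/LayoutXLM checkpoints take the
# special-cased processor path; LiLT would fall through to the default Auto*
# handling. Purely a sketch of the idea, not the training script's real logic.
def needs_layoutlmv2_processor(model_name_or_path: str) -> bool:
    lowered = model_name_or_path.lower()
    return "layoutlmv2" in lowered or "layoutxlm" in lowered

print(needs_layoutlmv2_processor("DLVCLab/lilt-roberta-en-base"))  # -> False
print(needs_layoutlmv2_processor("microsoft/layoutxlm-base"))      # -> True
```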
...But as a first pass, it's probably well worth just setting the model_name_or_path
hyperparameter to DLVCLab/lilt-roberta-en-base to see how close it is to working already?
I wanted to ask whether this solution currently supports the Language-Independent Layout Transformer - RoBERTa (LiLT) model.
If not, I'd like to request that the inference code be updated to support LiLT.