File "/home/ysi.yardi.com/lm30640/projects/Invoice_OCR_Engine/venv/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2523, in __call__
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/ysi.yardi.com/lm30640/projects/Invoice_OCR_Engine/venv/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2626, in _call_one
**kwargs,
File "/home/ysi.yardi.com/lm30640/projects/Invoice_OCR_Engine/venv/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2817, in batch_encode_plus
**kwargs,
TypeError: _batch_encode_plus() got an unexpected keyword argument 'boxes'
It looks like the tokenizer extends XLNet. However, the XLNet tokenizer doesn't accept boxes in the input. I've been relying on passing boxes with LayoutLM and LiLT to automatically align my boxes with the tokenized inputs. It's a pain to do manually haha
Is there any way this could be supported?
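For reference, here is a minimal sketch of the manual alignment I'd like to avoid. It assumes a fast tokenizer called with is_split_into_words=True, so that the word_ids() method on the returned encoding gives one word index per token (None for special tokens). The helper name align_boxes_to_tokens and the pad_box default are made up for illustration and are not part of transformers, and the word_ids list below is hand-written rather than produced by a real tokenizer.

```python
from typing import List, Optional, Sequence


def align_boxes_to_tokens(
    word_ids: Sequence[Optional[int]],
    word_boxes: Sequence[Sequence[int]],
    pad_box: Sequence[int] = (0, 0, 0, 0),
) -> List[List[int]]:
    """Repeat each word-level box for every sub-word token of that word.

    word_ids   : one entry per token; None marks special/padding tokens
                 (the shape returned by word_ids() on a fast-tokenizer encoding).
    word_boxes : one [x0, y0, x1, y1] box per input word.
    pad_box    : hypothetical placeholder box for tokens without a word.
    """
    token_boxes = []
    for word_id in word_ids:
        if word_id is None:
            # Special tokens have no source word, so give them the placeholder.
            token_boxes.append(list(pad_box))
        else:
            # Every sub-word token inherits the box of the word it came from.
            token_boxes.append(list(word_boxes[word_id]))
    return token_boxes


# Hand-written example: two words, the second split into two sub-word tokens,
# with one special token on each side.
example_word_ids = [None, 0, 1, 1, None]
example_boxes = [[10, 10, 50, 20], [60, 10, 120, 20]]
example_token_boxes = align_boxes_to_tokens(example_word_ids, example_boxes)
```

This only works when the tokenizer is a fast one that can report word indices. With a slow tokenizer, the words would have to be tokenized one at a time and the boxes repeated by hand.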
sample input:
current error: the TypeError traceback shown above