Closed · yellowjs0304 closed this 2 years ago
Hi @yellowjs0304, in the processor mixin there is an option, `return_overflowing_tokens`, which is `False` by default. When set to `True`, the processor returns more than one sequence pair, split according to the `max_length` you specify.
https://huggingface.co/docs/transformers/internal/tokenization_utils
https://huggingface.co/course/chapter6/3b?fw=pt
Also refer to the links above for more info; a rough sketch of the idea follows. Hope this helps.
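A minimal sketch, assuming you supply your own OCR results (`image`, `words`, and `boxes` are placeholders, and the checkpoint name plus exact kwargs may need adjusting for your setup):

```python
# Hypothetical example: split a long document into 512-token chunks
# via return_overflowing_tokens instead of hard-truncating it.
from transformers import LayoutXLMProcessor

# apply_ocr=False because words/boxes come from our own OCR step
processor = LayoutXLMProcessor.from_pretrained(
    "microsoft/layoutxlm-base", apply_ocr=False
)

encoding = processor(
    image,                          # PIL image of the page
    words,                          # list of OCR'd words
    boxes=boxes,                    # one normalized bounding box per word
    truncation=True,
    max_length=512,
    stride=128,                     # token overlap between consecutive chunks
    return_overflowing_tokens=True,
    padding="max_length",
    return_tensors="pt",
)

# A ~617-token document now comes back as two rows; overflow_to_sample_mapping
# says which original example each chunk belongs to.
print(encoding["input_ids"].shape)            # e.g. torch.Size([2, 512])
print(encoding["overflow_to_sample_mapping"]) # e.g. tensor([0, 0])
```

Depending on your transformers version you may still have to repeat the image features once per chunk yourself, so check that `encoding["image"]` has one entry per row of `input_ids`.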
Thank you. I'll check it.
@NielsRogge
Hi, I have a question: I used LayoutXLM and fine-tuned it on my own data. I know the pre-trained model was trained with a max encoding length of 512, but what should I do if I need to run inference on long documents with this model? I heard the original LayoutXLM repo solves this by truncating the input into two parts (if the total length is 617, it is split into two inputs, a 512-token and a 105-token sequence; a rough sketch of this splitting is shown below).
Is there an option like this in Hugging Face, or in this LayoutXLM processor? I look forward to receiving any ideas. Thank you :)
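For reference, a minimal sketch of the manual splitting described above (all names here are hypothetical; it assumes a fine-tuned `LayoutLMv2ForTokenClassification`-style checkpoint, an un-truncated, un-padded `encoding`, and ignores special tokens and attention masks for brevity):

```python
# Hypothetical example: run inference on a 617-token document by slicing it
# into a 512-token window and a 105-token window, then concatenating logits.
import torch

MAX_LEN = 512

input_ids = encoding["input_ids"][0]  # shape (617,) for the example above
bbox = encoding["bbox"][0]            # shape (617, 4)

chunks = []
with torch.no_grad():
    for start in range(0, input_ids.size(0), MAX_LEN):
        outputs = model(
            input_ids=input_ids[start : start + MAX_LEN].unsqueeze(0),
            bbox=bbox[start : start + MAX_LEN].unsqueeze(0),
            image=image_tensor,       # same page image for every chunk
        )
        chunks.append(outputs.logits.squeeze(0))

# (617, num_labels): one prediction per token of the full document
logits = torch.cat(chunks, dim=0)
```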