NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

How to build a LiLT XLM-RoBERTa model with the LayoutLMv3 tokenizer #444

Open MattBlue92 opened 3 months ago

MattBlue92 commented 3 months ago

As the title says, I want to understand how to create a LiLT XLM-RoBERTa model that uses the LayoutLMv3 tokenizer.

The SCUT-DLVCLab/lilt-roberta-en-base checkpoint uses a LayoutLMv3 tokenizer, but the SCUT-DLVCLab/lilt-infoxlm-base checkpoint doesn't use a RoBERTa tokenizer.

So I want to find out how to do that. I have already trained a LiLT model for Italian using the official code (https://github.com/jpWang/LiLT), but I used a RoBERTa tokenizer (Italian version), and I'm pretty sure that if I replace the RoBERTa tokenizer with the LayoutLMv3 tokenizer, the code will break.
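To make the concern concrete, here is a minimal sketch of what a layout-aware tokenizer adds on top of a plain subword tokenizer: it takes words plus one bounding box per word and repeats each word's box for every subword piece. This is not the LiLT or Transformers implementation. The names `align_boxes_to_tokens` and `toy_tokenize` are hypothetical, the `<s>` / `</s>` special tokens and the `[0, 0, 0, 0]` box for special tokens are assumptions, and the toy splitter only stands in for a real Italian RoBERTa tokenizer.

```python
from typing import Callable, List, Sequence, Tuple

Box = List[int]  # [x0, y0, x1, y1], assumed normalized to the 0-1000 range


def align_boxes_to_tokens(
    words: Sequence[str],
    boxes: Sequence[Box],
    tokenize_word: Callable[[str], List[str]],
    cls_token: str = "<s>",          # assumption: RoBERTa-style special tokens
    sep_token: str = "</s>",
    special_box: Sequence[int] = (0, 0, 0, 0),  # assumption: box used for special tokens
) -> Tuple[List[str], List[Box]]:
    """Expand word-level boxes to token-level boxes (hypothetical helper)."""
    if len(words) != len(boxes):
        raise ValueError("words and boxes must have the same length")

    tokens: List[str] = [cls_token]
    token_boxes: List[Box] = [list(special_box)]

    for word, box in zip(words, boxes):
        pieces = tokenize_word(word)
        tokens.extend(pieces)
        # Every subword piece inherits the box of the word it came from.
        token_boxes.extend(list(box) for _ in pieces)

    tokens.append(sep_token)
    token_boxes.append(list(special_box))
    return tokens, token_boxes


def toy_tokenize(word: str) -> List[str]:
    """Stand-in for a real subword tokenizer: split into 3-character chunks."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


words = ["Fattura", "n."]
boxes = [[10, 10, 80, 20], [85, 10, 95, 20]]
tokens, token_boxes = align_boxes_to_tokens(words, boxes, toy_tokenize)
for token, box in zip(tokens, token_boxes):
    print(token, box)
```

With a real tokenizer, something like its per-word `tokenize` method could be passed as `tokenize_word`, though prefix-space handling for byte-level BPE would need checking. The point is that the box alignment is independent of the vocabulary, so the open question is only whether the LayoutLMv3 tokenizer can be loaded with the Italian vocabulary and merges files.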

Has anyone tried this?