NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Why do LayoutLM models not work properly on unstructured text images, and is there a way to handle them properly? #136

Open deepanshudashora opened 2 years ago

deepanshudashora commented 2 years ago

I have experienced this with two custom datasets where the information was in paragraph format: LayoutLM models did not give me good results on either, since the documents were unstructured.

@NielsRogge Can you please explain the reason and suggest a better way to train, so that I can get results on unstructured datasets similar to those on FUNSD or CORD?

nasheedyasin commented 2 years ago

Here's my attempt at a reason:

Unlike typical NER models that are trained on the entire text of the document (limited to 512 tokens, of course), the LayoutLMv2 model forms context only from the patch that you have annotated (most probably fewer than 512 tokens). The result is that this works well for forms and other structured documents, where rich, visually distinctive features help the model identify your entities of interest. With unstructured prose, the minimal visually distinctive features and smaller contexts mean the model doesn't converge as you'd expect.

Maybe for your use case, a simple Token or Span classification model would do?
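To make the token-classification suggestion concrete: a model like `AutoModelForTokenClassification` emits one BIO label per token, and the entity spans are recovered by grouping those labels afterwards. Here is a minimal, self-contained sketch of that decoding step; the tokens, labels, and entity types are illustrative, not from a real model.

```python
# Decode BIO tags (the usual output of a token-classification head)
# into (entity_type, text) spans. Illustrative example data only.

def bio_to_spans(tokens, labels):
    """Group (token, BIO-label) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A "B-" tag starts a new span, closing any open one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An "I-" tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent "I-" tag) ends the open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Invoice", "from", "Acme", "Corp", "dated", "12", "March"]
labels = ["O", "O", "B-ORG", "I-ORG", "O", "B-DATE", "I-DATE"]
print(bio_to_spans(tokens, labels))
# → [('ORG', 'Acme Corp'), ('DATE', '12 March')]
```

Since this head only consumes the text (no bounding boxes or image patches), it sees the full 512-token context of the prose, which is exactly what LayoutLMv2's patch-local context lacks on unstructured documents.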