Open deepanshudashora opened 2 years ago
Here's my attempt at a reason:
Unlike typical NER models, which are trained on the full text of the document (limited to 512 tokens, of course), the LayoutLMv2 model forms its context only from the patch that you have annotated (most probably fewer than 512 tokens). As a result, it works well for forms and other structured documents, where rich, visually distinctive features help the model identify your entities of interest. With unstructured prose, the lack of visually distinctive features and the smaller contexts keep the model from converging the way you expect.
Maybe for your use case, a simple Token or Span classification model would do?
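If you go the text-only route, the 512-token limit mentioned above still applies, so long paragraphs need to be split into overlapping windows so that each entity keeps some surrounding context. Here is a minimal sketch of that idea in plain Python. The function name `sliding_windows` and the defaults `max_len=512` and `stride=128` are my own assumptions for illustration, not something from LayoutLM or this thread, and it ignores special tokens such as `[CLS]` and `[SEP]`.

```python
from typing import List, Tuple


def sliding_windows(
    tokens: List[str],
    labels: List[str],
    max_len: int = 512,   # assumed model limit, as discussed above
    stride: int = 128,    # assumed overlap between consecutive windows
) -> List[Tuple[List[str], List[str]]]:
    """Split aligned tokens and labels into overlapping windows.

    Each window holds at most `max_len` tokens, and consecutive windows
    share `stride` tokens so entities near a boundary keep some context.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")

    step = max_len - stride  # how far the window start advances each time
    windows = []
    start = 0
    while True:
        end = start + max_len
        windows.append((tokens[start:end], labels[start:end]))
        if end >= len(tokens):  # the last window reached the end of the text
            break
        start += step
    return windows


if __name__ == "__main__":
    toks = [f"tok{i}" for i in range(1000)]
    labs = ["O"] * 1000
    for window_tokens, _ in sliding_windows(toks, labs):
        print(len(window_tokens), window_tokens[0])
```

Each window can then be fed to an ordinary token classification model. If you use Hugging Face fast tokenizers, the `return_overflowing_tokens=True` and `stride` arguments do roughly the same job at the subword level, so you may not need a helper like this at all.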
I have seen this with two custom datasets where the information was in paragraph form: LayoutLM models did not give good results on either of them, since the text was unstructured.
@NielsRogge Can you please explain the reason and suggest a better way to train, so that I can get results similar to those on FUNSD or CORD on unstructured datasets as well?