NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License
8.51k stars, 1.34k forks

LayoutLM model only able to classify individual words instead of entire sections #263

Open keval2415 opened 1 year ago

keval2415 commented 1 year ago

Model I am using (LayoutLM ...):

I want to build a custom resume parser that predicts the EDUCATION, SKILLS, and EXPERIENCE sections of a resume. I have fine-tuned the LayoutLMv3 model on a custom dataset formatted like the FUNSD dataset.

Although the fine-tuned model can predict education keywords, it only does so at the word level. For instance, if the resume states "My education is in computer engineering from LD College Ahmedabad," the model labels only "computer" and "engineering" as EDUCATION. I want all the words that belong to a section grouped into a single labeled region, rather than each word labeled on its own.

Here are some screenshots of the model's current output: [Screenshot from 2023-03-13 18-36-47]

This is the output I would like: box coordinates for the whole EDUCATION section and the whole SKILLS section, each identified by its keywords. [Screenshot from 2023-03-13 18-32-05]
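One possible post-processing step for getting section-level boxes out of word-level predictions is sketched below. It is a minimal illustration, not part of LayoutLMv3 or the Transformers library. It assumes the predictions are already available as `(label, box)` pairs in reading order, that any `B-`/`I-` prefixes have been stripped from the labels, and that unlabeled words carry the label `"O"`. The names `union_box`, `merge_word_predictions`, and `max_gap` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_word_predictions(
    words: List[Tuple[str, Box]], max_gap: int = 0
) -> List[Tuple[str, Box]]:
    """Merge runs of same-label words into one region per run.

    words:   (label, box) pairs in reading order; "O" marks unlabeled words.
    max_gap: how many unlabeled words may sit between two words of the
             same label while still counting as one region (assumed knob).
    """
    regions = []  # each entry: [label, merged_box, index_of_last_word]
    for i, (label, box) in enumerate(words):
        if label == "O":
            continue
        last = regions[-1] if regions else None
        if last and last[0] == label and i - last[2] - 1 <= max_gap:
            # Same label and close enough: grow the current region.
            last[1] = union_box(last[1], box)
            last[2] = i
        else:
            # Different label or gap too large: start a new region.
            regions.append([label, box, i])
    return [(label, box) for label, box, _ in regions]


# Made-up word-level predictions, for illustration only.
example_words = [
    ("O", (0, 0, 10, 10)),
    ("EDUCATION", (10, 0, 50, 10)),
    ("EDUCATION", (55, 0, 100, 10)),
    ("O", (105, 0, 120, 10)),
    ("SKILLS", (10, 20, 40, 30)),
    ("SKILLS", (10, 35, 60, 45)),
]
section_boxes = merge_word_predictions(example_words)
print(section_boxes)
```

Raising `max_gap` lets a region span words the model left unlabeled, which matters here because only a few keywords per section are being tagged. A real resume would likely need a layout-aware rule as well (for example, grouping by line or column), which this sketch does not attempt.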

Note: I also tried the Layout Parser model trained on the PubLayNet dataset. However, it was unable to accurately detect and classify the EDUCATION, SKILLS, EXPERIENCE, and similar sections.

If there are any other models that would be suitable for my use case, please kindly suggest them. Thank you all for your help.