NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

How to get LayoutLMv2 output as key-value pairs? #39

Open avinashok opened 3 years ago

avinashok commented 3 years ago

Model I am using is LayoutLMv2:

(Link of the demo for reference: https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD )

I do get 'questions' & 'answers' as separate colored boxes in the output image. But is there a way to get them as a Python dictionary (key-value pairs), where the questions become keys and the answers become their corresponding values?

NielsRogge commented 3 years ago

Hi,

This is definitely on my roadmap. The LayoutLMv2 authors defined another model, called LayoutLMv2ForRelationExtraction, that does exactly that. However, they did not specify how to use the model at inference time, so I need to look into it more to understand how it works.

If you have the time to look into it, let me know, then we can add it to HuggingFace Transformers.

avinashok commented 3 years ago

Hi @NielsRogge,

Thanks for replying, and glad to know it is already on your roadmap. I tried grouping the questions and answers based on the pixel positions of the layout boxes, but there is a bit of heuristics involved, which is why I thought of reaching out to you directly.

What I tried:

# From the code: https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD/blob/386ad78844f905dfbb81072908c51ba344427587/app.py

# Run OCR via the feature extractor to get the words and their bounding boxes
encoding_feature_extractor = feature_extractor(image, return_tensors="pt")
words, boxes = encoding_feature_extractor.words, encoding_feature_extractor.boxes

# ... the rest of the app.py code comes here (it produces true_predictions and true_boxes) ...

# Collect (label, prediction, box, color) for every predicted box
layout_details = []
for prediction, box in zip(true_predictions, true_boxes):
    predicted_label = iob_to_label(prediction).lower()
    layout_details.append((predicted_label, prediction, box, label2color[predicted_label]))

# Further, pair each word with its predicted label and box (skipping the special tokens at both ends)
for word, details in zip(words[0], layout_details[1:-1]):
    print(word, details)

This gives the corresponding tag, pixel position, and word for each layout block, which can then be grouped based on the position of the words. I used a threshold such that, if an answer block lies within the threshold distance of a question block, the two are associated as a key-value pair, and so on.
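A rough sketch of the kind of proximity grouping I mean is below; this is a minimal illustration, not my exact code. It assumes a list of (word, label, box) tuples built by zipping words[0] with layout_details, boxes in [x0, y0, x1, y1] format, word-level matching (in practice you would first merge consecutive words with the same label into blocks), and an arbitrary 100-pixel threshold:

# Minimal sketch of the proximity heuristic described above (illustrative values only).

def center(box):
    # Box assumed to be in [x0, y0, x1, y1] format; returns its center point.
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def distance(q_box, a_box):
    (qx, qy), (ax, ay) = center(q_box), center(a_box)
    return ((ax - qx) ** 2 + (ay - qy) ** 2) ** 0.5

def group_key_values(items, threshold=100):
    # items: list of (word, label, box) tuples, e.g. built from words[0] and layout_details
    questions = [(word, box) for word, label, box in items if label == "question"]
    answers = [(word, box) for word, label, box in items if label == "answer"]
    pairs = {}
    for q_word, q_box in questions:
        # Associate the question with the closest answer box within the threshold, if any.
        candidates = [(distance(q_box, a_box), a_word) for a_word, a_box in answers]
        best = min(candidates, default=None, key=lambda c: c[0])
        if best is not None and best[0] <= threshold:
            pairs[q_word] = best[1]
    return pairs

The threshold and the center-to-center distance are the parts that need per-document tuning; you may also want to require the answer to lie to the right of or below the question.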

Also, I was referring to line 139 https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py based on the issue response for https://github.com/microsoft/unilm/issues/465 .

I'll definitely take a look at LayoutLMv2ForRelationExtraction.

Isydmr commented 3 years ago

Hi @NielsRogge,

Thank you for your amazing work.

I added the LayoutLMv2ForRelationExtraction class to modeling_layoutlmv2.py.

import torch
from transformers import LayoutLMv2ForRelationExtraction

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LayoutLMv2ForRelationExtraction.from_pretrained("nielsr/layoutlmv2-finetuned-funsd")
model.to(device)

Here is the output:

Some weights of the model checkpoint at nielsr/layoutlmv2-finetuned-funsd were not used when initializing LayoutLMv2ForRelationExtraction:
['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing LayoutLMv2ForRelationExtraction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LayoutLMv2ForRelationExtraction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LayoutLMv2ForRelationExtraction were not initialized from the model checkpoint at nielsr/layoutlmv2-finetuned-funsd and are newly initialized:
['extractor.rel_classifier.linear.bias', 'extractor.rel_classifier.linear.weight', 'extractor.ffnn_head.3.bias', 'extractor.ffnn_tail.0.bias', 'extractor.ffnn_head.0.weight', 'extractor.ffnn_head.0.bias', 'extractor.rel_classifier.bilinear.weight', 'extractor.ffnn_tail.3.weight', 'extractor.ffnn_tail.0.weight', 'extractor.entity_emb.weight', 'extractor.ffnn_tail.3.bias', 'extractor.ffnn_head.3.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

What would be the guidance on the next step? Does the pre-trained model only contain the Semantic Entity Recognition part?

https://github.com/microsoft/unilm/issues/429 and https://github.com/microsoft/unilm/issues/465 are related. Unfortunately, https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py does not contain an if training_args.do_predict: block at the end.
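From reading run_xfun_re.py and the xfun dataset script in the same repo, it looks like the relation-extraction head expects two extra inputs per example besides the usual encoding: an entities dict and a relations dict. A rough sketch of what I think they look like is below; the exact field names and label ids are my reading of the xfun code, so treat them as assumptions:

# Rough sketch of the extra inputs LayoutLMv2ForRelationExtraction appears to expect,
# based on the xfun dataset script in the unilm repo (field names are assumptions).
entities = {
    "start": [12, 25],  # token index where each entity span starts
    "end":   [18, 30],  # token index where each entity span ends
    "label": [1, 2],    # e.g. 0 = HEADER, 1 = QUESTION, 2 = ANSWER
}
relations = {
    "head": [0],          # index into `entities` of the question
    "tail": [1],          # index into `entities` of the answer it links to
    "start_index": [12],  # token span covered by the relation (min start of head/tail), I think
    "end_index": [30],    # token span covered by the relation (max end of head/tail), I think
}
# One dict per example in the batch; assuming `encoding` is the processor output for the document:
outputs = model(**encoding, entities=[entities], relations=[relations])

How the model builds and scores candidate question-answer pairs at inference time is exactly the part that still needs to be verified against the unilm code.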

abdksyed commented 3 years ago

Any update, guys? Using LayoutXLM separately just for the linking would not make sense, and for Semantic Entity Recognition, LayoutLMv2 looks better than LayoutXLM when looking at the numbers on the FUNSD dataset.

We could maybe make a plan and work out how to get LayoutLMv2 to do the Relation Extraction, so we can have LayoutLMv2 itself for both SER and RE.

WenmuZhou commented 3 years ago

mark

fadi212 commented 2 years ago

Hi @Isydmr, @avinashok, can you please share the inference pipeline for the RelationExtraction model?

Also, is there any way we can convert the results of LayoutLMv2 into a key-value format?

mattdeeperinsights commented 2 years ago

@fadi212 @abdksyed @avinashok In the above thread, someone has suggested a solution with a working Colab example that you can use.

They are also fixing it up and adding this class in a separate pull request, for those who would rather wait for a proper release.

anamtaamin commented 2 years ago

Also, I was referring to line 139 of https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py based on the issue response for microsoft/unilm#465.

Hi, were you able to find the key-value pairs?

jyotiyadav94 commented 2 years ago

Hi,

Is there any update on this, i.e. getting the output as key-value pairs? @avinashok, can you please share the complete code of the solution you mentioned above?

avinashok commented 2 years ago

@jyotiyadav94 I can see @R0bk has already mentioned the solution in one of the above comments, with a Colab notebook version of it.

aditya11ad commented 2 years ago

Hi,

This is definitely on my roadmap. The LayoutLMv2 authors defined another model, called LayoutLMv2ForRelationExtraction, that does exactly that. However, they did not specify how to use the model at inference time, so I need to look into it more to understand how it works.

If you have the time to look into it, let me know, then we can add it to HuggingFace Transformers.

Hi, how can we get key-value extraction, like: {'invoice number': '123456', 'date': '23/04/2022', 'amount': '44987', ...}?

yellowjs0304 commented 2 years ago

@jyotiyadav94 @aditya11ad Did you find a solution? I saw the Colab (from @avinashok), but I can't find a way to extract key-value pairs.

jyotiyadav94 commented 2 years ago

Hi @yellowjs0304, I basically used this approach https://medium.com/mlearning-ai/ai-in-the-real-world-form-processing-c96912d80ef2 to get the key-value pairs.

yellowjs0304 commented 2 years ago

@jyotiyadav94 Thank you for sharing the idea. Does this approach also work with another OCR? (Not Tesseract OCR; I have separate OCR results.)

jyotiyadav94 commented 2 years ago

@yellowjs0304 can you provide me with your Gmail ID? I will share the complete link to the code for this.

yellowjs0304 commented 2 years ago

@jyotiyadav94 Sure, the contact mail is at the top of my profile readme. Thank you :)

NurielWainstein commented 2 years ago

@jyotiyadav94 You said you were going to share the complete code. Where can I find it?

thanks!

yellowjs0304 commented 2 years ago

@nurielw05 I saw this really late. Jyoti shared this link, which is related to the post above.

NurielWainstein commented 2 years ago

@jyotiyadav94 This only works if the value is on the right side of the key. What if the value is under the key?

Like this:

total: name: 2323 nuriel

hjerbii commented 2 years ago

Hello!

I'm having the same issue as all of you, specifically with this notebook.

In the inference part, it is not clear how we can build the entities list (i.e. define tails and heads) when our input is an image and we extract entities using LayoutLMv2ForTokenClassification (for example). Tails and heads are not given by the model. Could you please update the notebook with an inference example that uses only an image as input? Many thanks in advance!

NielsRogge commented 2 years ago

Tails and heads are not given by the model

=> Tails are questions, and answers are heads (or vice versa). So LayoutLMv2ForTokenClassification does provide you with that.
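Concretely, you can turn the token classification predictions into entity spans yourself, along the lines of the sketch below. The label ids and the exclusive-end convention are assumptions based on the xfun dataset script, so double-check them against run_xfun_re.py:

# Rough sketch: convert per-token labels from LayoutLMv2ForTokenClassification into
# the entity spans the relation-extraction head expects. The label ids and the
# exclusive-end convention are assumptions based on the xfun dataset script.

def build_entities(token_labels):
    # token_labels: one predicted label string per token, e.g. 'question', 'answer', 'other'
    label2id = {"header": 0, "question": 1, "answer": 2}  # assumed xfun-style mapping
    entities = {"start": [], "end": [], "label": []}
    current, start = None, None
    for i, label in enumerate(list(token_labels) + ["other"]):  # sentinel closes the last span
        if label != current:
            if current in label2id:
                entities["start"].append(start)
                entities["end"].append(i)
                entities["label"].append(label2id[current])
            current, start = label, i
    return entities

# Example: tokens 1-2 predicted as a question, token 3 as an answer
print(build_entities(["other", "question", "question", "answer", "other"]))
# -> {'start': [1, 3], 'end': [3, 4], 'label': [1, 2]}

The candidate relations are then every (question, answer) index pair, with head pointing at the question entity and tail at the answer (or the other way round, as I said above).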

hjerbii commented 2 years ago

But how do we get the ids where the entities start/end?

Gladiator07 commented 1 year ago

Hi @avinashok, can you share the code of the heuristic approach you used to group the questions and answers?

Hi @NielsRogge, I've been following your work for quite a while and have learned a lot, great work! Is there any progress on this part (i.e. getting the output as {"key": "value"})? I am struggling to post-process the predictions into this form (I have tried a bunch of ways, but each fails in one scenario or another). Any help or resources on this would be much appreciated :)

munish0838 commented 1 year ago

Any update, guys? Using LayoutXLM separately just for the linking would not make sense, and for Semantic Entity Recognition, LayoutLMv2 looks better than LayoutXLM when looking at the numbers on the FUNSD dataset.

We could maybe make a plan and work out how to get LayoutLMv2 to do the Relation Extraction, so we can have LayoutLMv2 itself for both SER and RE.

Hi, were you able to use LayoutLMv2 for the Relation Extraction task on the FUNSD dataset? Please share the relevant code/methods to convert and process the dataset.