Open avinashok opened 3 years ago
Hi,
This is definitely on my roadmap. The LayoutLMv2 authors defined another model called LayoutLMv2ForRelationExtraction, which does exactly that. However, they did not specify how to use the model at inference time, and I would need to look into it more to understand how it works.
If you have the time to look into it, let me know, then we can add it to HuggingFace Transformers.
Hi @NielsRogge,
Thanks for replying & glad to know it is already on your roadmap. I tried grouping the questions and answers based on the pixel positions of the layout boxes, but there is a bit of heuristics involved, which is why I thought of reaching out to you directly.
What I tried:
## From the code: https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD/blob/386ad78844f905dfbb81072908c51ba344427587/app.py
encoding_feature_extractor = feature_extractor(image, return_tensors="pt")
words, boxes = encoding_feature_extractor.words, encoding_feature_extractor.boxes
####
""" The rest of the demo code comes here. """
####
# Collect the predicted label, raw prediction, bounding box and display color for each box.
layout_details = []
for prediction, box in zip(true_predictions, true_boxes):
    predicted_label = iob_to_label(prediction).lower()
    layout_details.append((predicted_label, prediction, box, label2color[predicted_label]))
#### Further, pair each word with its layout details:
for i, j in zip(words[0], layout_details[1:-1]):
    print(i, j)
This gives the corresponding tag, pixel position and word for each layout block, which can then be grouped based on the position of the words. I used a threshold: if an answer block lies within the threshold distance of a question block, the two are associated as a key-value pair, and so on. A rough sketch of that heuristic is shown below.
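For reference, here is a minimal sketch of that heuristic. It works at the word level on (word, label, box) tuples (e.g. obtained by zipping words[0] with the predicted labels and boxes above); the pairing rule, the 150-pixel threshold and the name group_key_value_pairs are my own assumptions, not code from the demo, and multi-word questions/answers would first need to be merged into entities.

def group_key_value_pairs(tokens, threshold=150):
    """Heuristically pair 'question' words with the nearest 'answer' word.

    tokens: list of (word, label, box) tuples with box = (x0, y0, x1, y1)
    in pixel coordinates. An answer is linked to a question when it sits
    roughly on the same line to its right, or roughly below it, and the
    distance between the two boxes is at most `threshold` pixels.
    """
    questions = [(w, b) for w, l, b in tokens if l == "question"]
    answers = [(w, b) for w, l, b in tokens if l == "answer"]
    pairs = {}
    for q_word, q_box in questions:
        best_word, best_dist = None, float("inf")
        for a_word, a_box in answers:
            same_line = abs(a_box[1] - q_box[1]) < 10 and a_box[0] >= q_box[2]
            below = a_box[1] >= q_box[3] and abs(a_box[0] - q_box[0]) < 10
            if not (same_line or below):
                continue
            # distance from the question's right edge plus the vertical offset
            dist = abs(a_box[0] - q_box[2]) + abs(a_box[1] - q_box[1])
            if dist < best_dist:
                best_word, best_dist = a_word, dist
        if best_word is not None and best_dist <= threshold:
            pairs[q_word] = best_word
    return pairs

With the variables from the snippet above, tokens could be built as [(w, d[0], d[2]) for w, d in zip(words[0], layout_details)].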
Also, I was referring to line 139 of https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py, based on the issue response in https://github.com/microsoft/unilm/issues/465 .
I'll definitely take a look at LayoutLMv2ForRelationExtraction.
Hi @NielsRogge,
Thank you for your amazing work.
I added the LayoutLMv2ForRelationExtraction class to modeling_layoutlmv2.py.
import torch
from transformers import LayoutLMv2ForRelationExtraction

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LayoutLMv2ForRelationExtraction.from_pretrained("nielsr/layoutlmv2-finetuned-funsd")
model.to(device)
Here is the output:
Some weights of the model checkpoint at nielsr/layoutlmv2-finetuned-funsd were not used when initializing LayoutLMv2ForRelationExtraction: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing LayoutLMv2ForRelationExtraction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LayoutLMv2ForRelationExtraction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LayoutLMv2ForRelationExtraction were not initialized from the model checkpoint at nielsr/layoutlmv2-finetuned-funsd and are newly initialized: ['extractor.rel_classifier.linear.bias', 'extractor.rel_classifier.linear.weight', 'extractor.ffnn_head.3.bias', 'extractor.ffnn_tail.0.bias', 'extractor.ffnn_head.0.weight', 'extractor.ffnn_head.0.bias', 'extractor.rel_classifier.bilinear.weight', 'extractor.ffnn_tail.3.weight', 'extractor.ffnn_tail.0.weight', 'extractor.entity_emb.weight', 'extractor.ffnn_tail.3.bias', 'extractor.ffnn_head.3.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
What would be the guidance on the next step? Does the pre-trained checkpoint only contain the Semantic Entity Recognition head?
https://github.com/microsoft/unilm/issues/429, https://github.com/microsoft/unilm/issues/465 are related.
Unfortunately, https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py does not contain an if training_args.do_predict: block at the end.
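For anyone who needs it, a rough sketch of what such a block could look like, following the usual HuggingFace Trainer conventions, is below. The names trainer, test_dataset, training_args and logger mirror the rest of the script, but this block is an assumption on my side, not code from the unilm repo, and the custom XFUN RE trainer may need its own prediction_step handling.

if training_args.do_predict:
    logger.info("*** Predict ***")
    # Assumes a test_dataset prepared the same way as eval_dataset earlier in the script.
    prediction_output = trainer.predict(test_dataset)
    trainer.log_metrics("predict", prediction_output.metrics)
    trainer.save_metrics("predict", prediction_output.metrics)
    # prediction_output.predictions still has to be decoded back into
    # (question, answer) entity pairs for each document.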
Any update, guys? Using LayoutXLM separately just for linking would not make sense, and for Semantic Entity Recognition, LayoutLMv2 looks better than XLM when looking at the numbers on the FUNSD dataset. We could maybe make a plan and work out how to get LayoutLMv2 to do Relation Extraction, so we can use LMv2 itself for both SER and RE.
mark
Hi @Isydmr, @avinashok, can you please share the inference pipeline for the RelationExtraction model? And is there any way we can convert the results of LayoutLMv2 into a key-value format?
@fadi212 @abdksyed @avinashok In the thread above, someone has suggested a solution with a working Colab example that you can use.
They are also fixing up and adding this class in a separate pull request, for those who want to wait for a proper release.
Hi, were you able to find the key-value pairs?
Hi, is there any update on this, i.e. getting the output as key-value pairs? @avinashok, can you please share the complete code of the solution you mentioned above?
@jyotiyadav94 I can see @R0bk has already mentioned the solution in one of the above comments, with a Colab notebook version of it.
Hi, how can we get key-value extraction, like: {'invoice number': '123456', 'date': '23/04/2022', 'amount': '44987', ...}?
@jyotiyadav94 @aditya11ad Did you find a solution? I saw the Colab (from @avinashok), but I can't find a way to extract key-value pairs.
Hi @yellowjs0304, I basically used this approach https://medium.com/mlearning-ai/ai-in-the-real-world-form-processing-c96912d80ef2 to get the key-value pairs.
@jyotiyadav94 Thank you for sharing the idea. Does this approach also work with another OCR engine? (Not Tesseract OCR; I get the OCR results separately.)
@yellowjs0304 Can you provide me with your Gmail ID? I will share the complete link to the code.
@jyotiyadav94 Sure, the contact mail is at the top of my profile README. Thank you :)
@jyotiyadav94 You said you were going to share the complete code; where can I find it? Thanks!
@nurielw05 I saw this really late. Jyoti shared this link, which is related to the above post.
@jyotiyadav94 This only works if the value is to the right of the key. What if the value is under the key?
like this:
total: name: 2323 nuriel
Hello!
I'm having the same issue as all of you, actually with this notebook.
In the inference part, it is not clear how we can build the entities list (i.e. define tails and heads) in the case where our input is an image and we extract entities using LayoutLMv2ForTokenClassification (for example). Tails and heads are not given by the model. Could you please update the notebook with an inference example that uses only an image as input? Many thanks in advance!
Tails and heads are not given by the model
=> Tails are questions, and answers are heads (or vice versa), so LayoutLMForTokenClassification does provide you with that.
But how do we get the ids where the entities start/end?
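One possible way (a sketch under my own assumptions, not the notebook's code) is to merge consecutive tokens that share the same non-O label into spans, recording the token index where each span starts and ends. The resulting dict of parallel start/end/label lists is, to my understanding, the shape the XFUN relation-extraction code consumes, but please double-check that format against the unilm repo.

def build_entities(predictions):
    """Turn per-token labels such as 'B-QUESTION', 'I-ANSWER', 'O' into entity
    spans with start (inclusive) / end (exclusive) token indices.

    Label ids follow the XFUN convention as I understand it:
    1 = question, 2 = answer.
    """
    label2id = {"question": 1, "answer": 2}
    entities = {"start": [], "end": [], "label": []}
    current_label, start = None, None
    for i, pred in enumerate(list(predictions) + ["O"]):  # sentinel flushes the last span
        label = pred.split("-")[-1].lower() if pred != "O" else None
        if label != current_label or pred.startswith("B-"):
            if current_label in label2id:
                entities["start"].append(start)
                entities["end"].append(i)
                entities["label"].append(label2id[current_label])
            current_label, start = label, i
    return entities

The head/tail pairs that the relation head scores are then just pairs of indices into these entities (e.g. every question index paired with every answer index at inference time).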
Hi @avinashok, can you share the code for the heuristic approach you used to group the questions and answers?
Hi @NielsRogge, I've been following your work for quite a while and learned a lot, great work! Is there any progress on this part (i.e. getting the output as {"key": "value"})? I am struggling to post-process the predictions into this form (I have tried a bunch of ways, but each fails in one scenario or another). Any help or resources on this would help a lot :)
Hi, were you able to use LayoutLMv2 for the Relation Extraction task on the FUNSD dataset? Please share the relevant code/methods to convert and process the dataset.
The model I am using is LayoutLMv2.
(Link to the demo for reference: https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD )
I do get 'questions' & 'answers' as separately colored boxes in the output image, but is there a way to get them as a Python dictionary (key-value pairs), i.e. questions become keys and answers become their corresponding values?