NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Inference on fine-tuned LayoutLMv3 model #324

Open nn-Nikita opened 1 year ago

nn-Nikita commented 1 year ago

I have used the following code for inference after fine-tuning the LayoutLMv3 model on the FUNSD dataset and obtained the predicted labels, but now I want to know how to associate these labels with the corresponding text in the image and extract the text along with their respective labels.

```python
from pathlib import Path
import os
import sys
import warnings

import pytesseract
import torch
from PIL import Image
from transformers import AutoModelForTokenClassification, AutoProcessor

warnings.filterwarnings("ignore")

base_dir = Path(__file__).resolve().parent.parent
src_path = os.path.join(os.path.dirname(base_dir), "Document_Detection/src")
sys.path.append(src_path)

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Loading the trained model
model = AutoModelForTokenClassification.from_pretrained(
    r"Document_Detection\models\checkpoint-1000"
)
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)

val_image_path = r"C:\Users\admin\Desktop\repositories\True_Vision\Document_Detection\data\image1.png"
image = Image.open(val_image_path).convert("RGB")

encoding = processor(image, return_tensors="pt")
for k, v in encoding.items():
    print(k, v.shape)

with torch.no_grad():
    outputs = model(**encoding)

logits = outputs.logits
print(f"logits_shape::{logits.shape}")

predictions = logits.argmax(-1).squeeze().tolist()
print(f"predictions::{predictions}")

# Directly using this dictionary for mapping the predictions to labels
id2label = {
    "0": "O",
    "1": "B-HEADER",
    "2": "I-HEADER",
    "3": "B-QUESTION",
    "4": "I-QUESTION",
    "5": "B-ANSWER",
    "6": "I-ANSWER",
}

predicted_labels = [id2label[str(pred)] for pred in predictions]
print(f"predicted_labels: {predicted_labels}")
```
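To attach the predicted labels to the text in the image, one approach is to use the `word_ids()` method that the processor's fast tokenizer exposes on the returned encoding: it maps each token position to the index of the OCR word it came from (or `None` for special tokens). Below is a minimal sketch of a helper that collapses the token-level predictions to one label per word, keeping only the first sub-token of each word, which is how LayoutLM models are typically evaluated on FUNSD. The function name `tokens_to_word_labels` is my own; the labels can then be zipped with the OCR'd words.

```python
def tokens_to_word_labels(predictions, word_ids, id2label):
    """Collapse token-level predictions to one label per word.

    predictions: list of predicted class ids, one per token.
    word_ids:    list of the same length, mapping each token to its
                 word index (None for special tokens like [CLS]/[SEP]).
    id2label:    dict mapping str(class id) -> label string.
    """
    word_labels = {}
    for pred, word_id in zip(predictions, word_ids):
        if word_id is None:            # skip special tokens
            continue
        if word_id not in word_labels:  # first sub-token decides the word label
            word_labels[word_id] = id2label[str(pred)]
    # return labels in word order
    return [word_labels[i] for i in sorted(word_labels)]
```

In the snippet above this would be called as `tokens_to_word_labels(predictions, encoding.word_ids(0), id2label)`, and each resulting label lines up with the corresponding word produced by the processor's built-in OCR.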

tzktz commented 5 months ago

@nn-Nikita can you share the code here for how you fine-tuned LayoutLMv3 on FUNSD and saved the model? Thanks in advance :)

Monta79 commented 3 months ago

hi, did you find a solution please?

tzktz commented 3 months ago

> hi, did you find a solution please?

Nope :)

NielsRogge commented 3 months ago

Refer to my notebook here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_%2B_Gradio_demo.ipynb.

Monta79 commented 3 months ago

@NielsRogge I trust you're doing well. I've been fine-tuning LayoutLMv3 with `apply_ocr=False`. However, I've run into an issue with new, unlabeled images that are not in the test set. From my research, it seems that running Tesseract first and passing its output to LayoutLMv3 might be a solution. Would you be available to review my work and provide guidance on this?