ZeningLin / PEneo

[MM'2024] Official implementation of "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction."

Inference #1

Closed. Muhammad-Hamza-Jadoon closed this issue 2 weeks ago.

Muhammad-Hamza-Jadoon commented 1 month ago

Hi, is there any documentation explaining how to use this for inference on docs (seen and unseen)?

Regards

ZeningLin commented 1 month ago

Hello,

Thanks for your advice. The inference code is not implemented yet; I will add it later this week.

Muhammad-Hamza-Jadoon commented 1 month ago

Does this code specifically extract key-value pairs from image documents, or does it just do the token classification that the rest of the LayoutLM models do?

Also, do you have any good sources that can guide me through this task? Your PEneo repo is quite complicated for me to work with right now.

Regards

ZeningLin commented 1 month ago

PEneo is designed for the key-value pair extraction task; it specifically extracts key-value pairs from the document.

PEneo's code is built on the Hugging Face train/eval pipeline of the LayoutLM-series models. You may first refer to the microsoft/unilm repository to get familiar with the basic pipeline, including the data loader, data collator, trainer, and evaluation metrics. Usage of the Trainer and training arguments can be found in Hugging Face's documentation (https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments).
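
As a generic illustration of that Hugging Face pipeline (a minimal sketch, not PEneo's actual training script; the output directory and hyperparameter values below are placeholders), the Trainer is driven by a TrainingArguments object along these lines:

from transformers import TrainingArguments

# Placeholder hyperparameters for illustration; PEneo's own scripts define their own
training_args = TrainingArguments(
    output_dir="runs/peneo_demo",      # hypothetical output directory
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=50,
    save_strategy="epoch",
)
print(training_args.to_json_string())

# The Trainer then combines these arguments with the model, datasets,
# data collator, and compute_metrics function mentioned above.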

The modifications in this repo mainly lie in the switchable backbone and the downstream PEneo decoder, which are controlled by the config file of the pre-trained weights. You should first run the pre-trained utils generation scripts by following the instructions in README.md, then delve into the generated config.json to see how it affects PEneo's structure.
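
To see what that config controls, here is a small sketch (the weights directory below is a placeholder) that dumps the generated config.json:

import json

# Inspect the config.json produced by the pre-trained utils generation scripts
# (the path here is hypothetical; use your own output directory)
with open("pretrained_weights/peneo_layoutlmv3/config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# Print every field to see which backbone and PEneo decoder settings are selected
for key, value in sorted(config.items()):
    print(f"{key}: {value}")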

I will add more detailed documentation on how to use the code later.

Best regards

Muhammad-Hamza-Jadoon commented 1 month ago

Thanks for your help.

Looking forward to the inference code and documentation.

Regards

ZeningLin commented 1 month ago

The inference code is now available at deploy/inference.py. You may run the following commands to perform inference:

python deploy/inference.py \
    --model_name_or_path PATH_TO_YOUR_TRAINED_WEIGHTS \
    --dir_image PATH_TO_YOUR_IMAGE \
    --dir_ocr PATH_TO_YOUR_OCR_RESULT_OF_THE_IMAGE

If you do not have the OCR results, you may use the built-in Tesseract OCR pipeline of transformers:

python deploy/inference.py \
    --model_name_or_path PATH_TO_YOUR_TRAINED_WEIGHTS \
    --dir_image PATH_TO_YOUR_IMAGE \
    --apply_ocr True

The above scripts require Tesseract to be installed on your device:

sudo apt install tesseract-ocr
pip install pytesseract
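
As a rough sketch of what that built-in OCR path produces (not the exact code inside transformers; the image path is hypothetical), pytesseract can be called directly to get word-level text and boxes:

import pytesseract
from PIL import Image

# Run Tesseract on a document image and collect word-level results
image = Image.open("sample_document.png")
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

words = []
for text, left, top, width, height in zip(
    data["text"], data["left"], data["top"], data["width"], data["height"]
):
    if text.strip():  # skip empty detections
        words.append({"text": text, "bbox": [left, top, left + width, top + height]})

print(words[:5])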

Note that if the distribution of the OCR results at inference time differs from that of your fine-tuning stage, the key-value pair extraction performance will be greatly affected.

The detailed documentation will be updated later this month.

Muhammad-Hamza-Jadoon commented 1 month ago

Thanks a lot.

isXuedingesCat commented 1 month ago

May I ask what format the OCR results should be saved in for inference?

ZeningLin commented 1 month ago

@isXuedingesCat A list whose elements are dictionaries, each with two fields: "text" and "bbox". Here is an example:

[
    {
        "text": "text_content", 
        "bbox": [
            left_val, top_val, right_val, bottom_val
        ]
    }
]

You can also adapt the code at inference.py L201-L208 to your own needs; in the end, all you need is to obtain line_text_list and line_box_list.
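
For reference, a minimal sketch (the file name is hypothetical) of loading OCR results saved in the format above into the line_text_list and line_box_list mentioned here:

import json

# OCR results saved as a list of {"text": ..., "bbox": [left, top, right, bottom]} dicts
with open("ocr_result.json", "r", encoding="utf-8") as f:
    ocr_lines = json.load(f)

line_text_list = [line["text"] for line in ocr_lines]
line_box_list = [line["bbox"] for line in ocr_lines]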

isXuedingesCat commented 1 month ago

Thanks for the explanation. One more question: if a text box is rotated, is it OK to just use its top-left and bottom-right corner coordinates?

ZeningLin commented 1 month ago

No problem, just take the minimum enclosing rectangle. LayoutLM-style backbones can only handle two-point boxes, and in my engineering tests I have not seen any noticeable impact on performance so far.
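
For illustration (the helper function is my own, not part of the repo), converting a rotated four-point box into the two-point minimum enclosing rectangle described above could look like:

def quad_to_bbox(quad):
    # quad: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)] corners of a rotated text box
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    # Minimum axis-aligned enclosing rectangle as [left, top, right, bottom]
    return [min(xs), min(ys), max(xs), max(ys)]

# Example: a slightly rotated box
print(quad_to_bbox([(10, 20), (110, 25), (108, 55), (8, 50)]))  # -> [8, 20, 110, 55]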

isXuedingesCat commented 1 month ago

Thanks.

ZeningLin commented 3 weeks ago

@Muhammad-Hamza-Jadoon The detailed documentation is now available at docs/documentation.md