PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
42.78k stars, 7.69k forks

SER and RE Training - Basic Understanding #9382

Closed by NaumanHSA 1 year ago

NaumanHSA commented 1 year ago

Hi,

I'm new to PaddleOCR and want to train an RE model on my custom dataset. I've annotated around 50 images using Label Studio and parsed them according to the PaddleOCR documentation. I set the ML backend in Label Studio to the PPOCR engine for text detection and recognition.

In my custom dataset, the question-answer pairs are very close to each other, e.g. "Name: ABC", for which the PaddleOCR engine creates only one box. I had to adjust that box and create another one so that questions and answers have separate boxes. Also, some text wasn't recognized correctly (mostly, spaces weren't detected). My questions are:

  1. While training the SER/RE model, does it take the bounding boxes and text from the ground truth? If so, how does it perform the evaluation? Maybe the SER/RE model only evaluates the classification while taking the OCR information from the GT.
  2. During SER/RE inference, it uses the PaddleOCR engine for text detection and recognition. Does that mean I'll have to fine-tune the text detection model so that it generates separate bounding boxes for questions and answers?
  3. If someone has already trained SER and RE models, could you please advise approximately how many images are required for training?
  4. Finally, I'd like your opinion on how you would approach such a problem.

Thank you all in advance.
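
To make the merged-box case above concrete ("Name: ABC" detected as a single box), here is a minimal sketch of one way to split such a box into a question part and an answer part during annotation. It is not part of PaddleOCR or Label Studio: the function name `split_key_value_box`, the axis-aligned `(x_min, y_min, x_max, y_max)` box layout, and the assumption that characters have roughly uniform width are all illustrative choices.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def split_key_value_box(text: str, box: Box, sep: str = ":") -> List[Tuple[str, Box]]:
    """Split one OCR box such as 'Name: ABC' at the first separator.

    The cut position is estimated from character counts, which assumes
    roughly uniform character width. Returns the input unchanged when
    there is no separator or nothing follows it.
    """
    idx = text.find(sep)
    if idx == -1 or not text[idx + 1:].strip():
        return [(text, box)]

    x_min, y_min, x_max, y_max = box
    # Place the cut just after the separator, proportionally to text length.
    x_cut = x_min + (idx + 1) * (x_max - x_min) / len(text)

    question = text[: idx + 1].strip()
    answer = text[idx + 1:].strip()
    return [
        (question, (x_min, y_min, x_cut, y_max)),
        (answer, (x_cut, y_min, x_max, y_max)),
    ]


parts = split_key_value_box("Name: ABC", (0, 0, 90, 10))
```

The estimated cut is only a starting point; with proportional fonts or uneven spacing the boxes would still need manual adjustment in the annotation tool.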

an1018 commented 1 year ago

1. I suggest reading the original paper or some blog posts.
2. Yes. If detection doesn't work well, you can train or fine-tune the detection model on your own dataset.
3. You can refer to the number of images in the xfund_zh dataset and the hmean reported on it: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.6/doc/doc_en/algorithm_kie_layoutxlm_en.md
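
For readers unfamiliar with the metric, a minimal sketch of how an hmean value can be computed, assuming "hmean" here means the harmonic mean of precision and recall (the F1 score). The function names and the counts in the usage line are made up for illustration and are not results from any model.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct predictions, 2 spurious, 2 missed.
p, r = precision_recall(tp=8, fp=2, fn=2)
score = hmean(p, r)
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with high precision but low recall (or the reverse) still gets a low hmean.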

NaumanHSA commented 1 year ago

Thank you @an1018. Yes, I'm doing some research on these models to understand them fully.