PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
38.98k stars, 7.31k forks

Related to semantic entity relation #11995

Status: Closed (Dineshkumar-Anandan-ZS0367 closed this issue 1 week ago)

Dineshkumar-Anandan-ZS0367 commented 3 weeks ago

The semantic entity relation model works fine overall, but some key-value pairs in my documents are predicted as answers only, with no matching question. How can I fix this? How can questions and answers be identified correctly in healthcare documents?

  1. Are there any options for the SER tokenizer?
  2. Are there any options for fine-tuning that code?
  3. Is any preprocessing needed before running these predictions?

UserWangZz commented 3 weeks ago

Can you give a more detailed example? Based solely on your description, it sounds like some keys in your data point to multiple values. Is that right? If so, check that the key-value (KV) relationships in your ground truth (GT) are annotated correctly, estimate what proportion of the whole dataset this scenario makes up, and try to increase that proportion as much as possible.
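The check described above can be sketched in Python. This is a minimal sketch, not part of PaddleOCR: it assumes XFUND-style label lines of the form `image_path<TAB>JSON list`, where each entity has `id`, `label` (`question`, `answer`, ...) and `linking` (a list of `[key_id, value_id]` pairs). Verify the field names against the KIE documentation before relying on it. The function `linking_stats`, the file name, and every sample entity except "Patient Name" / "Pamela Wood" are hypothetical.

```python
import json
from collections import defaultdict


def linking_stats(entities):
    """Summarise key/value links for the entities of one annotated image."""
    labels = {e["id"]: e["label"] for e in entities}
    # A link may be listed on both the key and the value, so deduplicate pairs.
    pairs = {tuple(link) for e in entities for link in e.get("linking", [])}

    values_per_key = defaultdict(set)
    linked_values = set()
    for key_id, value_id in pairs:
        values_per_key[key_id].add(value_id)
        linked_values.add(value_id)

    questions = [i for i, lab in labels.items() if lab == "question"]
    answers = [i for i, lab in labels.items() if lab == "answer"]
    return {
        "questions": len(questions),
        "answers": len(answers),
        # Keys that point to more than one value.
        "multi_value_keys": sum(
            1 for k in questions if len(values_per_key.get(k, ())) > 1
        ),
        # Answers that no key links to (likely annotation gaps).
        "unlinked_answers": sum(1 for a in answers if a not in linked_values),
    }


# Hypothetical label line; "points" is omitted because the check does not need it.
sample_line = "form_001.jpg\t" + json.dumps([
    {"id": 1, "label": "question", "transcription": "Patient Name", "linking": [[1, 2]]},
    {"id": 2, "label": "answer", "transcription": "Pamela Wood", "linking": [[1, 2]]},
    {"id": 3, "label": "question", "transcription": "Medications", "linking": [[3, 4], [3, 5]]},
    {"id": 4, "label": "answer", "transcription": "Aspirin", "linking": [[3, 4]]},
    {"id": 5, "label": "answer", "transcription": "Metformin", "linking": [[3, 5]]},
    {"id": 6, "label": "answer", "transcription": "Jane Roe", "linking": []},
])

image_path, annotation = sample_line.split("\t", 1)
stats = linking_stats(json.loads(annotation))
print(image_path, stats)
```

Summing these counts over every line of the label file gives the dataset-wide proportion of multi-value keys, and any unlinked answers point to images whose annotations are worth rechecking.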

Dineshkumar-Anandan-ZS0367 commented 3 weeks ago

[Attached images: 1, 4]

Please look at this document. For example, "Patient Name" is the key and "Pamela Wood" is the answer.

UserWangZz commented 3 weeks ago

Did you use the official model for inference? Have you used the data from the current document for fine-tuning the model?

Dineshkumar-Anandan-ZS0367 commented 2 weeks ago

Yes, I am using the official PaddleOCR model for English.

It is the default model, and I can't fine-tune it.

Can you please share some ideas or any pointers on this problem?

UserWangZz commented 2 weeks ago

You can refer to the following documents to fine-tune the official model on your own data. They cover data preparation, starting training, and so on.

Chinese document: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/kie.md
English document: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/kie_en.md
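As an illustration of the data preparation step, here is a minimal Python sketch that turns key-value pairs into one label line for fine-tuning. It assumes the XFUND-style layout (`image_path<TAB>JSON list` with `transcription`, `label`, `points`, `id`, and `linking` fields), so check the exact format against the documents above. The helper names, the file name, and the box coordinates are hypothetical.

```python
import json


def rect(x1, y1, x2, y2):
    """Return the four corners of an axis-aligned box, clockwise from top-left."""
    return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]


def pairs_to_entities(pairs):
    """Turn (key_text, key_box, value_text, value_box) tuples into linked entities."""
    entities = []
    for key_text, key_box, value_text, value_box in pairs:
        key_id = len(entities) + 1
        value_id = key_id + 1
        # The same [key_id, value_id] link is recorded on both entities.
        entities.append({"transcription": key_text, "label": "question",
                         "points": key_box, "id": key_id,
                         "linking": [[key_id, value_id]]})
        entities.append({"transcription": value_text, "label": "answer",
                         "points": value_box, "id": value_id,
                         "linking": [[key_id, value_id]]})
    return entities


def to_label_line(image_path, entities):
    """Serialise one image's entities as a tab-separated label line."""
    return image_path + "\t" + json.dumps(entities, ensure_ascii=False)


# Hypothetical image name and coordinates for the example from this thread.
line = to_label_line("form_001.jpg", pairs_to_entities([
    ("Patient Name", rect(40, 60, 180, 85), "Pamela Wood", rect(200, 60, 340, 85)),
]))
print(line)
```

Writing one such line per annotated image produces a label file in which every answer is explicitly linked to its question, which is the relationship the model needs to see during fine-tuning.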