OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFW0138: Training a T5-like model for post-correction of OCR output #385

Open kaldan007 opened 5 months ago

kaldan007 commented 5 months ago

RFW0138: Training a T5-like model for post-correction of OCR output

Summary

We need to train a T5-like, transformer-based model to post-correct OCR output.

Key Concepts

OCR: Optical Character Recognition.

T5: Text-to-Text Transfer Transformer, a transformer-based sequence-to-sequence model.

Context

After discussing with Sebastain, we learned that a T5 model can be trained for different types of sequence-to-sequence tasks. One possibility is to train a model whose input is the uncorrected OCR output and whose label is the manually corrected text. Such a model would be useful for post-correcting our OCR output. It would make the review of inference output faster, because reviewers would start from almost perfect data. In the future, we could also use this model to correct the output of other OCR models.
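As a rough illustration of the input/label setup described above, the sketch below pairs each raw OCR line with its corrected counterpart in a text-to-text format. The task prefix `"correct ocr: "`, the `CorrectionExample` class, and the `build_examples` helper are hypothetical names chosen for this example, not something the RFW specifies. It also assumes the OCR and corrected lines are already aligned one-to-one.

```python
from dataclasses import dataclass

# Hypothetical task prefix (an assumption, not defined in this RFW).
# T5-style models are commonly told which task to perform via a text prefix.
TASK_PREFIX = "correct ocr: "


@dataclass(frozen=True)
class CorrectionExample:
    source: str  # model input: task prefix + raw OCR line
    target: str  # label: manually corrected line


def build_examples(ocr_lines, corrected_lines, prefix=TASK_PREFIX):
    """Pair aligned OCR lines with corrected lines as seq2seq examples."""
    if len(ocr_lines) != len(corrected_lines):
        raise ValueError("OCR and corrected lines must be aligned one-to-one")
    examples = []
    for ocr, gold in zip(ocr_lines, corrected_lines):
        ocr, gold = ocr.strip(), gold.strip()
        if not ocr or not gold:
            continue  # skip pairs where either side is empty
        examples.append(CorrectionExample(source=prefix + ocr, target=gold))
    return examples
```

The resulting `source` and `target` strings could then be tokenized and fed to whichever sequence-to-sequence training setup is chosen. That step is left out here because the RFW does not fix a training framework.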

Outputs

A T5 model for post-correcting OCR output

Inputs

OCR output data and the corresponding manually corrected text
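One way to gauge how noisy the OCR side of these inputs is, and later to compare a model's corrected output against the manual correction, is a character error rate (CER). The RFW does not name a metric, so this is an assumption. The minimal sketch below computes CER from the Levenshtein edit distance. It counts Unicode code points, which may differ from visually perceived characters in scripts that use combining or stacked characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(hypothesis, reference) / len(reference)
```

For example, `character_error_rate("helo", "hello")` is one edit over five reference characters. Comparing the CER of the raw OCR output with the CER of the model's output on held-out pairs would indicate whether post-correction actually reduces the reviewers' workload.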

Timeline

Specify the expected delivery date for the project.

References

Include any relevant links or resources for additional context or information.