OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFW150: Create OCR training data by doing annotation transfer. #445

Open ta4tsering opened 4 months ago

ta4tsering commented 4 months ago


Summary

We need to create OCR training data by transferring line annotations from Google OCR output to the cleaned, unannotated etext, and then mapping each line of text to its line image.

Key Concepts

List and define any specific terms or concepts related to this RFW.

Context

We have an in-house package called antx. A team of annotators is searching for and cataloging Tibetan texts that have a cleaned or proofread etext, and then looking for image versions of those texts in different fonts. We will OCR those images with Google OCR, build an etext with line annotations from the OCR output, use antx to transfer the line annotations to the cleaned text, and then map the line images to the line text.
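To make the transfer step concrete, here is a minimal sketch of moving line breaks from noisy OCR text onto a clean etext. It does not use antx or reproduce its API; it uses Python's standard `difflib` as a stand-in aligner, and the function name `transfer_line_breaks` and the sample strings are hypothetical. It assumes the only annotation being transferred is the line break.

```python
from difflib import SequenceMatcher


def transfer_line_breaks(ocr_text: str, clean_text: str) -> list[str]:
    """Split clean_text at the positions where ocr_text has line breaks.

    Stand-in for the antx transfer step (assumption: not antx's real API).
    """
    # Flatten the OCR text and record the offset at which each line ends.
    ocr_lines = ocr_text.split("\n")
    flat = "".join(ocr_lines)
    break_offsets = []
    pos = 0
    for line in ocr_lines[:-1]:
        pos += len(line)
        break_offsets.append(pos)

    # Align the flattened OCR text with the clean text, character by character.
    matcher = SequenceMatcher(None, flat, clean_text, autojunk=False)
    opcodes = matcher.get_opcodes()

    def map_offset(offset: int) -> int:
        # Translate an offset in the OCR text to an offset in the clean text.
        for tag, i1, i2, j1, j2 in opcodes:
            if i1 <= offset <= i2:
                if tag == "equal":
                    return j1 + (offset - i1)
                # Inside a mismatched region: snap to its nearest edge.
                return j1 if offset == i1 else j2
        return len(clean_text)

    # Cut the clean text at the mapped break positions.
    lines = []
    start = 0
    for offset in break_offsets:
        end = map_offset(offset)
        lines.append(clean_text[start:end])
        start = end
    lines.append(clean_text[start:])
    return lines


# The OCR output has one wrong character ("X" instead of "i").
ocr = "abcde\nfghXj\nklmno"
clean = "abcdefghijklmno"
print(transfer_line_breaks(ocr, clean))
```

Each returned line can then be paired, in order, with the line image cropped from the same OCR line, which gives the (line image, line text) pairs needed for training. Lines where the OCR text and the clean text differ heavily would likely need manual review, since a break that falls inside a mismatched region is only approximated.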

Outputs

Page images and their corresponding etext.

Inputs

Google OCR outputs and cleaned etext.

Timeline

Specify the expected delivery date for the project.

References

Include any relevant links or resources for additional context or information.