OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFW0132: Cleaning and validating Pecha's data before storing in OpenPecha-Data #380

Open tenzin3 opened 8 months ago

tenzin3 commented 8 months ago

RFW0132: [Cleaning and validating Pecha's data before storing in OpenPecha-Data]

Summary

Pecha (OPF) data needs to be cleaned and validated before it is stored in OpenPecha-Data.

Key Concepts

opf: the file format currently used to store annotations in OpenPecha-Data. See example.

Context

After the Google OCR model produces its output and annotators correct it, the annotation files are in .json format. What we currently do is create a repository name and upload those annotation files directly.

The uploaded annotation files almost certainly still contain some errors, so cleaning and validating them before storing them would later help improve the performance of our OCR model. The errors could include missing features, invalid values, and more.
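To make the two error classes above concrete, here is a minimal sketch of what a per-record check could look like. The field names (`image_id`, `text`, `bbox`) and both helper functions are hypothetical, chosen only for illustration; the actual annotation schema is not specified in this RFW and would need to be substituted in.

```python
import json

# Hypothetical schema: the real annotation fields are not defined in this RFW.
REQUIRED_FIELDS = {"image_id": str, "text": str, "bbox": list}


def clean_annotation(record):
    """Return a copy of the record with whitespace stripped from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def is_non_negative_number(value):
    """True for ints and floats >= 0 (bool is excluded on purpose)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value >= 0
    )


def validate_annotation(record):
    """Return a list of problems found in one annotation record."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"value error: {name} should be {expected_type.__name__}"
            )
    # Example of a value check: a bounding box is four non-negative numbers.
    bbox = record.get("bbox")
    if isinstance(bbox, list):
        if len(bbox) != 4 or not all(is_non_negative_number(v) for v in bbox):
            problems.append("value error: bbox must be four non-negative numbers")
    return problems


# Usage: clean first, then validate; store the file only if no problems remain.
raw = json.loads('{"image_id": " img-0001 ", "text": "sample", "bbox": [0, 0, 10]}')
record = clean_annotation(raw)
for problem in validate_annotation(record):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a single pass report everything wrong with a file, which may be more convenient when annotators have to fix the data by hand.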

Outputs

Cleaned annotation files in JSON format, produced by the script.

Inputs

Annotation files in JSON format.

Timeline

Specify the expected delivery date for the project.

References

OpenPecha-Data