RFW0132: [Cleaning and validating Pecha's data before storing in OpenPecha-Data]
Summary
Pecha (opf) data needs to be cleaned and validated before it is stored in OpenPecha-Data.
Key Concepts
opf: the file format currently used to store annotations in OpenPecha-Data.
see example
Context
After the Google OCR model output has been corrected by annotators, the annotation files are in .json format. Currently we create a repository and upload those annotation files directly.
The uploaded annotation files likely still contain errors, such as missing features and invalid values. Cleaning and validating them before they are stored should later help improve the performance of our OCR model.
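The two error types named above, missing features and value errors, could be checked with something like the following minimal sketch. The field names (`id`, `text`, `start`, `end`) and the function `validate_annotation` are hypothetical, chosen only for illustration; the real annotation schema is not specified in this document and would need to replace them.

```python
# Hypothetical schema: field name -> expected Python type.
# These names are assumptions, not the actual OpenPecha annotation fields.
REQUIRED_FIELDS = {"id": str, "text": str, "start": int, "end": int}


def validate_annotation(record):
    """Return a list of error messages for one annotation record (empty if valid)."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing feature: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"value error: {name} should be {expected_type.__name__}")

    # Range checks only make sense once every field is present and well typed.
    if not errors:
        if record["start"] < 0 or record["end"] < record["start"]:
            errors.append("value error: span must satisfy 0 <= start <= end")
        if not record["text"].strip():
            errors.append("value error: text is empty")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a single pass report every issue in a file, which may be more convenient when annotators need to fix records in bulk.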
Outputs
Cleaned annotation files in JSON format, produced by the script.
Inputs
Annotation files in JSON format.
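As a rough illustration of the input-to-output flow, the sketch below reads every .json file in an input directory, applies a cleaning step, and writes the result to an output directory. The function names `clean_value` and `clean_directory`, the directory layout, and the cleaning rule itself (trimming surrounding whitespace from every string) are all assumptions; the actual cleaning rules for this project are still to be defined.

```python
import json
from pathlib import Path


def clean_value(value):
    """Recursively trim surrounding whitespace from every string in a JSON value."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    if isinstance(value, dict):
        return {key: clean_value(item) for key, item in value.items()}
    return value  # numbers, booleans and null pass through unchanged


def clean_directory(input_dir, output_dir):
    """Clean every *.json file in input_dir and write it to output_dir.

    Returns the list of written paths. Both directory arguments are
    hypothetical; the source files are never modified in place.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        target = out / path.name
        # ensure_ascii=False keeps Tibetan text readable instead of \u escapes.
        target.write_text(
            json.dumps(clean_value(data), ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written.append(target)
    return written
```

Writing to a separate output directory keeps the original annotator-corrected files available for comparison if a cleaning rule turns out to be too aggressive.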
Timeline
Specify the expected delivery date for the project.
References
OpenPecha-Data