RFW0133: Creating a benchmark dataset for OCR

Summary

We need a benchmark dataset with an even distribution to evaluate our current OCR model.

Key Concepts

Benchmark dataset: A dataset used as a reference point for performance evaluation.

OCR: Optical Character Recognition.

Context

We need a benchmark dataset to evaluate the OCR model. It should be evenly distributed across attributes such as printing mode and the source of the image or PDF (organization name). Such a dataset would show clearly where the model underperforms, so that countermeasures can be taken to improve it.
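One way to get an even distribution is to group the candidate lines by printing mode and source, then draw the same number of lines from every group. The sketch below illustrates that idea only. The record fields (image, text, printing_mode, source) and the function name sample_even_benchmark are assumptions made for illustration, not the actual OpenPecha-Data catalog schema.

```python
import random
from collections import defaultdict


def sample_even_benchmark(records, keys=("printing_mode", "source"),
                          per_group=None, seed=0):
    """Draw the same number of records from every group defined by `keys`.

    `records` is a list of dicts; the field names are hypothetical.
    """
    # Bucket records by their (printing_mode, source) combination.
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    if not groups:
        return []

    # The smallest group caps how many lines every group can contribute.
    smallest = min(len(items) for items in groups.values())
    quota = smallest if per_group is None else min(per_group, smallest)

    # A seeded generator keeps the benchmark reproducible between runs.
    rng = random.Random(seed)
    benchmark = []
    for group_key in sorted(groups):
        benchmark.extend(rng.sample(groups[group_key], quota))
    return benchmark
```

Because the smallest group sets the quota, a rare printing mode or source limits the total benchmark size. Whether to collect more data for that group or accept a smaller benchmark is a project decision this sketch does not make.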

Outputs

A dataset of line images paired with their text, evenly distributed.

Inputs

OpenPecha-Data and the OpenPecha-Data catalog

Timeline

Specify the expected delivery date for the project.

References

Include any relevant links or resources for additional context or information.

OpenPecha / Requests