OpenPecha / Requests

RFWs and RFCs for all OpenPecha repositories

RFC133: Creating a benchmark dataset for OCR #465

Open ta4tsering opened 4 months ago

ta4tsering commented 4 months ago

RFC133: Creating a benchmark dataset for OCR

Named Concepts

Summary

Create a benchmark dataset for OCR from the line images that have already been transcribed, by randomly selecting line images from each batch or work.
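The selection step could be sketched roughly as below. This is only a minimal illustration, not the actual implementation: the record layout (`image`, `text`, `group`), the function name `sample_benchmark`, and the `per_group` size are all assumptions made for the example, since the RFC does not specify them.

```python
import random
from collections import defaultdict


def sample_benchmark(records, per_group, seed=0):
    """Randomly pick up to `per_group` transcribed line images from each group.

    `records` is a list of dicts with hypothetical keys:
      - "image": path or ID of the line image
      - "text":  its transcription
      - "group": the batch or work the line belongs to
    """
    rng = random.Random(seed)  # fixed seed so the benchmark is reproducible

    # Bucket the records by batch/work.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["group"]].append(rec)

    # Sample from each bucket; small buckets contribute everything they have.
    benchmark = []
    for name in sorted(groups):
        items = groups[name]
        k = min(per_group, len(items))
        benchmark.extend(rng.sample(items, k))
    return benchmark


if __name__ == "__main__":
    # Toy data: two hypothetical batches of different sizes.
    demo = [
        {"image": f"batch_a/{i}.png", "text": "...", "group": "batch_a"}
        for i in range(5)
    ] + [
        {"image": f"batch_b/{i}.png", "text": "...", "group": "batch_b"}
        for i in range(2)
    ]
    picked = sample_benchmark(demo, per_group=3)
    print(len(picked))
```

Sampling per batch or work, rather than from the pooled data, keeps any single large batch from dominating the benchmark. The fixed seed is there so the same benchmark can be regenerated later.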

Dependencies

Include all the dependencies you are going to use while implementing.

Infrastructures

Include all the infrastructure required for running the task, such as S3 bucket, EC2 server, etc.

Design Illustrations

(Image: Screenshot 2024-03-01 at 11 58 30 AM)

Justification

This is the best method for now, as it will be much quicker to take the benchmark data from the already transcribed data than to take new images and transcribe them from scratch.

Testing

Describe the kind of testing procedures that are needed as part of fulfilling this request.

Implementation Steps

Reviewed By

@kaldan007

kaldan007 commented 4 months ago

@ta4tsering we need to create a separate benchmark for each writing style. We can't mix Ume into the Uchen benchmark, as it wouldn't be fair. Other than that, it looks good to me.

ta4tsering commented 3 months ago

Okay, I will make it for Uchen only.