Create a benchmark dataset for OCR from all the transcribed line images by randomly selecting line images from each batch or work.
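The random selection described above could look roughly like the following minimal Python sketch. It assumes each transcribed line image has a unique path and is tagged with the batch or work it came from. The `LineImage` record, `sample_benchmark`, and its `per_batch` and `seed` parameters are hypothetical names for illustration, not an existing OpenPecha API.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineImage:
    # Hypothetical record for one transcribed line image.
    image_path: str      # assumed unique per line image
    transcription: str
    batch: str           # batch or work identifier the line belongs to


def sample_benchmark(lines, per_batch, seed=0):
    """Randomly pick up to `per_batch` line images from every batch or work.

    Returns (benchmark, remaining): the sampled lines and all other lines.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    by_batch = defaultdict(list)
    for line in lines:
        by_batch[line.batch].append(line)

    benchmark = []
    for batch in sorted(by_batch):  # sorted so iteration order is deterministic
        group = by_batch[batch]
        # Small batches contribute all their lines instead of raising an error.
        benchmark.extend(rng.sample(group, min(per_batch, len(group))))

    chosen = {line.image_path for line in benchmark}
    remaining = [line for line in lines if line.image_path not in chosen]
    return benchmark, remaining


# Usage: three hypothetical batches of ten lines each, two lines sampled per batch.
lines = [
    LineImage(f"b{b}/line_{i}.png", "...", f"batch{b}")
    for b in range(3)
    for i in range(10)
]
benchmark, remaining = sample_benchmark(lines, per_batch=2, seed=42)
```

Sampling within each batch, rather than from the pooled set of lines, keeps small batches represented alongside large ones, and the fixed seed makes the selection repeatable. Returning the remaining lines separately is one way to keep benchmark lines out of any other split, if that is wanted.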
Dependencies
Include all the dependencies you are going to use while implementing.
Infrastructures
Include all the infrastructure required for running the task, such as S3 bucket, EC2 server, etc.
Design Illustrations
Justification
This is the best approach for now, as building the benchmark from already transcribed data is much quicker than collecting new images and transcribing them from scratch.
Testing
Describe the kind of testing procedures that are needed as part of fulfilling this request.
Implementation Steps
[x] OpenPecha/Create_OCR_benchmark_data#1
Estimated time: 1 hr
Actual time: 1 hr
[x] OpenPecha/Create_OCR_benchmark_data#2
Estimated time: 4 hr
Actual time: 4 hr
[ ] OpenPecha/Create_OCR_benchmark_data#3
Estimated time: 2 hr
Actual time:
[ ] OpenPecha/Create_OCR_benchmark_data#4
Estimated time: 6 hr
Actual time:
[ ] OpenPecha/Create_OCR_benchmark_data#5
Estimated time: 6 hr
Actual time:
[ ] OpenPecha/Create_OCR_benchmark_data#6
Estimated time: 8 hr
Actual time:
@ta4tsering we need to create a separate benchmark for each writing style; we can't mix ume into the uchen benchmark, as it wouldn't be fair. Other than that, it looks good to me.
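Following the review comment, the random sampling could be run separately within each writing style so that, for example, ume lines never end up in the uchen benchmark. This is a minimal Python sketch assuming every line record carries a `style` label and a batch or work identifier; the `LineImage` record, `build_style_benchmarks`, and the up-to-`per_batch`-lines-per-batch rule are hypothetical choices for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineImage:
    # Hypothetical record; `style` is a writing-style label such as "uchen" or "ume".
    image_path: str
    batch: str
    style: str


def build_style_benchmarks(lines, per_batch, seed=0):
    """Return {style: sampled lines}, never mixing styles in one benchmark.

    Within each style, up to `per_batch` lines are sampled from every batch.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)  # (style, batch) -> lines of that style in that batch
    for line in lines:
        groups[(line.style, line.batch)].append(line)

    benchmarks = defaultdict(list)
    for style, batch in sorted(groups):  # deterministic order for reproducibility
        group = groups[(style, batch)]
        benchmarks[style].extend(rng.sample(group, min(per_batch, len(group))))
    return dict(benchmarks)


# Usage with a few hypothetical lines from two styles.
lines = [
    LineImage("a1.png", "w1", "uchen"),
    LineImage("a2.png", "w1", "uchen"),
    LineImage("b1.png", "w2", "ume"),
]
benchmarks = build_style_benchmarks(lines, per_batch=1)
print(sorted(benchmarks))  # ['uchen', 'ume']
```

Grouping by style first means each benchmark can be scored on its own, so results on one writing style are not diluted by lines from another.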
RFC133: Creating a benchmark dataset for OCR
Reviewed By
@kaldan007