RFW0145: Filter Norbuketaka OCR training data.
Summary
We have OCR training data from Norbuketaka, consisting of line image and text pairs, that needs to be filtered.
Key Concepts
Pillow: A maintained fork of the Python Imaging Library (PIL) that adds image processing capabilities to your Python interpreter. It provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities.
Context
The Norbuketaka project team has already proofread about 500 works, and for all of those works we also have the Google OCR output and the images. Sina produced the line images and their corresponding text from the proofread text, the work images, and the Google OCR output mentioned above. We still need to filter out every pair where the image's length (width) is shorter than its height, where the text has only one or two characters, or where the text contains numbers.
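The three filter rules above could be sketched roughly as follows. This is only an illustration, not the project's actual implementation. It assumes the pairs are available as a mapping from image file name to text. The layout of the real JSON file is not described here, and the names `is_usable_text`, `is_usable_size`, and `filter_pairs` are hypothetical. It also assumes that "characters" means Unicode code points and that "numbers" means any Unicode digit.

```python
from pathlib import Path

from PIL import Image


def is_usable_text(text: str) -> bool:
    """Return False for text of two characters or fewer, or text containing a digit.

    Assumption: characters are counted as Unicode code points after stripping
    surrounding whitespace, so empty text is rejected as well.
    """
    stripped = text.strip()
    if len(stripped) <= 2:
        return False
    # str.isdigit() is True for ASCII digits and for digits of other scripts.
    if any(ch.isdigit() for ch in stripped):
        return False
    return True


def is_usable_size(width: int, height: int) -> bool:
    """Return False when the image is narrower than it is tall."""
    return width >= height


def filter_pairs(pairs: dict[str, str], image_dir: Path) -> dict[str, str]:
    """Keep only the image/text pairs that pass all three rules.

    `pairs` maps an image file name to its line text (hypothetical layout).
    """
    kept = {}
    for name, text in pairs.items():
        if not is_usable_text(text):
            continue  # skip opening the image when the text already fails
        with Image.open(image_dir / name) as img:
            width, height = img.size
        if is_usable_size(width, height):
            kept[name] = text
    return kept
```

`Image.open` only reads the file header until pixel data is requested, so checking `img.size` this way stays cheap even on a large set of line images. The text check runs first so that pairs with unusable text never touch the disk.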
Outputs
Filtered, usable OCR training data
Inputs
Line images and a JSON file on Google Drive
Timeline
Specify the expected delivery date for the project.
References
Include any relevant links or resources for additional context or information.