OpenPecha / extract-missing-glyphs

MIT License

OCR0034: Further processing for cropped images #5

Open 10kalden opened 4 months ago

10kalden commented 4 months ago

Description:

Reference:

img.1: Line-cropped image for the character ཡུ

Subtask:

Completion Criteria: Cropped images are ready for annotation.

ta4tsering commented 4 months ago

Mapping of the cropped lines is done. Next, I will use the toolkit to download and run the script that crops the lines in which the missing glyphs appear.

10kalden commented 4 months ago
ta4tsering commented 4 months ago

Which OPF? Please provide the name of the OPF here. There can be more than one OPF for a single work ID, so look for the OPF of the same work that has the image_group_id in its meta.yml.

10kalden commented 4 months ago

https://github.com/OpenPecha-Data/P000800 is the OPF; it doesn't have the mapping in its meta.yml.

ta4tsering commented 4 months ago

You have to look into the files to explore it. For example, in pagination.yml you will find the image_group_id in the reference of each page. You also need to compare it with the image_group_id on the BDRC website: for BDRC v001, the image_group_id is I1317. The reference of the first page is 13170001, which means our image_group_id is I1317.
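
The check described above can be sketched in Python. This is only an illustration: it assumes every reference has the same shape as the one quoted here (the digits of the image group followed by a four-digit page number), and the function name and the `page_digits` parameter are hypothetical, not part of any OpenPecha toolkit.

```python
def image_group_id_from_reference(reference: str, page_digits: int = 4) -> str:
    """Derive an image group ID such as 'I1317' from a page reference such as '13170001'.

    Assumption: the last `page_digits` digits are the page number and
    everything before them is the numeric part of the image group ID.
    """
    ref = str(reference).strip()
    if not ref.isdigit() or len(ref) <= page_digits:
        raise ValueError(f"unexpected reference format: {reference!r}")
    return "I" + ref[:-page_digits]


# Compare the ID derived from pagination.yml with the one shown on BDRC.
derived = image_group_id_from_reference("13170001")
bdrc_image_group_id = "I1317"  # value quoted in this thread
print(derived, derived == bdrc_image_group_id)
```

If references in other OPFs follow a different pattern, the slicing rule would need to be adjusted.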

10kalden commented 4 months ago

@ta4tsering OK, yeah, that's correct. Thank you for that.

ta4tsering commented 4 months ago

On the EC2 server, I will download the images, crop the lines, upload the cropped lines to S3, and create the JSONL file required for the Prodigy format.
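
As a rough sketch of the last step, the JSONL file could be built as below. It assumes one task per cropped line image, with the image URL stored under the `image` key that Prodigy's image recipes read; the bucket URL, file names, and helper name are placeholders, not the actual project values.

```python
import json


def to_prodigy_jsonl(image_urls):
    """Return JSONL text with one {"image": url} task per cropped line image."""
    lines = [json.dumps({"image": url}, ensure_ascii=False) for url in image_urls]
    return "\n".join(lines) + "\n"


# Hypothetical S3 URLs of cropped line images.
urls = [
    "https://example-bucket.s3.amazonaws.com/cropped/line_0001.jpg",
    "https://example-bucket.s3.amazonaws.com/cropped/line_0002.jpg",
]
jsonl_text = to_prodigy_jsonl(urls)

with open("cropped_lines.jsonl", "w", encoding="utf-8") as f:
    f.write(jsonl_text)
```

Any extra fields the annotation recipe needs (for example an ID or the expected glyph) would be added to each task dictionary.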

kaldan007 commented 4 months ago

I will be working on extracting more data from the Google OCR output. @10kalden, please check the release assets of the Google OCRed OPFs.