Open 10kalden opened 4 months ago
Mapping of the cropped lines is done. Next I will use the toolkit to download the images and run the script to crop the lines that contain the missing glyphs.
Which OPF? Please give the name of the OPF here. There can be more than one OPF for a single work ID, so you can find the right OPF for the work by checking the image_group ID in its meta.yml.
https://github.com/OpenPecha-Data/P000800 is the OPF; it doesn't have the mapping in its meta.yml.
You have to look into the files to explore it. For example, in pagination.yml you will find the image_group_id in the reference of each page. You also need to compare it with the image_group_id on the BDRC website: for BDRC v001, the image_group_id is I1317. In pagination.yml the reference of the first page is 13170001, so our image_group_id is I1317.
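A quick sketch of that check in Python, in case it helps. It assumes the page reference is the numeric part of the image_group_id followed by a four-digit page number, which is only inferred from the single example above (13170001 and I1317); the function name is made up for illustration.

```python
def image_group_from_reference(reference: str, page_digits: int = 4) -> str:
    """Guess the image_group_id from a pagination.yml page reference.

    Assumption (from the single example in this thread): the last
    `page_digits` characters are the page number and the rest is the
    numeric part of the image_group_id.
    """
    if len(reference) <= page_digits or not reference.isdigit():
        raise ValueError(f"unexpected reference format: {reference!r}")
    return "I" + reference[:-page_digits]


# Compare the derived ID with the one shown on the BDRC website.
derived = image_group_from_reference("13170001")
print(derived == "I1317")
```

If other volumes use a different reference layout, the derived ID will not match the BDRC one, and the files need to be checked by hand.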
@ta4tsering OK, yeah, that's correct. Thank you for that.
Will download the images, crop the lines, upload the cropped lines to S3, and create the JSONL required for the Prodigy format, all on the EC2 server.
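A minimal sketch of the JSONL step, assuming the cropped lines are already uploaded and reachable by URL. The bucket URL, object keys, and helper names below are placeholders, not the real ones; the `image` key follows Prodigy's image task format, and the `meta` field is optional.

```python
import json


def build_prodigy_records(base_url, keys):
    """Build one Prodigy image task per uploaded line image."""
    base = base_url.rstrip("/")
    return [{"image": f"{base}/{key}", "meta": {"key": key}} for key in keys]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    # ensure_ascii=False keeps Tibetan text readable if it ends up in meta.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Placeholder bucket and keys, for illustration only.
records = build_prodigy_records(
    "https://example-bucket.s3.amazonaws.com/",
    ["I1317/13170001_line_01.png", "I1317/13170001_line_02.png"],
)
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

The resulting text can be written to a `.jsonl` file on the EC2 server and passed to Prodigy as the source.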
Will be working on extracting more data from the Google OCR output. @10kalden, please check the release asset of the Google OCRed OPFs.
Description:
Reference:
img.1: Line cropped image for the character ཡུ
Subtask:
Completion Criteria:
To have cropped images ready for annotation.