OpenPecha / Toolkit

🛠 Tools to create, edit and export texts and annotations
https://toolkit.openpecha.org
Apache License 2.0

Google Books OCR batch 2 #265

Open eroux opened 6 months ago

eroux commented 6 months ago

Here's the list of 3719 pechas to import from Google Books (it's all already on S3):

ocr_archive.csv

kaldan007 commented 4 months ago

I have imported the Google Books pechas in batches. These are the three batches that have been completed to date. I am having some issues with the remaining works; I will look into the bug and keep you updated.

batch_01_opfs.csv batch_02.csv batch_03_opf_catalog.csv

eroux commented 4 months ago

thanks a lot!

kaldan007 commented 4 months ago

2024-03-07 11:48:38,191 - ERROR - Error downloading W10206--Batch 2022 missing
2024-03-07 11:48:41,843 - ERROR - Error downloading W19792--Batch 2022 missing
2024-03-07 11:48:48,362 - ERROR - Error downloading W1KG25527--Batch 2022 missing
2024-03-07 11:49:04,086 - ERROR - Error downloading W1PD159442--Batch 2022 missing
2024-03-07 11:49:06,488 - ERROR - Error downloading W1PD159533--Batch 2022 missing
2024-03-07 11:49:26,673 - ERROR - Error downloading W3CN7719--Batch 2022 missing
2024-03-07 11:49:28,995 - ERROR - Error downloading W8LS18002--Batch 2022 missing

kaldan007 commented 4 months ago

@eroux these work IDs don't have a batch_2022.
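
In case it helps with triage, here is a minimal sketch for pulling the affected work IDs out of the error log, assuming the log lines keep the `Error downloading <work id>--<reason>` shape seen in this thread. The function name `missing_batch_work_ids` and the regular expression are illustrative only and are not part of the Toolkit.

```python
import re

# Log lines copied from this thread.
LOG = """\
2024-03-07 11:48:38,191 - ERROR - Error downloading W10206--Batch 2022 missing
2024-03-07 11:48:41,843 - ERROR - Error downloading W19792--Batch 2022 missing
2024-03-07 11:48:48,362 - ERROR - Error downloading W1KG25527--Batch 2022 missing
2024-03-07 11:49:04,086 - ERROR - Error downloading W1PD159442--Batch 2022 missing
2024-03-07 11:49:06,488 - ERROR - Error downloading W1PD159533--Batch 2022 missing
2024-03-07 11:49:26,673 - ERROR - Error downloading W3CN7719--Batch 2022 missing
2024-03-07 11:49:28,995 - ERROR - Error downloading W8LS18002--Batch 2022 missing
"""

# Assumed line shape: "... - ERROR - Error downloading <work_id>--<reason>".
# The lookahead stops the reason at the next timestamp or at the end of the
# text, so the pattern also works if the log was pasted as one long line.
ERROR_RE = re.compile(
    r"ERROR - Error downloading (?P<work_id>W\w+)--(?P<reason>.+?)"
    r"(?=\s+\d{4}-\d{2}-\d{2} |\s*$)"
)


def missing_batch_work_ids(log_text, reason="Batch 2022 missing"):
    """Return work IDs whose download failed for the given reason, in log order."""
    return [
        m.group("work_id")
        for m in ERROR_RE.finditer(log_text)
        if m.group("reason") == reason
    ]


if __name__ == "__main__":
    for work_id in missing_batch_work_ids(LOG):
        print(work_id)
```

The resulting list could then be compared against the work IDs in `ocr_archive.csv` to see which works still need a different batch.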

kaldan007 commented 4 months ago

Here is the last batch.

batch_04.csv