Open ta4tsering opened 3 months ago
The Google Books images are being uploaded, and the CSV for Google Books has been created. For the script type and the print_method, I used the work_id to get BDRC's ttl and parsed it.
Google Books datasets: https://huggingface.co/datasets/ta4tsering/Google_Books_datasets
Norbuketaka datasets: https://huggingface.co/datasets/ta4tsering/Norbuketaka_datasets
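For anyone reproducing the metadata step, here is a minimal sketch of reading the script type and print_method from a BDRC ttl file and writing a CSV row. It is not the script actually used for these datasets. It assumes rdflib is installed, that the ontology namespace is `http://purl.bdrc.io/ontology/core/`, and that the predicates are named `script` and `printMethod`. The column names and the work_id in the usage note are hypothetical as well, so check all of these against a real ttl file first.

```python
import csv

# Assumed BDRC ontology namespace and predicate names; verify against a real ttl.
BDO = "http://purl.bdrc.io/ontology/core/"
PREDICATES = {"script": "script", "print_method": "printMethod"}


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '/' or '#'."""
    return uri.rstrip("/").replace("#", "/").rsplit("/", 1)[-1]


def read_metadata(ttl_text: str) -> dict:
    """Parse Turtle text and collect the script and print_method values."""
    from rdflib import Graph, URIRef  # third-party: pip install rdflib

    graph = Graph()
    graph.parse(data=ttl_text, format="turtle")
    result = {}
    for column, predicate in PREDICATES.items():
        objects = graph.objects(None, URIRef(BDO + predicate))
        # Several values are joined with ';' so the CSV keeps one row per work.
        result[column] = ";".join(sorted({local_name(str(o)) for o in objects}))
    return result


def write_rows(rows, out) -> None:
    """Write metadata rows (dicts) as CSV to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=["work_id", "script", "print_method"])
    writer.writeheader()
    writer.writerows(rows)
```

A row would then be built as `{"work_id": work_id, **read_metadata(ttl_text)}` and passed to `write_rows`. How the ttl text is fetched for a given work_id is left out on purpose, since the issue does not say which endpoint was used.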
Description: For the modern printed data we have the Norbuketaka data and the Google Books data. All of this data needs to be added to the s3 bucket where the Woodblock data has already been uploaded.
Completion Criteria: Both the Google Books data and the Norbuketaka data are uploaded.
Subtask:
tif images to jpg
OCR/Training_Images on s3
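The two subtasks could be sketched roughly as below, assuming Pillow for the conversion and boto3 for the upload. The JPEG quality value and the flat key layout under `OCR/Training_Images` are assumptions rather than the project's actual settings, and the bucket name is intentionally left as a parameter.

```python
from pathlib import Path


def jpg_key(tif_path, prefix: str = "OCR/Training_Images") -> str:
    """Build the s3 key for the jpg made from tif_path (assumed flat layout)."""
    return f"{prefix.rstrip('/')}/{Path(tif_path).stem}.jpg"


def convert_tif_to_jpg(tif_path, out_dir) -> Path:
    """Convert one tif image to jpg and return the path of the new file."""
    from PIL import Image  # third-party: pip install Pillow

    out_path = Path(out_dir) / (Path(tif_path).stem + ".jpg")
    with Image.open(tif_path) as img:
        # JPEG has no alpha channel or palette mode, so convert to RGB first.
        img.convert("RGB").save(out_path, "JPEG", quality=90)  # quality is an assumption
    return out_path


def upload_jpg(jpg_path, bucket: str, key: str) -> None:
    """Upload one jpg to s3 using the default boto3 credential chain."""
    import boto3  # third-party: pip install boto3

    boto3.client("s3").upload_file(str(jpg_path), bucket, key)
```

For each tif, call `convert_tif_to_jpg`, then `upload_jpg(jpg_path, bucket, jpg_key(tif_path))`. If the Woodblock data on the bucket uses per-work subfolders, the key function should be adjusted to match that layout.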