KaniyamFoundation / ProjectIdeas

A Place to write down the project ideas and to plan them

Make accessible copies (PDF) of publicly available EAP works, sync them to Internet Archive #199

Open Natkeeran opened 3 months ago

Natkeeran commented 3 months ago

There are 25,000 to 50,000 Tamil items in EAP. However, the works are not available as PDF. The content of these materials is in the public domain.

We need to convert the content to PDF, collect the metadata, and upload both to the Internet Archive, and possibly to Wikisource as well.

tshrinivasan commented 3 months ago

https://github.com/siddharthisaiah/eap-books-rescue

https://github.com/prachatos/eap2pdf

Let us explore this code.


tshrinivasan commented 2 months ago

https://dezoomify-rs.ophir.dev/ can download the full images from EAP or any IIIF server.

It downloads the split image tiles and stitches them into a full image.
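
To illustrate the tile-and-stitch idea, here is a minimal sketch of how tile requests against an IIIF Image API server could be enumerated. It is not how dezoomify-rs is implemented; the function names, the 1024-pixel tile size, and the example base URL are assumptions made for illustration. The URL layout follows the IIIF Image API pattern `{base}/{region}/{size}/{rotation}/{quality}.{format}`.

```python
from typing import Iterator, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def tile_regions(width: int, height: int, tile: int = 1024) -> Iterator[Region]:
    """Yield regions that cover a width x height image, row by row.

    Tiles on the right and bottom edges are clipped to the image bounds.
    The default tile size of 1024 is an assumption, not an EAP setting.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


def tile_url(base: str, region: Region, size: str = "full") -> str:
    """Build an IIIF Image API URL for one region.

    `base` is the image's IIIF base URL (hypothetical here).
    `size` is "full" in Image API 2.x; version 3 servers use "max".
    """
    x, y, w, h = region
    return f"{base}/{x},{y},{w},{h}/{size}/0/default.jpg"


if __name__ == "__main__":
    # Hypothetical base URL, used only to show the output shape.
    base = "https://example.org/iiif/some-image-id"
    for region in tile_regions(2500, 1000):
        print(tile_url(base, region))
```

Each region's `(x, y)` offset is also the position at which the downloaded tile would be pasted when stitching the full image back together with an image library.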

tshrinivasan commented 2 months ago

Started downloading from EAP and uploading to the Internet Archive.

The first upload is here: https://archive.org/details/EAP372-8-10-152/
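
The identifier of that first upload suggests a simple naming scheme. As a rough sketch, assuming EAP references are written with slashes (for example `EAP372/8/10/152`) and that the archive.org identifier is the same reference with hyphens, the mapping and a basic metadata record could look like this. The function names and the choice of metadata fields are illustrative assumptions, not the script actually used for the upload.

```python
from typing import Dict


def archive_identifier(eap_reference: str) -> str:
    """Turn an EAP reference into an archive.org-style identifier.

    Assumes the reference uses "/" separators, which are not
    allowed in identifiers, so they are replaced with "-".
    """
    return eap_reference.strip().replace("/", "-")


def archive_metadata(eap_reference: str, title: str) -> Dict[str, str]:
    """Build a minimal metadata record for a Tamil text item.

    The field selection is an assumption; real uploads would carry
    whatever metadata is collected from the EAP item page.
    """
    return {
        "title": title,
        "mediatype": "texts",  # scanned books and manuscripts
        "language": "tam",     # ISO 639 code for Tamil
        "source": eap_reference,
    }


if __name__ == "__main__":
    ref = "EAP372/8/10/152"
    print(archive_identifier(ref))
    print(archive_metadata(ref, "Untitled EAP item"))
```

A dictionary like this could then be handed to whichever upload tool is chosen, together with the generated PDF.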

Natkeeran commented 2 months ago

Script to download EAP works (files) of a particular language: https://github.com/KaniyamFoundation/tamilbooks_metadata/tree/main/script/eap