KaniyamFoundation / ProjectIdeas

A Place to write down the project ideas and to plan them

Scrape encyclopedia from tamilvu.org #90

Open tshrinivasan opened 5 years ago

tshrinivasan commented 5 years ago

http://www.tamilvu.org/ta/library-kulandaikal-lku00-html-lku00ind-233496

Two Tamil encyclopedias are available at the link above.

Scrape the page images and then run OCR on them.

As a first task, scrape all the images and share them publicly, both as images and as PDF.

NaveenPrasanth commented 5 years ago

Hi, I can help with this. Please give me more details. I can work on this tomorrow

tshrinivasan commented 5 years ago

Thanks.

Explore the two encyclopedias here: http://www.tamilvu.org/ta/library-kulandaikal-lku00-html-lku00ind-233496

Navigate through the pages and find the URL pattern, the start page, and the end page. Then scrape the pages, or download them using the uGet downloader.
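
The steps above could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the URL pattern, the page range, and the function names are all assumptions, and the real pattern has to be found by browsing the site first.

```python
from pathlib import Path
from urllib.request import urlopen

# Hypothetical pattern for illustration only; replace it with the real
# pattern found by navigating the encyclopedia pages.
URL_PATTERN = "http://www.example.org/book1/page-{page:03d}.jpg"


def build_page_urls(pattern, start_page, end_page):
    """Return one URL per page, from start_page to end_page inclusive."""
    if start_page > end_page:
        raise ValueError("start_page must not be greater than end_page")
    return [pattern.format(page=n) for n in range(start_page, end_page + 1)]


def write_url_list(urls, path):
    """Write one URL per line, e.g. to hand the list to a download manager."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


def download_all(urls, out_dir):
    """Fetch each URL and save it in out_dir under the URL's last path segment."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():  # skip pages already fetched in an earlier run
            continue
        with urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
```

For example, `write_url_list(build_page_urls(URL_PATTERN, 1, 250), "pages.txt")` would produce a plain list of 250 URLs, and `download_all(...)` would fetch them directly. The page range 1 to 250 is a made-up placeholder as well.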

tshrinivasan commented 5 years ago

Baskar of Linuxpert is scraping these books now. He sent samples, and they are fine. They will be shared publicly once the scraping is completed.

@NaveenPrasanth Please check other issues to contribute. Thanks.

NaveenPrasanth commented 5 years ago

Okay, thanks for letting me know.

tshrinivasan commented 5 years ago

The scraping is completed.

Thanks to Baskar Selvaraj sir of linuxpert.in and digimat.in.