Open: tshrinivasan opened this issue 5 years ago
Hi, I can help with this. Please give me more details. I can work on this tomorrow.
Thanks.
Explore the two encyclopedias here: http://www.tamilvu.org/ta/library-kulandaikal-lku00-html-lku00ind-233496
Navigate through the pages and find the URL pattern, the start page and the end page. Then scrape them, or download them using the uGet downloader.
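The steps above could be sketched roughly as follows in Python. This is only an illustration: `URL_PATTERN` is a made-up placeholder, not the real tamilvu.org page URL, and the page range has to be found by browsing the site first. The helper names (`page_urls`, `write_url_list`, `download_all`) are hypothetical too.

```python
from pathlib import Path
from urllib.request import urlopen

# ASSUMPTION: placeholder pattern only. The real URL pattern must be
# discovered by navigating the encyclopedia pages on tamilvu.org.
URL_PATTERN = "http://example.org/encyclopedia/vol1/page-{page:04d}.jpg"


def page_urls(pattern, start_page, end_page):
    """Return one URL per page, including both start_page and end_page."""
    if start_page > end_page:
        raise ValueError("start_page must not be greater than end_page")
    return [pattern.format(page=n) for n in range(start_page, end_page + 1)]


def write_url_list(urls, list_file):
    """Write one URL per line, e.g. to hand the list to a download manager."""
    Path(list_file).write_text("\n".join(urls) + "\n", encoding="utf-8")


def download_all(urls, out_dir):
    """Download each URL into out_dir, skipping files that already exist."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # lets an interrupted run be resumed
        with urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())


if __name__ == "__main__":
    # Hypothetical range; replace with the real start and end pages.
    for url in page_urls(URL_PATTERN, 1, 3):
        print(url)
```

Writing the list with `write_url_list` keeps the other route open: the resulting text file has one URL per line, which is the usual form for batch input to a download manager. Calling `download_all` does the fetching directly in Python instead.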
Baskar of Linuxpert is scraping these books now. He sent samples, and they are fine. I will share them publicly once the scraping is completed.
@NaveenPrasanth Please check other issues to contribute. Thanks.
Okay, thanks for letting me know.
The scraping is completed.
Thanks to Baskar Selvaraj sir of linuxpert.in and digimat.in.
http://www.tamilvu.org/ta/library-kulandaikal-lku00-html-lku00ind-233496
Two Tamil encyclopedias are available at the link above.
Scrape the images and then run OCR on them.
As a first task, scrape all the images and share them publicly, both as images and as PDF.