KaniyamFoundation / ProjectIdeas

A Place to write down the project ideas and to plan them

Scrape the site and get the book list #124

Open tshrinivasan opened 3 years ago

tshrinivasan commented 3 years ago

http://www.noolulagam.com/books/

This site has 3877 pages with 10 books on each page, i.e. information on about 38770 books. The pages run from http://www.noolulagam.com/books/1/ to http://www.noolulagam.com/books/3877/

Scrape each page and save the details below as a CSV file:

Book title, category, author, publisher, price, year
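
A minimal sketch of the task as described above, using only the Python standard library: it generates the listing URL for each of the 3877 pages and writes records with the six requested fields to CSV. The English column names, the `page_urls` and `write_books_csv` helpers, and the sample record are assumptions made for illustration; they do not come from the site or from any code in this thread, and the fetching and parsing step is left out.

```python
import csv
import io

BASE_URL = "http://www.noolulagam.com/books/"
TOTAL_PAGES = 3877  # page count quoted in the issue

# Assumed English column names for the six requested fields.
FIELDS = ["title", "category", "author", "publisher", "price", "year"]


def page_urls(first=1, last=TOTAL_PAGES):
    """Yield the listing URL for every page from first to last, inclusive."""
    for page in range(first, last + 1):
        yield f"{BASE_URL}{page}/"


def write_books_csv(rows, out):
    """Write an iterable of dicts keyed by FIELDS to a text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder record for illustration only (not real data from the site).
sample = [{
    "title": "Example Title",
    "category": "Fiction",
    "author": "A. Writer",
    "publisher": "Example Press",
    "price": "100",
    "year": "2020",
}]
buffer = io.StringIO()
write_books_csv(sample, buffer)
```

When writing to a real file instead of an in-memory buffer, opening it with `open(path, "w", newline="", encoding="utf-8")` keeps the Tamil text intact and avoids blank lines between rows on Windows.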

manimaran990 commented 3 years ago

Hi,

The code is a bit rough, but here is the codebase:

https://github.com/manimaran990/bookscrap

-- Regards, Manimaran G

manimaran990 commented 3 years ago

Updated the code to fetch the title from the h4 tag.
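
As a rough illustration of pulling titles from `h4` tags, here is a sketch built on the standard library's `html.parser`. It is not the code from the bookscrap repository; the `H4TitleParser` class, the `extract_titles` helper, and the sample markup are hypothetical, and the real page structure may differ.

```python
from html.parser import HTMLParser


class H4TitleParser(HTMLParser):
    """Collect the text content of every <h4> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h4 = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self._in_h4 = True
            self._parts = []

    def handle_data(self, data):
        # Text inside nested tags (such as a link) is also collected.
        if self._in_h4:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h4" and self._in_h4:
            self._in_h4 = False
            title = "".join(self._parts).strip()
            if title:
                self.titles.append(title)


def extract_titles(html):
    """Return the stripped text of each <h4> in the given HTML string."""
    parser = H4TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical markup; the actual listing pages may be structured differently.
SAMPLE_HTML = """
<div class="book"><h4><a href="/book/1/">First Book</a></h4><p>Rs. 100</p></div>
<div class="book"><h4> Second Book </h4></div>
"""
titles = extract_titles(SAMPLE_HTML)
```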

tshrinivasan commented 3 years ago

https://www.dropbox.com/s/2ac1taj445yihrw/complete.csv?dl=0

Here is the resulting CSV file.

It needs more cleaning.
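
The thread does not say which cleaning is needed. As one possible starting point, the sketch below assumes the usual suspects: stray whitespace, blank rows, and exact duplicate rows. The `clean_rows` helper and the sample data are hypothetical and are not taken from complete.csv.

```python
import csv
import io


def clean_rows(rows):
    """Yield rows with whitespace normalised, skipping blank and duplicate rows."""
    seen = set()
    for row in rows:
        # Collapse runs of whitespace and trim both ends of every cell.
        cleaned = tuple(" ".join(cell.split()) for cell in row)
        if not any(cleaned) or cleaned in seen:
            continue
        seen.add(cleaned)
        yield list(cleaned)


# Made-up sample input, for illustration only.
RAW = "title,author\n  Book   One ,Writer A\nBook One,Writer A\n,\nBook Two, Writer B\n"
cleaned = list(clean_rows(csv.reader(io.StringIO(RAW))))
```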

tshrinivasan commented 3 years ago

Thanks @manimaran990

stephenraj314 commented 3 years ago

Hi sir, I'm new to Python. Please let me know about any issues in the code; I'm ready to fix them. https://github.com/stephenraj314/Bs4scraping

muthu1809 commented 3 years ago

Stephen is an enthusiastic and passionate budding Python developer. This is his first attempt. Please let him know what changes his code needs, and he will fix the issues.

tshrinivasan commented 3 years ago

Thanks @muthu1809 and @stephenraj314

I reported an issue in the repo's Issues section.

stephenraj314 commented 3 years ago

Hi sir, I scraped the book details using Selenium. Please check the codebase: https://github.com/stephenraj314/SeleniumwebScraping.git