SSOC18 / ssoc18-wiki

Shadi's Summer of Code 2018

[MohammadAliHamdan] scrapy tutorial (~5h) #18

Closed: shadiakiki1986 closed this 5 years ago

shadiakiki1986 commented 6 years ago

@MohammadAliHamdan

Please follow the Scrapy tutorial here, and when you're done, let's talk so you can show me what you've done.

ma-hamdan commented 6 years ago

I finished the tutorial and didn't run into any real problems. I am ready to use Scrapy for the next task.

shadiakiki1986 commented 6 years ago

Where is the code for your spider? I can't find it on the ssoc18.teamshadi.net server

ma-hamdan commented 6 years ago

I just added the code. It is similar to the one in the tutorial.

shadiakiki1986 commented 6 years ago

Can you please run this spider on the ssoc18 server with the -o quotes.json option so that I can see what your scraped items look like?
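For reference, the Scrapy tutorial's form of this command is `scrapy crawl quotes -o quotes.json`, run from the project directory. The spider name `quotes` is the tutorial's and is only assumed to match the spider discussed here. Once the file exists, a quick way to see what the scraped items look like is a small Python check such as the sketch below. The helper name `summarize_feed` is hypothetical, and the sketch assumes the feed is a single JSON array of objects, which is what the `.json` export normally produces.

```python
import json


def summarize_feed(path):
    """Return (number of items, sorted list of field names) for a JSON feed.

    Assumes the file holds one JSON array of objects, as written by
    `scrapy crawl <spider> -o quotes.json` on a fresh file.
    """
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    # Collect every key that appears in at least one item.
    fields = sorted({key for item in items for key in item})
    return len(items), fields
```

Calling `summarize_feed("quotes.json")` returns the item count and the field names. Note that `-o` appends to an existing file, so running the crawl twice into the same `.json` file can leave it unparseable; deleting the file before a re-run avoids that.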

shadiakiki1986 commented 6 years ago

Any updates?

ma-hamdan commented 6 years ago

The full Scrapy project, including the JSON file, is on the server now.

shadiakiki1986 commented 6 years ago

Can you add a README file with instructions on how to run your spider?

shadiakiki1986 commented 6 years ago

Also, the server has installed

Please add the mongo pipeline to your spider to store the items in a mongo database, log into adminMongo, and paste a screenshot of your items showing up there.
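A minimal sketch of what such a pipeline could look like is given below, modeled on the MongoDB example in the Scrapy item pipeline documentation. It is not the code used in this thread. The settings names `MONGO_URI` and `MONGO_DATABASE`, the database name `ssoc18`, and the collection name `quotes` are all assumptions chosen for illustration.

```python
class MongoPipeline:
    """Store each scraped item as one document in a MongoDB collection."""

    def __init__(self, mongo_uri, mongo_db, collection_name="quotes"):
        # All three values are illustrative defaults, not taken from the thread.
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from the project settings.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "ssoc18"),
        )

    def open_spider(self, spider):
        # Imported here so the module can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Insert a plain-dict copy and pass the original item down the chain.
        self.collection.insert_one(dict(item))
        return item
```

To enable it, the class would be listed under `ITEM_PIPELINES` in the project's `settings.py`, for example `{"tutorial.pipelines.MongoPipeline": 300}`, assuming the project is named `tutorial` as in the Scrapy tutorial.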

ma-hamdan commented 6 years ago

image

shadiakiki1986 commented 6 years ago

LGTM. Let's take it a step further and plot a histogram of the number of books per author. Please create a Jupyter notebook that connects to this mongo database using pymongo and uses matplotlib to plot the histogram.
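One way the notebook cells could be organized is sketched below. It assumes each stored document has an `author` field, and the connection URI, database name `ssoc18`, and collection name `quotes` are placeholders rather than values from this thread. Strictly speaking the plot is a bar chart of item counts per author, which is the usual reading of "histogram" in this context.

```python
from collections import Counter


def load_items(uri="mongodb://localhost:27017", db="ssoc18", collection="quotes"):
    """Fetch only the author field of every stored item (names are placeholders)."""
    import pymongo  # imported lazily; needs a reachable MongoDB server

    client = pymongo.MongoClient(uri)
    try:
        return list(client[db][collection].find({}, {"_id": 0, "author": 1}))
    finally:
        client.close()


def count_per_author(items):
    """Count how many items each author has, skipping items with no author."""
    return Counter(item["author"] for item in items if item.get("author"))


def plot_counts(counts):
    """Draw one bar per author, tallest first, and return the figure."""
    import matplotlib.pyplot as plt

    pairs = counts.most_common()
    authors = [author for author, _ in pairs]
    values = [value for _, value in pairs]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(authors, values)
    ax.set_xlabel("author")
    ax.set_ylabel("number of items")
    ax.tick_params(axis="x", rotation=90)
    fig.tight_layout()
    return fig
```

In a notebook, `plot_counts(count_per_author(load_items()))` in the last line of a cell would render the figure inline. Keeping the counting step separate from the database and plotting steps makes it easy to check on a small hand-written list first.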