bhnascar / Viral-Art

CS231A final project

Write scraper for DA #5

Closed (mindyh closed this issue 8 years ago)

mindyh commented 8 years ago
  1. Get URL for images
  2. Given URL for image, get the
    • image
    • views
    • favs
    • category
    • number of watchers the artist has
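
A minimal sketch of step 1 and of the record that step 2 would fill in, using only the standard library. The `Deviation` field names and the `find_image_pages` helper are hypothetical, and the only markup assumption is that links to image pages contain `/art/` in their URL; the real page structure may differ.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class Deviation:
    """One scraped record. Field names are illustrative, not the project's schema."""
    page_url: str
    image_url: Optional[str] = None
    views: Optional[int] = None
    favs: Optional[int] = None
    category: Optional[str] = None
    artist_watchers: Optional[int] = None


class ArtLinkParser(HTMLParser):
    """Collects hrefs that look like image pages (assumed to contain '/art/')."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href, non-image links, and repeats.
        if href and "/art/" in href and href not in self.links:
            self.links.append(href)


def find_image_pages(html: str) -> List[Deviation]:
    """Step 1: turn a results page into empty records, one per image page URL."""
    parser = ArtLinkParser()
    parser.feed(html)
    return [Deviation(page_url=link) for link in parser.links]
```

Step 2 would then visit each `page_url` and fill in the remaining fields, which start out as `None`.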

bhnascar commented 8 years ago

Currently looking into https://www.crummy.com/software/BeautifulSoup/bs4/doc/

bhnascar commented 8 years ago

Is this an appropriate RSS link?

http://backend.deviantart.com/rss.xml?q=boost%3Apopular+in%3Adigitalart%2Fpaintings%2Fpeople%2Fportraits+max_age%3A24h&type=deviation
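
For reference, here is a small sketch that decodes the query in that URL and rebuilds it with the standard library. `decode_query` and `build_rss_url` are hypothetical helper names, and whether the backend accepts other category paths or `max_age` values is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

RSS_URL = (
    "http://backend.deviantart.com/rss.xml"
    "?q=boost%3Apopular+in%3Adigitalart%2Fpaintings%2Fpeople%2Fportraits"
    "+max_age%3A24h&type=deviation"
)


def decode_query(url: str) -> dict:
    """Return the query parameters of url as a flat dict of decoded strings."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}


def build_rss_url(category: str, max_age: str = "24h") -> str:
    """Rebuild the same kind of URL for a given category path."""
    query = f"boost:popular in:{category} max_age:{max_age}"
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return "http://backend.deviantart.com/rss.xml?" + urlencode(
        {"q": query, "type": "deviation"}
    )
```

Decoded, the `q` parameter reads `boost:popular in:digitalart/paintings/people/portraits max_age:24h`.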

bhnascar commented 8 years ago

Nope, never mind. That feed doesn't include fav or view counts.

bhnascar commented 8 years ago

Okay, we now have a scraper. Yay!

You can use it to scrape a search results page like so:

./scraper.py -r http://www.deviantart.com/browse/all/digitalart/paintings/people/portraits/

This adds every image in the search results to our database, along with its favorites, view count, medium, and the URL of the actual image file.

You can also scrape a single image page and add only data for that image to our database, like so:

./scraper.py -s http://www.deviantart.com/art/Reds-606604260

Finally, you can update the scraped data for all images currently in the database like so:

./scraper.py -u
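
The three modes above could be wired up roughly like this with `argparse`. This is a sketch, not the actual contents of scraper.py: the `dest` names and help strings are made up, and treating the three flags as mutually exclusive is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with the -r, -s, and -u modes described above."""
    parser = argparse.ArgumentParser(prog="scraper.py")
    # Assumption: exactly one mode is chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-r", dest="results_url", metavar="URL",
                      help="scrape every image on a search results page")
    mode.add_argument("-s", dest="single_url", metavar="URL",
                      help="scrape a single image page")
    mode.add_argument("-u", dest="update", action="store_true",
                      help="refresh the data for all images already in the database")
    return parser
```

Calling `build_parser().parse_args(["-u"])` yields a namespace with `update` set to `True` and both URL fields left as `None`.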

Note that this doesn't automatically run the feature extractor over the images. You still have to call ./extractor.py to run the image analysis, which goes into the database, finds the newly added rows, downloads the images, runs the feature extractors, and fills in the rest of the columns.
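
To make that flow concrete, here is a rough sketch with `sqlite3`. The table name `images`, the columns `id`, `image_url`, and `feature`, and the `extract_pending` function are all hypothetical; the real schema and extractors are not shown in this thread. The download and extraction steps are passed in as functions so the sketch stays independent of any network or image library.

```python
import sqlite3
from typing import Callable


def extract_pending(
    conn: sqlite3.Connection,
    download: Callable[[str], bytes],
    extract: Callable[[bytes], float],
) -> int:
    """Fill the feature column for rows that have been scraped but not analyzed."""
    # Newly added rows are assumed to have a NULL feature column.
    pending = conn.execute(
        "SELECT id, image_url FROM images WHERE feature IS NULL"
    ).fetchall()
    for row_id, image_url in pending:
        image_bytes = download(image_url)
        conn.execute(
            "UPDATE images SET feature = ? WHERE id = ?",
            (extract(image_bytes), row_id),
        )
    conn.commit()
    return len(pending)
```

Running it twice in a row should process nothing the second time, since every row already has its feature filled in.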

mindyh commented 8 years ago

Wow, so fast. How long does it take to scrape an image? And just making sure, you put a delay in, right? I don't want to get banned >.<
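
For what it's worth, a delay between requests could look something like the sketch below. This is only an illustration: nothing here confirms whether scraper.py already does this, `fetch_politely` is a made-up name, and the one-second default is an arbitrary guess rather than a limit published by the site.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        # No pause before the first request, one pause before each later one.
        if index > 0:
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the pause easy to swap out in tests.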

(Sent by email in reply to Ben-han Sung's comment of Sun, May 22, 2016, 1:00 AM.)