everypolitician / scraped

Write declarative scrapers in Ruby
MIT License

Add a shortcut for scraping a url with a class #50

Open: chrismytton opened this issue 7 years ago

chrismytton commented 7 years ago

Problem

Currently, the following boilerplate is needed to use a Scraped::Document subclass:

AllMembersPage.new(response: Scraped::Request.new(url: url).response)

This is a bit long-winded, and we've seen problems with people accidentally forgetting to call Scraped::Request#response, for example.

Proposed solution

Add a shortcut method to make this less error-prone and less verbose. There's an example of a top-level scrape method in the russia-duma-2016 scraper. It would be good to make this available as a method on the Scraped module.

url = 'http://www.duma.gov.ru/structure/deputies/?letter=%D0%92%D1%81%D0%B5'
page = Scraped.scrape(url => AllMembersPage)
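The proposal only shows the call site, so here is one minimal sketch of how a `Scraped.scrape` shortcut might be written. It is not the russia-duma-2016 code and not the gem's actual API. To keep it runnable on its own, `Scraped::Request`, `Scraped::Response` and `Scraped::Document` are simplified, hypothetical stand-ins that only mirror the `Request.new(url:).response` and `Document.new(response:)` calls quoted in this issue. The stand-in request fakes a body instead of making an HTTP request, and the URL is a placeholder.

```ruby
# Hypothetical, simplified stand-ins for the gem's classes so this sketch runs
# by itself; with the real gem you would `require 'scraped'` instead.
module Scraped
  Response = Struct.new(:url, :body, keyword_init: true)

  class Request
    def initialize(url:)
      @url = url
    end

    # Stand-in only: returns a fake response rather than fetching the URL.
    def response
      Response.new(url: @url, body: "<html>#{@url}</html>")
    end
  end

  class Document
    attr_reader :response

    def initialize(response:)
      @response = response
    end
  end

  # Sketch of the proposed shortcut: takes a single { url => DocumentClass }
  # pair, builds the request, calls #response on it (so callers can't forget),
  # and wraps the result in an instance of the given class.
  def self.scrape(pair)
    unless pair.is_a?(Hash) && pair.size == 1
      raise ArgumentError, 'expected exactly one url => class pair'
    end

    url, klass = pair.first
    klass.new(response: Request.new(url: url).response)
  end
end

class AllMembersPage < Scraped::Document; end

# Placeholder URL; equivalent to
# AllMembersPage.new(response: Scraped::Request.new(url: url).response)
page = Scraped.scrape('http://example.com/members' => AllMembersPage)
```

Taking a single `url => Class` hash keeps the call site identical to the proposed syntax. The `ArgumentError` guard is an assumption added for this sketch, so that passing several pairs fails loudly instead of silently ignoring all but the first.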

davewhiteland commented 7 years ago

"we've seen problems with people"

People called @davewhiteland :-|

chrismytton commented 7 years ago

Heh, I've definitely caught @chrismytton doing it as well 😉

chrismytton commented 7 years ago

Note: This might well end up becoming part of the Everypolitician::Scraper class (which currently exists only in prototype form) once https://github.com/everypolitician/everypolitician/issues/572#issuecomment-279790931 is done.