ericbeland / Scrapeybara

A web scraping tool based on Capybara. This project is deprecated.

Scrapeybara

A Capybara-based web scraping tool. https://github.com/jnicklas/capybara

Capybara is a wonderful Ruby project created by Jonas Nicklas that offers a single DSL for automating interactions with web applications in integration tests. Although it modestly presents itself as just an integration testing framework, Capybara really provides a lingua franca for driver-independent web tools. A single scripting DSL can drive a variety of drivers: real browsers (Firefox, IE, Chrome) via Selenium/WebDriver, direct HTTP-level interaction via Mechanize/Rack, and simulated headless browsers (with JavaScript) via Akephalos and HtmlUnit. That makes Capybara a flexible platform for building all sorts of web tools.

But enough about Capybara... About me: I provide a wrapper DSL for scraping web pages via Capybara scripts, plus a system for extracting the related data.

Scrapeybara provides:

- Page content extraction DSL
- Pluggable Parameterization system (usernames, passwords)
- Pluggable Data Outputters  
- Error Recovery DSL for Capybara navigations
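
To make the last two items more concrete, here is a minimal sketch in plain Ruby of what a pluggable outputter and a navigation retry helper could look like. It is not Scrapeybara's actual API: the names `JsonLinesOutputter`, `TabOutputter`, and `with_recovery` are hypothetical, and the flaky navigation is simulated so that the example runs without Capybara installed.

```ruby
require 'json'
require 'stringio'

# Hypothetical outputters: any object responding to #write(record)
# can be plugged in. Neither class is part of Scrapeybara.
class JsonLinesOutputter
  def initialize(io)
    @io = io
  end

  # Writes one JSON object per line.
  def write(record)
    @io.puts(JSON.generate(record))
  end
end

class TabOutputter
  def initialize(io, columns)
    @io = io
    @columns = columns
  end

  # Writes the selected columns as one tab-separated line.
  def write(record)
    @io.puts(record.values_at(*@columns).join("\t"))
  end
end

# Hypothetical recovery helper: runs the block, retrying when it raises
# error_class, up to `attempts` tries in total. The last error is re-raised.
def with_recovery(attempts: 3, error_class: StandardError)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue error_class
    retry if tries < attempts
    raise
  end
end

# Simulated flaky navigation: fails on the first try, succeeds on the second.
out = StringIO.new
outputter = JsonLinesOutputter.new(out)
record = with_recovery(attempts: 3, error_class: IOError) do |try|
  raise IOError, 'page not ready' if try < 2
  { 'title' => 'Example', 'price' => '9.99' }
end
outputter.write(record)
```

In a real script the block passed to `with_recovery` would wrap Capybara calls such as `visit` and `find`, and swapping `JsonLinesOutputter` for `TabOutputter` would change the output format without touching the scraping logic.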

https://gist.github.com/569530

If you want to use the transaction/step capabilities within a Rails project, run ./script/generate scrapeybara

To Do:

- Pluggable Response Info Outputters (for easy debugging)
- Pacing Options