Web scraper with an elegant DSL that parses structured data from web pages.
Install the gem with RubyGems:
gem install wombat
The simplest way to use Wombat is by calling Wombat.crawl and passing it a block:
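require 'wombat'
Wombat.crawl do
  # The page to fetch is base_url + path
  base_url "https://www.github.com"
  path "/"
  # Each property name becomes a key in the returned Hash
  headline xpath: "//h1"
  subheading css: "p.alt-lead"
  # :list collects every matching node instead of only the first one
  what_is({ css: ".one-fourth h4" }, :list)
  # A block without a selector nests its properties under the "links" key
  links do
    # A block after a selector post-processes the extracted text
    explore xpath: '/html/body/header/div/div/nav[1]/a[4]' do |e|
      e.gsub(/Explore/, "Love")
    end
    features css: '.nav-item-opensource'
    business css: '.nav-item-business'
  end
end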
The call returns a Hash like this:
{
  "headline"=>"How people build software",
  "subheading"=>"Millions of developers use GitHub to build personal projects, support their businesses, and work together on open source technologies.",
  "what_is"=>[
    "For everything you build",
    "A better way to work",
    "Millions of projects",
    "One platform, from start to finish"
  ],
  "links"=>{
    "explore"=>"Love",
    "features"=>"Open source",
    "business"=>"Business"
  }
}
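Because the result is an ordinary Ruby Hash with string keys, it can be consumed with plain Hash and Array methods. The sketch below copies part of the example output into a literal so it runs without the gem or a network connection; in real code `result` would be the value returned by `Wombat.crawl`.

```ruby
# Part of the example output above, copied into a literal.
# In real use: result = Wombat.crawl do ... end
result = {
  "headline" => "How people build software",
  "what_is" => [
    "For everything you build",
    "A better way to work",
    "Millions of projects",
    "One platform, from start to finish"
  ],
  "links" => {
    "explore" => "Love",
    "features" => "Open source",
    "business" => "Business"
  }
}

# Keys are strings, so nested values are reached with string keys or Hash#dig.
puts result.dig("links", "explore") # prints Love

# Properties declared with :list come back as arrays.
result["what_is"].each_with_index do |item, i|
  puts "#{i + 1}. #{item}"
end
```

Using `dig` returns `nil` instead of raising if an intermediate key such as `"links"` is missing, which is useful when a selector stops matching after the target page changes.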
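To show how a block DSL like the one above can turn method calls into a nested Hash, here is a toy sketch. It is not Wombat's implementation: `ToyCrawler` and `FAKE_PAGE` are hypothetical names, there is no HTTP request or HTML parsing, and a selector is simply looked up in a hash that stands in for the page. It assumes the general pattern of `instance_eval` plus `method_missing`, and covers only plain properties, nested groups, and post-processing blocks (no `:list`).

```ruby
# Hypothetical stand-in for a fetched page: selector => extracted text.
FAKE_PAGE = {
  "//h1" => "How people build software",
  ".nav-a" => "Explore"
}.freeze

# Toy DSL evaluator; NOT how Wombat is actually implemented.
class ToyCrawler
  def initialize(page)
    @page = page
    @result = {}
  end

  # Evaluate the block with this object as self, then return the collected Hash.
  def run(&block)
    instance_eval(&block)
    @result
  end

  # Every unknown method call inside the block is treated as a property name.
  def method_missing(name, selector = nil, &block)
    key = name.to_s
    if selector.nil?
      raise ArgumentError, "#{key} needs a selector or a block" unless block

      # A block without a selector opens a nested group (like `links do ... end`).
      @result[key] = ToyCrawler.new(@page).run(&block)
    else
      # "Run" the selector by looking it up; a block post-processes the value.
      value = @page[selector.values.first]
      @result[key] = block ? block.call(value) : value
    end
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

result = ToyCrawler.new(FAKE_PAGE).run do
  headline xpath: "//h1"
  links do
    explore css: ".nav-a" do |e|
      e.gsub(/Explore/, "Love")
    end
  end
end

# result holds "headline" at the top level and "explore" nested under "links".
p result
```

Evaluating the block with `instance_eval` is what lets property names appear as bare method calls, and recursing with a fresh evaluator for a selector-less block is one simple way to produce the nested `"links"` Hash seen in the example output.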
Copyright (c) 2019 Felipe Lima. See LICENSE.txt for further details.