charlotte-ruby / image_scraper

simple utility that pulls image URLs from a web page
MIT License
21 stars 10 forks source link

ImageScraper

ruby

Simple utility that pulls image URLs from web page

Installation

Install in your application's Gemfile or as a standalone gem:

gem 'image_scraper'

And then execute:

$ bundle install

Standalone install:

$ gem install image_scraper

Usage

options = {
    convert_to_absolute_url: true,
    include_css_images: true # convert any relative images to absolute urls.
    include_css_data_images: true # convert any data images (data:image/gif;base64....)
}

image_scraper = ImageScraper::Client.new("http://www.rubygems.org", options)
image_scraper.image_urls

# => ["https://rubygems.org/assets/github_icon.png"", "https://rubygems.org/sponsors.png"]

CLI

$ image_scraper https://unsplash.com | head -n 2
https://images.unsplash.com/photo-1471897488648
https://images.unsplash.com/photo-1590073242678

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb. Releases are done via Github Actions.

If you prefer to use docker:

docker-compose build
docker-compose run app

Once inside the container, run the tests and you'll see output similar to this:

/usr/src/app # bundle exec rspec
........................

Finished in 0.54303 seconds (files took 0.95976 seconds to load)
24 examples, 0 failures

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/charlotte-ruby/image_scraper. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

Copyright

Copyright (c) 2011 John McAliley. See LICENSE.txt for further details.

Code of Conduct

Everyone interacting in the ImageScraper project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.