Closed SamuelMTDavies closed 5 years ago
Weird. I will take a look at it later today and reply with the findings. Can you please provide the exact command you are running, including the website you are trying to snap if possible?
I have tried all of the following, with and without sudo (I tried some others too, but I think I got the syntax wrong or they weren't as comprehensive):
sudo snapcrawl go https://www.uown.co -d4 -f/Users/sam/snaps
sudo snapcrawl go https://www.uown.co -d4 -fsnaps
sudo snapcrawl go https://www.uown.co -d3
Ok, so a couple of thoughts so far:
- phantomjs - if you are on linux, you don't have to do anything, since it installs automatically; but if you are on windows, it could be that this automatic installation did not work.
- https - capturing an https URL on my machine fails with an error. I tried taking the screenshots directly with screencap, and I get the same error.

Your thoughts?
Hi Dan,
Just jumping to it.
If that is the case I understand and will use some of the other tools available online. I appreciate your time and effort replying.
There is a chance I can get something working by tomorrow early morning / noon. Maybe (not promising) even today.
I plan on fixing it either way - right now it just does not work.
Off-topic edit: I'm not sure how it is on a Mac, but in most circumstances I've seen, running a gem should not require sudo.
You may be correct about the sudo - I just used it in case there was some finicky write-permissions issue (I'm not very clued up on Ruby, or on coding beyond intermediate-level stuff for that matter), so I thought it worth a try.
If you manage something by Friday I would be forever grateful. I will keep an eye out on this thread.
Stay tuned - I hope to have something ready today. I already integrated the new gem dependency, captured https successfully - next I test your site specifically, and do some polish (since some features will be removed), and release a gem for you to try.
Ok - if you want to test it, follow these steps:
1. Create a new folder and cd to it
2. Create a Gemfile in it with the content below
3. Run bundle install
# Gemfile
source "https://rubygems.org"
git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
gem "snapcrawl", github: 'DannyBen/snapcrawl', branch: 'webshot'
I tested it with your site and it captures, but I am not sure it looks good - remember that the "headless browsers" used for these captures are equivalent to old browsers.
Also, ignore the weird output that it might print, it is the webshot gem doing it, I will sort this out later.
Lastly, make sure you have the right versions of everything:
phantomjs --version - should be 2.x
snapcrawl --version - should be 0.2.4rc1
For starters, just run this command to capture the homepage:
snapcrawl go uown.co
Ok, so I followed your instructions and got the following console printout. It created a snaps folder, but still nothing in it.
-----> Visit: http://uown.co
Snap! Snapping picture... done
Crawl! Extracting links...
/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:225:in `open_loop': redirection forbidden: http://uown.co -> https://www.uown.co/ (RuntimeError)
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:151:in `open_uri'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:717:in `open'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:35:in `open'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:120:in `extract_urls_from!'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:112:in `extract_urls_from'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:74:in `block in crawl_and_snap'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:65:in `each'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:65:in `crawl_and_snap'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:59:in `block in crawl'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:58:in `times'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:58:in `crawl'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:34:in `execute'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/lib/snapcrawl/crawler.rb:26:in `handle'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.3/bin/snapcrawl:6:in `<top (required)>'
	from /usr/local/bin/snapcrawl:22:in `load'
	from /usr/local/bin/snapcrawl:22:in `<main>'
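For context, the "redirection forbidden" error above comes from Ruby's open-uri, which (in Ruby 2.3) refuses to follow a redirect that switches schemes, such as http -> https. A simplified sketch of that check (`redirectable?` here is an illustrative reimplementation, not snapcrawl or stdlib code):

```ruby
require 'uri'

# Illustrative, simplified version of the cross-scheme redirect check
# Ruby 2.3's open-uri applies before following a redirect. A scheme
# change (http -> https here) makes it raise "redirection forbidden".
def redirectable?(from, to)
  from.scheme.downcase == to.scheme.downcase
end

redirectable?(URI('http://uown.co'), URI('http://uown.co/about'))  # => true
redirectable?(URI('http://uown.co'), URI('https://www.uown.co/'))  # => false
```

This is why crawling http://uown.co dies the moment the site answers with a redirect to https://www.uown.co/.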
I then checked my snapcrawl version (I assumed the bundle install was updating it), but I'm still on 0.2.3. What command do I need to run to update it? gem update hasn't made any changes to my version.
Hmm...
Alright - forget the Gemfile solution (you can delete that file).
I have just published the gem to RubyGems, so you can simply run this:
gem install snapcrawl --version 0.2.4rc1
And then check snapcrawl version before proceeding.
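As a side note, this is likely why a plain gem update left you on 0.2.3: RubyGems treats any version containing letters, such as 0.2.4rc1, as a prerelease, and prereleases are skipped unless requested explicitly. A small demonstration:

```ruby
require 'rubygems'

# RubyGems treats a version containing letters as a prerelease, and a
# plain `gem update` ignores prereleases; they must be requested
# explicitly (e.g. `gem install snapcrawl --version 0.2.4rc1`).
rc     = Gem::Version.new('0.2.4rc1')
stable = Gem::Version.new('0.2.3')

rc.prerelease?  # => true
rc > stable     # => true (newer, yet skipped by a plain `gem update`)
```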
Still no images but I'm on the right snapcrawl version.
The console suggests that I'm missing a dependency - Cliver or phantomjs? It's a little cryptic:
Snapping picture... /Library/Ruby/Gems/2.3.0/gems/cliver-0.3.2/lib/cliver/dependency.rb:143:in `raise_not_found!': Could not find an executable ["phantomjs"] on your path. (Cliver::Dependency::NotFound)
	from /Library/Ruby/Gems/2.3.0/gems/cliver-0.3.2/lib/cliver/dependency.rb:116:in `detect!'
	from /Library/Ruby/Gems/2.3.0/gems/cliver-0.3.2/lib/cliver.rb:24:in `detect!'
	from /Library/Ruby/Gems/2.3.0/gems/poltergeist-1.12.0/lib/capybara/poltergeist/client.rb:48:in `initialize'
	from /Library/Ruby/Gems/2.3.0/gems/poltergeist-1.12.0/lib/capybara/poltergeist/client.rb:14:in `new'
	from /Library/Ruby/Gems/2.3.0/gems/poltergeist-1.12.0/lib/capybara/poltergeist/client.rb:14:in `start'
	from /Library/Ruby/Gems/2.3.0/gems/poltergeist-1.12.0/lib/capybara/poltergeist/driver.rb:44:in `client'
	from /Library/Ruby/Gems/2.3.0/gems/poltergeist-1.12.0/lib/capybara/poltergeist/driver.rb:25:in `browser'
	from /Library/Ruby/Gems/2.3.0/gems/poltergeist-1.12.0/lib/capybara/poltergeist/driver.rb:207:in `resize'
	from /Library/Ruby/Gems/2.3.0/gems/webshot-0.1.0/lib/webshot/screenshot.rb:15:in `initialize'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/singleton.rb:142:in `new'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/singleton.rb:142:in `block in instance'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/singleton.rb:140:in `synchronize'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/singleton.rb:140:in `instance'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:233:in `webshot'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:104:in `snap!'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:85:in `snap'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:72:in `block in crawl_and_snap'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:65:in `each'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:65:in `crawl_and_snap'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:59:in `block in crawl'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:58:in `times'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:58:in `crawl'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:34:in `execute'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:26:in `handle'
	from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/bin/snapcrawl:6:in `<top (required)>'
	from /usr/local/bin/snapcrawl:22:in `load'
	from /usr/local/bin/snapcrawl:22:in `<main>'
Yeah, not necessarily cryptic, just a messy Ruby backtrace :)
This is good - see the first or second line: Could not find an executable ["phantomjs"] on your path
Download it manually and place it in your path: http://phantomjs.org/download.html
Should just be a single binary called "phantomjs"
I will try to see why it's not installing automatically.
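For reference, the Cliver check that failed above is essentially a PATH scan. A minimal sketch of the same idea (`find_executable` here is an illustrative helper, not Cliver's actual code):

```ruby
# Illustrative helper (not Cliver's actual code): scan each PATH entry
# for an executable file with the given name, the way the failed
# dependency check looks for "phantomjs". Returns the full path, or nil.
def find_executable(name)
  ENV['PATH'].split(File::PATH_SEPARATOR)
             .map  { |dir| File.join(dir, name) }
             .find { |path| File.file?(path) && File.executable?(path) }
end

find_executable('phantomjs')  # => e.g. "/usr/local/bin/phantomjs", or nil if missing
```

So dropping the phantomjs binary into any directory on your PATH should satisfy the check.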
Ok, so I added phantomjs to my path (/usr/local/bin) - the operation got much further, but still failed on a missing dependency. Not sure which gem has a dependency on mini_magick?
Snap! Snapping picture... null
accepted
null
cookiesAccepted
/Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:200:in `rescue in validate!': ImageMagick/GraphicsMagick is not installed (MiniMagick::Invalid)
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:197:in `validate!'
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:113:in `block in create'
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:112:in `tap'
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:112:in `create'
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:34:in `read'
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:90:in `block in open'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:37:in `open'
from /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/open-uri.rb:37:in `open'
from /Library/Ruby/Gems/2.3.0/gems/mini_magick-4.3.6/lib/mini_magick/image.rb:89:in `open'
from /Library/Ruby/Gems/2.3.0/gems/webshot-0.1.0/lib/webshot/screenshot.rb:73:in `capture'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:104:in `snap!'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:85:in `snap'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:72:in `block in crawl_and_snap'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:65:in `each'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:65:in `crawl_and_snap'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:59:in `block in crawl'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:58:in `times'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:58:in `crawl'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:34:in `execute'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/lib/snapcrawl/crawler.rb:26:in `handle'
from /Library/Ruby/Gems/2.3.0/gems/snapcrawl-0.2.4rc1/bin/snapcrawl:6:in `<top (required)>'
from /usr/local/bin/snapcrawl:22:in `load'
from /usr/local/bin/snapcrawl:22:in `<main>'
I was afraid of that... the screenshot library uses a bunch of dependencies with zero-to-minimal maintenance status...
According to this StackOverflow answer, running brew install graphicsmagick should help.
Can you try?
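The MiniMagick::Invalid error above just means MiniMagick could not run the ImageMagick/GraphicsMagick command-line tools. After the brew install, one quick way to confirm they are reachable is to shell out for a version banner (this is an illustrative check, not MiniMagick's own validation code):

```ruby
# Illustrative check (not MiniMagick's own code): MiniMagick ultimately
# shells out to the ImageMagick/GraphicsMagick CLI tools, so at least
# one of them must be on the PATH. `system` returns nil when the
# command does not exist, and true/false otherwise.
def image_tool_available?
  checks = [%w[magick -version], %w[convert -version], %w[gm version]]
  checks.any? { |cmd| system(*cmd, out: File::NULL, err: File::NULL) }
end

image_tool_available?  # true once ImageMagick or GraphicsMagick is installed
```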
Also - please install the latest snapcrawl:
gem install snapcrawl --version 0.2.4rc4
Changes in this version:
- The weird "accepted... cookiesAccepted" output of the webshot gem should now be hidden.

I hope this works.
On my machine it seems to be working nicely, and the captures are actually usable (with the possible exception of the homepage, which has that dynamic scroll animation - you will have to capture it manually, I guess).
This is the output you should expect:
We have images!
Note: I can't run it as uown.co; I have to use https://www.uown.co due to redirects. Maybe this is something about my Mac's config?
Sams-MacBook-Pro:~ sam$ snapcrawl go uown.co -d2
-----> Visit: http://uown.co
Snap! Snapping picture... done
Crawl! Extracting links...
RuntimeError redirection forbidden: http://uown.co -> https://www.uown.co/
I am about to head out, but I will try a full -d4 run later on today.
Cool. About the redirects, maybe since it is a different phantomjs build, it behaves differently on a mac.
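Until the crawler follows that redirect on its own, one workaround is to resolve the final address once up front and feed it to snapcrawl. A minimal stdlib sketch (`final_uri` is a hypothetical helper, not part of snapcrawl; the `fetch` parameter is injectable purely for illustration and testing):

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of snapcrawl): follow up to `limit`
# redirects and return the final URI, so the crawl can be started on
# the https address directly. `fetch` defaults to a real HTTP request.
def final_uri(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI(url)
  limit.times do
    response = fetch.call(uri)
    return uri unless response.is_a?(Net::HTTPRedirection)
    uri = URI.join(uri.to_s, response['location'])
  end
  uri
end

# Usage (requires network):
#   final_uri('http://uown.co')  # resolves to the https://www.uown.co/ URI
```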
I am releasing it as a final 0.2.4 version.
@DannyBen Success! It crawled all the pages and took great screenshots. The only exception, as you said, was the homepage, where the scrolling JavaScript causes it to look blank.
Excellent, glad we could sort this out. I am closing this ticket, but feel free to comment if there is anything else.
Hi @DannyBen,
I have tried to use your tool to crawl a site -
Everything is working as it should as far as I can tell. But the screenshots are not saving. I have tried specifying various paths and ran the command as a super user but to no avail.
Command line:
snapcrawl go https://www.website.co -d4
Result: Cycles through pages with the following for each picture.
Snap! Snapping picture... done
Crawl: Page was cached. Reading subsequent URLs from cache
But nothing is saving as expected. I have looked in my caches but still no luck.
Have I misinterpreted the instructions, or...? If I can get it working, you will save me inordinate amounts of time.
Many Thanks,
Sam