bartman081523 closed this issue 5 years ago
Thank you for your report.
Unfortunately this is a problem with spidr, see https://github.com/postmodern/spidr/issues/66. The issue has been closed and a fix has been merged, but the author has not yet released a new version to RubyGems, and this gem cannot depend on the GitHub master branch (a gemspec has no way to express that).
I've been thinking about pushing my own patched version of spidr to RubyGems, but haven't opted for that yet.
I might, though (perhaps we could first open an issue in the original GitHub repo politely asking Postmodern to release a new version).
Thank you for your concern. I also found the fix for spidr here: https://github.com/postmodern/spidr/commit/ae885272619f74c69d43ec77852f158768c6d804
You could bundle the git version of spidr with Bundler (see https://bundler.io/v1.12/git.html) by adding this to the .gemspec: gem 'spidr', :git => 'https://github.com/postmodern/spidr.git'
Yeah, I know that you can specify that in a Gemfile. However, what we need here is to add it to wayback_archiver.gemspec, and .gemspec files do not support that.
From https://bundler.io/v1.12/git.html
Because RubyGems lacks the ability to handle gems from git [...]
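To illustrate the limitation, here is a minimal sketch of what a gemspec dependency can carry. The gem name example_gem and its metadata are hypothetical and are not taken from wayback_archiver.gemspec. A dependency declared this way records only a name, a version requirement, and a type, so there is nowhere to put a git URL.

```ruby
require "rubygems"

# Hypothetical gemspec, for illustration only.
spec = Gem::Specification.new do |s|
  s.name    = "example_gem"
  s.version = "0.1.0"
  s.summary = "Illustrates gemspec dependency declarations"
  s.authors = ["Example Author"]
  s.files   = []

  # A gemspec dependency is a gem name plus version constraints.
  s.add_runtime_dependency "spidr", "~> 0.6"
end

dep = spec.dependencies.first

# These three fields are all that a Gem::Dependency describes.
puts "name:        #{dep.name}"
puts "requirement: #{dep.requirement}"
puts "type:        #{dep.type}"
```

A git source can only be declared in the Gemfile of the application that consumes the gem, which is what the workaround relies on.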
ℹ️ Workaround
Explicitly add spidr to your Gemfile:
gem 'spidr', github: 'postmodern/spidr'
Traceback (most recent call last):
28: from /home/user/.gem/ruby/2.5.0/bin/wayback_archiver:23:in `<main>'
27: from /home/user/.gem/ruby/2.5.0/bin/wayback_archiver:23:in `load'
26: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/bin/wayback_archiver:81:in `<top (required)>'
25: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/bin/wayback_archiver:81:in `each'
24: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/bin/wayback_archiver:82:in `block in <top (required)>'
23: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/lib/wayback_archiver.rb:50:in `archive'
22: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/lib/wayback_archiver.rb:91:in `crawl'
21: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/lib/wayback_archiver/archive.rb:75:in `crawl'
20: from /home/user/.gem/ruby/2.6.0/gems/wayback_archiver-1.2.1/lib/wayback_archiver/url_collector.rb:37:in `crawl'
19: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/spidr.rb:53:in `site'
18: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:274:in `site'
17: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:355:in `start_at'
16: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:373:in `run'
15: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:665:in `visit_page'
14: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:599:in `get_page'
13: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:788:in `prepare_request'
12: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:605:in `block in get_page'
11: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/agent.rb:679:in `block in visit_page'
10: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:238:in `each_url'
9: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:188:in `each_link'
8: from /home/user/.gem/ruby/2.6.0/gems/nokogiri-1.10.1/lib/nokogiri/xml/node_set.rb:237:in `each'
7: from /home/user/.gem/ruby/2.6.0/gems/nokogiri-1.10.1/lib/nokogiri/xml/node_set.rb:237:in `upto'
6: from /home/user/.gem/ruby/2.6.0/gems/nokogiri-1.10.1/lib/nokogiri/xml/node_set.rb:238:in `block in each'
5: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:189:in `block in each_link'
4: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:182:in `block in each_link'
3: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:239:in `block in each_url'
2: from /home/user/.gem/ruby/2.6.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:283:in `to_absolute'
1: from /usr/lib/ruby/2.6.0/uri/generic.rb:807:in `path='
/usr/lib/ruby/2.6.0/uri/generic.rb:753:in `check_path': path conflicts with opaque (URI::InvalidURIError)
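The final frames show URI::Generic#path= rejecting an assignment. A minimal sketch of how that error class can arise with Ruby's standard uri library follows. The mailto: address is a made-up example, and it is an assumption that the crawled page contained a link of this kind; the traceback does not say which link triggered the failure.

```ruby
require "uri"

# A mailto: link parses as an "opaque" URI: it has a scheme and an
# opaque part, but no hierarchical path.
uri = URI.parse("mailto:user@example.com")
puts "opaque part: #{uri.opaque.inspect}"

# Assigning a path to an opaque URI is rejected by URI::Generic#path=.
error = begin
  uri.path = "/contact"
  nil
rescue URI::InvalidURIError => e
  e
end

puts "raised: #{error.class}: #{error.message}" if error
```

A crawler that normalizes the path of every link it finds can avoid this by skipping URIs whose opaque part is set, or by rescuing URI::InvalidURIError for that link.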