Closed elmofromok closed 7 years ago
The exact same error is happening for me, and I am only running against the current production domain. I tried using both the capture.yaml and the spider.yaml config files, with the same result.
I am having this same issue with the default spider.yaml config file.
DEBUG: #################################################
DEBUG: Command run: spider configs/spider.yaml
DEBUG: Wraith version: 4.0.0
DEBUG: Ruby version: ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-darwin16]
DEBUG: ImageMagick: Version: ImageMagick 6.9.6-8 Q16 x86_64 2016-12-12 http://www.imagemagick.org
DEBUG: PhantomJS version: 2.1.1
DEBUG: CasperJS version: 1.1.2
DEBUG: #################################################
Same here
wraith spider configs/spider.yaml
DEBUG: #################################################
DEBUG: Command run: spider configs/spider.yaml
DEBUG: Wraith version: 4.0.0
DEBUG: Ruby version: ruby 2.0.0p648 (2015-12-16 revision 53162) [universal.x86_64-darwin15]
DEBUG: ImageMagick: Version: ImageMagick 6.9.5-4 Q16 x86_64 2016-07-30 http://www.imagemagick.org
DEBUG: PhantomJS version: 2.1.1
DEBUG: CasperJS version: CasperJS not installed
DEBUG: #################################################
Config validated. No serious issues found.
ERROR: unable to find referenced imported config "spider_paths.yaml"
It would appear that this commit was intended to fix this very same issue: https://github.com/BBC-News/wraith/commit/a8c968a830d15f60e0044819266700f18fb42a20
Adding a spider_paths.yaml file in the configs folder with the following content solves the issue:
paths:
  home: /
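For anyone who wants to script this workaround, here is a minimal Ruby sketch that writes the placeholder file only if it is missing. The helper name `ensure_spider_paths` is hypothetical and is not part of Wraith; the file name and content are the ones given above.

```ruby
require "fileutils"

# Placeholder content taken from the workaround above.
PLACEHOLDER_PATHS = "paths:\n  home: /\n"

# Hypothetical helper (not part of Wraith): creates spider_paths.yaml in the
# given config directory unless a file with that name already exists.
def ensure_spider_paths(config_dir)
  path = File.join(config_dir, "spider_paths.yaml")
  return path if File.exist?(path) # never overwrite an existing paths file

  FileUtils.mkdir_p(config_dir)
  File.write(path, PLACEHOLDER_PATHS)
  path
end
```

Calling `ensure_spider_paths("configs")` before `wraith spider configs/spider.yaml` should leave a file for the import to find. An existing file is left untouched.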
Thank you, @pcambra. It solved the issue for me.
Fails for me with the following:
wraith spider spider.yaml
DEBUG: #################################################
DEBUG: Command run: spider spider.yaml
DEBUG: Wraith version: 4.0.0
DEBUG: Ruby version: ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
DEBUG: ImageMagick: Version: ImageMagick 6.7.7-10 2016-11-29 Q16 http://www.imagemagick.org
DEBUG: PhantomJS version: 1.9.0
DEBUG: CasperJS version: CasperJS not installed
DEBUG: #################################################
Config validated. No serious issues found.
Crawling https://www.fdic.gov
/var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `block in skip_link?'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `each'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `any?'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `skip_link?'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:256:in `visit_link?'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `block in run'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `delete_if'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `run'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:92:in `block in crawl'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:83:in `initialize'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `new'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `crawl'
    from /var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:18:in `crawl'
    from /var/lib/gems/1.9.1/gems/wraith-4.0.0/lib/wraith/spider.rb:25:in `crawl'
    from /var/lib/gems/1.9.1/gems/wraith-4.0.0/lib/wraith/cli.rb:45:in `block in spider'
    from /var/lib/gems/1.9.1/gems/wraith-4.0.0/lib/wraith/helpers/utilities.rb:4:in `within_acceptable_limits'
    from /var/lib/gems/1.9.1/gems/wraith-4.0.0/lib/wraith/cli.rb:42:in `spider'
    from /var/lib/gems/1.9.1/gems/thor-0.19.4/lib/thor/command.rb:27:in `run'
    from /var/lib/gems/1.9.1/gems/thor-0.19.4/lib/thor/invocation.rb:126:in `invoke_command'
    from /var/lib/gems/1.9.1/gems/thor-0.19.4/lib/thor.rb:369:in `dispatch'
    from /var/lib/gems/1.9.1/gems/thor-0.19.4/lib/thor/base.rb:444:in `start'
    from /var/lib/gems/1.9.1/gems/wraith-4.0.0/bin/wraith:5:in `<top (required)>'
    from /usr/local/bin/wraith:23:in `load'
    from /usr/local/bin/wraith:23:in `
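For context on the first line of this trace: Ruby's `String#=~` raises a TypeError when its argument is a String rather than a Regexp. The sketch below reproduces that in isolation. It assumes, without having confirmed it in the anemone or Wraith source, that the skip patterns reach `skip_link?` as plain strings (for example, spider_skips entries loaded from YAML). The sample path and the `to_regexps` helper are hypothetical.

```ruby
# Hypothetical skip list, as it might look after being read from YAML:
# every entry is a plain String, not a Regexp.
string_patterns = ["/foo/bar.html"]
link_path = "/foo/bar.html"

begin
  # String#=~ rejects a String argument, so this raises instead of matching.
  string_patterns.any? { |pattern| link_path =~ pattern }
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# Hypothetical fix: turn each String into a Regexp that matches it literally.
def to_regexps(patterns)
  patterns.map { |p| p.is_a?(Regexp) ? p : Regexp.new(Regexp.escape(p)) }
end

regexp_patterns = to_regexps(string_patterns)
puts regexp_patterns.any? { |pattern| link_path =~ pattern }
```

`Regexp.escape` treats each entry as a literal substring. If the entries are meant to be regular expressions, `Regexp.new(p)` without escaping would be the closer equivalent.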
I commented this section out of spider.yaml and now it works. For future reference, I noticed that issue #401 describes a similar problem:
This should now be fixed in v4.0.1.
@ChrisBAshton, I just installed version 4.0.1 using brew, but to get spider working, I had to add config/spider_paths.yml (thank you, @pcambra) and comment out the spider_skips (thank you, @mramitanand), so this issue does not seem to be fixed.
I am trying to run wraith spider and have it build my paths file, but it fails because it cannot find the spider_paths.yaml file. I see this error:
It should create this file if it does not exist, correct?
Reporting a problem? Please describe the issue above, and complete the following checklist so that we can help you more quickly.
Issue checklist:
[X] I have validated my config file against YAML Validator to make sure it is valid YAML.
[X] I have run the wraith info command and pasted the output below:
… verbose: true to my config) and pasted the output below: