bbc / wraith

Wraith — A responsive screenshot comparison tool
http://bbc-news.github.io/wraith/
Apache License 2.0

`spider_skips` property broken #401

Open paullew opened 8 years ago

paullew commented 8 years ago

I'm using the default spider.yaml from https://raw.githubusercontent.com/BBC-News/wraith/master/templates/configs/spider.yaml

I've tried running it with wraith installed locally on my Mac, and also via the wraith Docker image. Both fail, with different error messages.

Locally on my Mac:

$ wraith capture spider.yaml
Config validated. No serious issues found.
no paths defined in config, crawling from site root
creating new spider file
/Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `block in skip_link?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `any?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `skip_link?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:256:in `visit_link?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `block in run'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `delete_if'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `run'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:92:in `block in crawl'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:83:in `initialize'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `new'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `crawl'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:18:in `crawl'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/spider.rb:69:in `spider'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/spider.rb:35:in `determine_paths'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/spider.rb:23:in `check_for_paths'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:36:in `check_for_paths'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:133:in `block in capture'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:28:in `within_acceptable_limits'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:130:in `capture'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/bin/wraith:5:in `<top (required)>'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/wraith:23:in `load'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/wraith:23:in `<main>'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/ruby_executable_hooks:15:in `eval'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/ruby_executable_hooks:15:in `<main>'

Running it via the wraith docker image:

$ docker run --rm -P -v ~/devel/resources/testing/wraith:/wraithy -w='/wraithy' bbcnews/wraith capture spider.yaml
/usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x005648d0ca4be8> (NameError)
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/spider.rb:36:in `determine_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/spider.rb:24:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:36:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:134:in `block in capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:28:in `within_acceptable_limits'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:131:in `capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/bin/wraith:5:in `<top (required)>'
    from /usr/local/bin/wraith:23:in `load'
    from /usr/local/bin/wraith:23:in `<main>'
Config validated. No serious issues found.
no paths defined in config, crawling from site root

altV commented 8 years ago

I'm getting undefined local variable or method `wraith' for #<Wraith::Crawler:0x005648d0ca4be8> (NameError) with the default spider.yaml as well, on version 3.1.2.

imagreenplant commented 8 years ago

Same here:

$ wraith capture configs/spider.yaml
Config validated. No serious issues found.
no paths defined in config, crawling from site root
/Library/Ruby/Gems/2.0.0/gems/wraith-3.1.2/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x007fd2f1938610> (NameError)

Vexrm commented 8 years ago

Ditto. I can't add extra information, but am watching for more info.

trioni commented 8 years ago

I get the same thing. Same error as @imagreenplant

Dbuggerx commented 8 years ago

I'm also getting the same error as @imagreenplant. Is this solved?

ocrunch commented 8 years ago

Same Issue!

slimatic commented 8 years ago

Any thoughts on what this issue could be?

/usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x0055cde740ed58> (NameError)

xinbin commented 8 years ago

I have the same error message as @slimatic. Full dump:

/usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/spider.rb:64:in 'spider': undefined local variable or method 'wraith' for # (NameError)
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/spider.rb:36:in 'determine_paths'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/spider.rb:24:in 'check_for_paths'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:36:in 'check_for_paths'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:134:in 'block in capture'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:28:in 'within_acceptable_limits'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:131:in 'capture'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/command.rb:27:in 'run'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in 'invoke_command'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor.rb:359:in 'dispatch'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/base.rb:440:in 'start'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/bin/wraith:5:in '<top (required)>'
    from /usr/local/bin/wraith:23:in 'load'
    from /usr/local/bin/wraith:23:in '<main>'

bjorndavis commented 8 years ago

I'm getting this as well:

C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/spider.rb:65:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x00000002e81ca8> (NameError)
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/spider.rb:38:in `determine_paths'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/spider.rb:24:in `check_for_paths'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:36:in `check_for_paths'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:134:in `block in capture'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:28:in `within_acceptable_limits'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:131:in `capture'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/bin/wraith:5:in `<top (required)>'
        from C:/Ruby22-x64/bin/wraith:23:in `load'
        from C:/Ruby22-x64/bin/wraith:23:in `<main>'

catchergeese commented 8 years ago

It fails for me too (default spider config yaml file, running in docker container):

$ wraith capture configs/spider.yaml
/usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x0055808ac4a508> (NameError)
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:36:in `determine_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:24:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:36:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:134:in `block in capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:28:in `within_acceptable_limits'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:131:in `capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/bin/wraith:5:in `<top (required)>'
    from /usr/local/bin/wraith:23:in `load'
    from /usr/local/bin/wraith:23:in `<main>'

Is there any chance this will be fixed in the near future?

kyleskrinak commented 8 years ago

3.2.1 does not address this issue on my system. I'm still seeing the error message:

.rvm/gems/ruby-2.2.0/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)
        from /Users/x/.rvm/gems/ruby-2.2.0/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `block in skip_link?'
        from /Users/x/.rvm/gems/ruby-2.2.0/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `any?'
        etc…

ChrisBAshton commented 8 years ago

This error has been fixed in 3.2.1:

'spider': undefined local variable or method 'wraith' for # (NameError)

However, I can see that the original error in this issue is:

core.rb:298:in `=~': type mismatch: String given (TypeError)

This issue has been closed in error. Re-opening.
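
For anyone landing here from that second trace: the `=~` TypeError can be reproduced in plain Ruby, without wraith. This is a minimal sketch, assuming the crawler's check does roughly `path =~ pattern` over the skip list. The trace shows `=~` inside `any?` inside `skip_link?`, but the method below is a hypothetical stand-in, not Anemone's actual code, and the paths are made up for illustration.

```ruby
# Hypothetical stand-in for the check seen in the trace
# (`=~` called from a block passed to `any?` in `skip_link?`).
def skip_link?(path, patterns)
  patterns.any? { |pattern| path =~ pattern }
end

# Run the check and report the result, or the TypeError if one is raised.
def describe_skip(path, patterns)
  skip_link?(path, patterns) ? "skipped" : "visited"
rescue TypeError => e
  "TypeError: #{e.message}"
end

# A Regexp entry works: String#=~ accepts a Regexp on the right-hand side.
puts describe_skip("/baz/page", [%r{^/baz/}])

# A String entry does not: String#=~ raises when given another String.
puts describe_skip("/foo/bar.html", ["/foo/bar.html"])
```

If that assumption holds, any plain string in `spider_skips` would trigger the error as soon as the first link is checked against it.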

Peter-Petrik commented 7 years ago

Experiencing this in 3.2.1

/var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)

In spider.yaml I commented out: - !ruby/regexp /^\/baz\// and I'm no longer seeing the error.
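
It may help to see what types the two kinds of entry become once the YAML is parsed. This is a sketch using Ruby's standard `yaml` library. The regexp entry is the one quoted above; the plain path entry is an illustrative assumption. How wraith itself loads the file is not shown here, and `safe_load` with `permitted_classes` is just my choice for the demo (it needs a reasonably recent Ruby).

```ruby
require "yaml"

# Two skip entries: a plain path (illustrative) and the tagged regexp quoted above.
SPIDER_CONFIG = <<~'EOS'
  spider_skips:
    - /foo/bar.html
    - !ruby/regexp /^\/baz\//
EOS

# Parse the config text and return the Ruby class of each spider_skips entry.
# Regexp has to be permitted explicitly because safe_load rejects it by default.
def skip_entry_classes(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Regexp])
  config.fetch("spider_skips").map(&:class)
end

p skip_entry_classes(SPIDER_CONFIG)
```

The untagged entry comes back as a String and the tagged one as a Regexp, so the two reach the crawler as different types.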

sembrat commented 6 years ago

For the folks encountering this issue: does removing the non-regexp entries from spider_skips fix it?

Some quick testing on my end showed that any entry written as a Ruby regexp was handled fine, whereas a plain path string broke spider_skips.

To work around this, I had to wrap every plain string I wanted to skip in a regexp that matches it exactly, which isn't ideal for those (like me) who are awful at regexp syntax.
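
A small helper can take the regexp writing out of that workaround. This is a sketch, not part of wraith: `exact_path_regexp` and `yaml_skip_entry` are hypothetical names, the path is an example, and it assumes the crawler matches each pattern against the URL path.

```ruby
# Build a regexp that matches one literal path and nothing else.
# Regexp.escape neutralises characters such as "." that are special in a regexp;
# \A and \z anchor the match to the whole string.
def exact_path_regexp(path)
  Regexp.new("\\A#{Regexp.escape(path)}\\z")
end

# Format the regexp as a spider_skips list entry using the !ruby/regexp tag.
def yaml_skip_entry(path)
  "- !ruby/regexp #{exact_path_regexp(path).inspect}"
end

puts yaml_skip_entry("/foo/bar.html")
```

The printed line can be pasted under `spider_skips` in place of the plain string. Check the output against your own paths before relying on it.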

edurenye commented 5 years ago

This error comes from Anemone, the web-spider framework that wraith uses. Anemone is no longer maintained; its last commit was in 2012. I think we should replace it with Medusa, a maintained fork of Anemone with the same API, so the switch should work without too much trouble.
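
Whichever crawler ends up being used, a tolerant check could accept both kinds of entry. The following is only a sketch of the idea, not code from Anemone, Medusa, or wraith; `skip_path?` and the sample paths are hypothetical.

```ruby
# Hypothetical tolerant check. Case equality dispatches on the pattern's type:
# Regexp#=== tests for a match, String#=== tests for equality.
def skip_path?(path, patterns)
  patterns.any? { |pattern| pattern === path }
end

skips = ["/foo/bar.html", %r{^/baz/}]
%w[/foo/bar.html /baz/page /news].each do |path|
  puts "#{path}: #{skip_path?(path, skips) ? 'skip' : 'visit'}"
end
```

With `===` a plain string means "skip exactly this path" and a regexp keeps its usual meaning, so existing configs with mixed entries would not need rewriting.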

ErroneousBosch commented 5 years ago

@sembrat

Yes, switching to regex does seem to correct the errors.