ekalinin / robots.js

Parser for robots.txt for node.js

Ignoring rules without * #16

Closed SoAG closed 10 years ago

SoAG commented 10 years ago

It seems to me that rules that do not contain a * are not being applied to given url. So it ignores rules like Disallow: /dontcrawl/. Is there a reason for that?

SoAG commented 10 years ago

@ekalinin, can you give some input on this? Thanks.

ekalinin commented 10 years ago

> @ekalinin, can you give some input on this? Thanks.

Hi, @SoAG! Can you provide an example? The tests all pass without errors.

SoAG commented 10 years ago

Hi @ekalinin

Thanks for helping. Here's an example:

    robots = require 'robots'

    parser = new robots.RobotsParser
    parser.setUrl 'http://www.faz.net/robots.txt', (parser, success) ->
      if success
        parser.canFetch '*', 'http://www.faz.net/membership/', (access) ->
          console.log access

That returns true for the given URL, even though it is disallowed in the provided robots.txt. I think the relevant line is https://github.com/ekalinin/robots.js/blob/master/lib/rule.js#L48: it only applies the rule if there's a * in it. Or am I doing something wrong?
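
It looks like the rule path is compared against exactly the string passed to canFetch, so a full URL such as 'http://www.faz.net/membership/' can never match a rule path of '/membership/'. A minimal sketch of that prefix idea (ruleApplies is a hypothetical helper, not the library's actual matching code):

    # Hypothetical prefix check, not robots.js's real implementation:
    ruleApplies = (rulePath, requested) -> requested.indexOf(rulePath) is 0

    console.log ruleApplies '/membership/', 'http://www.faz.net/membership/'  # false
    console.log ruleApplies '/membership/', '/membership/'                    # true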

ekalinin commented 10 years ago

Try this, please:

    robots = require 'robots'

    parser = new robots.RobotsParser
    parser.setUrl 'http://www.faz.net/robots.txt', (parser, success) ->
      if success
        # pass the path, not the full URL
        parser.canFetch '*', '/membership/', (access) ->
          console.log access

SoAG commented 10 years ago

Yeah, that works! Thanks a lot. Next time I'll take a better look at the tests :-)
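
In short, canFetch expects a path, not a full URL. If you start from a full URL, one way to derive the path is Node's url module; a minimal sketch reusing the example from this thread:

    url    = require 'url'
    robots = require 'robots'

    parser = new robots.RobotsParser
    parser.setUrl 'http://www.faz.net/robots.txt', (parser, success) ->
      if success
        # reduce the full URL to the path form canFetch expects
        path = url.parse('http://www.faz.net/membership/').pathname
        parser.canFetch '*', path, (access) ->
          console.log access  # expected: false, since /membership/ is disallowed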