canCrawl() returns wrong value for partial path matches without wildcards in robots.txt

chrisakroyd / robots-txt-parser

A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.

MIT License


Closed by Trott 1 year ago

Trott commented 1 year ago

robots.txt:

User-agent: *
Disallow: /rss
Allow: /

Given this file, canCrawl() reports that /rssa cannot be crawled, but that is incorrect.

Trott commented 1 year ago

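Argh, much to my surprise, this is not a bug but expected behavior according to the spec: Disallow: /rss matches any path that starts with /rss, including /rssa, and as the longest matching rule it takes precedence over Allow: /. Closing. Sorry for the noise.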
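
For anyone who lands here with the same confusion, here is a minimal sketch of how wildcard-free rules are evaluated under the Robots Exclusion Protocol (RFC 9309). The isAllowed helper and the { pattern, allow } rule shape are hypothetical names made up for this illustration. They are not part of the robots-txt-parser API, and this is not how the library is implemented internally.

```javascript
// Hypothetical helper for illustration only; not the robots-txt-parser API.
// Handles plain path prefixes only (no * or $ wildcards).
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    // An empty pattern (e.g. a bare "Disallow:") restricts nothing, so skip it.
    if (rule.pattern === '') continue;
    // A rule matches when its pattern is a prefix of the requested path.
    if (!path.startsWith(rule.pattern)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tie = best !== null && rule.pattern.length === best.pattern.length;
    // The longest matching pattern wins; on a tie, prefer the Allow rule.
    if (longer || (tie && rule.allow)) best = rule;
  }
  // If no rule matches, the path may be crawled.
  return best === null ? true : best.allow;
}

// The rules from the robots.txt in this issue.
const rules = [
  { pattern: '/rss', allow: false }, // Disallow: /rss
  { pattern: '/', allow: true },     // Allow: /
];

console.log(isAllowed(rules, '/rssa')); // false: "/rss" is the longest matching prefix
console.log(isAllowed(rules, '/rs'));   // true: only "/" matches
```

To block /rss itself but not /rssa, the spec's end-of-path anchor ($) would be needed, which this sketch deliberately does not implement.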

Trott commented 1 year ago

Ah, it is a bug, just not the one I originally described. Will open a separate issue.