amir-jakoby / crawler-commons

Automatically exported from code.google.com/p/crawler-commons

Use longest-match-wins approach to matching URLs in robots.txt #22

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
See "Order of precedence for group-member records" section at the end of 
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
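For illustration, here is a minimal sketch of longest-match-wins precedence, assuming plain path-prefix rules with no `*` or `$` wildcards. The class `LongestMatchRobotRules` and its nested `Rule` record are hypothetical names, not the crawler-commons API. Among the rules whose path is a prefix of the URL path, the longest one decides. When an Allow and a Disallow of the same length both match, the less restrictive Allow wins, which matches the tie-breaking described in the Google document. An empty rule path such as `Disallow:` is treated as matching nothing. The code needs Java 16+ because it uses a record.

```java
import java.util.List;

// Hypothetical sketch of longest-match-wins robots.txt matching; not the crawler-commons API.
// Assumes plain prefix rules only (no '*' or '$' wildcard support).
public class LongestMatchRobotRules {

    /** One Allow/Disallow line from a single robots.txt group. */
    public record Rule(String prefix, boolean allow) {}

    private final List<Rule> rules;

    public LongestMatchRobotRules(List<Rule> rules) {
        this.rules = List.copyOf(rules);
    }

    /**
     * Returns true if the path may be crawled. The matching rule with the longest
     * prefix wins; on a tie in length, Allow (the less restrictive rule) wins.
     * If no rule matches, the path is allowed.
     */
    public boolean isAllowed(String path) {
        int bestLength = -1;
        boolean allowed = true;
        for (Rule rule : rules) {
            String prefix = rule.prefix();
            // An empty path (e.g. "Disallow:") matches nothing, so skip it.
            if (prefix.isEmpty() || !path.startsWith(prefix)) {
                continue;
            }
            int length = prefix.length();
            // A longer match replaces the current winner; an equal-length Allow beats a Disallow.
            if (length > bestLength || (length == bestLength && rule.allow())) {
                bestLength = length;
                allowed = rule.allow();
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        LongestMatchRobotRules rules = new LongestMatchRobotRules(List.of(
                new Rule("/folder", false),
                new Rule("/folder/page", true)));
        System.out.println(rules.isAllowed("/folder/page.html")); // true: longer Allow wins
        System.out.println(rules.isAllowed("/folder/other"));     // false: only the Disallow matches
        System.out.println(rules.isAllowed("/elsewhere"));        // true: no rule matches
    }
}
```

With first-match-wins semantics, the order of the lines in robots.txt would decide the result. With longest-match-wins, `/folder/page.html` is allowed no matter which of the two rules appears first.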

Original issue reported on code.google.com by kkrugler...@transpac.com on 17 Mar 2013 at 6:43

GoogleCodeExporter commented 8 years ago
This was an ancient Nutch issue 
(https://issues.apache.org/jira/browse/NUTCH-98).

Original comment by kkrugler...@transpac.com on 17 Mar 2013 at 6:43

GoogleCodeExporter commented 8 years ago
Fixed as of r113 and r114.

Original comment by kkrugler...@transpac.com on 13 Mar 2014 at 11:54