The robots.txt handling in ldspider is keyed on the authority (host) only, which is not how
it should be done: per [1], robots.txt rules apply per scheme/host/port combination. As a
result, checking https URIs for robots.txt allowance throws IllegalArgumentExceptions.
[1] https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
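For illustration, a minimal sketch of keying a robots.txt cache by scheme, host, and port as [1] requires. This is not ldspider's actual code; the class, method, and cache here are hypothetical, and only show why http://example.org/ and https://example.org/ must not share one cache entry.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: robots.txt rules cached per scheme/host/port
// combination instead of per host alone.
public class RobotsCacheKeyExample {

    // Cache keyed by "scheme://host:port" rather than host only.
    private final Map<String, String> robotsCache = new ConcurrentHashMap<>();

    // Builds the cache key from scheme, host, and (defaulted) port.
    static String robotsKey(URI uri) {
        String scheme = uri.getScheme().toLowerCase();
        int port = uri.getPort();
        if (port == -1) {
            // Fall back to the scheme's default port when none is given.
            port = "https".equals(scheme) ? 443 : 80;
        }
        return scheme + "://" + uri.getHost().toLowerCase() + ":" + port;
    }

    public static void main(String[] args) {
        // Same host, but two distinct keys, hence two robots.txt fetches.
        System.out.println(robotsKey(URI.create("http://example.org/foo")));
        System.out.println(robotsKey(URI.create("https://example.org/foo")));
    }
}
```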
Original issue reported on code.google.com by ruedige...@googlemail.com on 18 Jun 2012 at 2:16