The robots.txt handling in ldspider is keyed on the authority (host) only, which is not how
it should be done: per [1], robots.txt rules apply per scheme/host/port combination. As a
result, checking https URIs for robots.txt allowance throws IllegalArgumentExceptions.
[1] https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
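For illustration, a minimal sketch of keying a robots.txt cache by scheme, host, and port as [1] requires. This is not ldspider's actual code; the class, method, and cache here are hypothetical, and only show why http://example.org/ and https://example.org/ must not share one cache entry.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: robots.txt rules cached per scheme/host/port
// combination instead of per host alone.
public class RobotsCacheKeyExample {

    // Cache keyed by "scheme://host:port" rather than host only.
    private final Map<String, String> robotsCache = new ConcurrentHashMap<>();

    // Builds the cache key from scheme, host, and (defaulted) port.
    static String robotsKey(URI uri) {
        String scheme = uri.getScheme().toLowerCase();
        int port = uri.getPort();
        if (port == -1) {
            // Fall back to the scheme's default port when none is given.
            port = "https".equals(scheme) ? 443 : 80;
        }
        return scheme + "://" + uri.getHost().toLowerCase() + ":" + port;
    }

    public static void main(String[] args) {
        // Same host, but two distinct keys, hence two robots.txt fetches.
        System.out.println(robotsKey(URI.create("http://example.org/foo")));
        System.out.println(robotsKey(URI.create("https://example.org/foo")));
    }
}
```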
Original issue reported on code.google.com by ruedige...@googlemail.com on 18 Jun 2012 at 2:16