Closed LeMoussel closed 7 years ago
isUrlAllow() fails when there is no path in the URL.
$robotsTxtContentMultipleUA = "
User-agent: *
Allow: /
User-agent: *
Disallow: /?google_comment_id=*
User-agent: *
Disallow: /?replytocom=*
User-agent: *
Disallow: /*/?replytocom=*
";

$parserRobotsTxt = new RobotsTxtParser($robotsTxtContentMultipleUA);
$rulesRobotsTxt = $parserRobotsTxt->getRules();
$robotsTxtValidator = new RobotsTxtValidator($rulesRobotsTxt);

$url1 = 'http://site.com/page2';                // Google Allow
$url2 = 'http://site.com/?replytocom=32';       // Google Disallow
$url3 = 'http://site.com/test/?replytocom=32';  // Google Disallow

if ($robotsTxtValidator->isUrlAllow($url1)) {
    echo "$url1 => Allow".PHP_EOL;
} else {
    echo "$url1 => Disallow".PHP_EOL;
}
if ($robotsTxtValidator->isUrlAllow($url2)) {
    echo "$url2 => Allow".PHP_EOL;
} else {
    echo "$url2 => Disallow".PHP_EOL;
}
if ($robotsTxtValidator->isUrlAllow($url3)) {
    echo "$url3 => Allow".PHP_EOL;
} else {
    echo "$url3 => Disallow".PHP_EOL;
}
Result:
http://site.com/page2 => Allow
http://site.com/?replytocom=32 => Allow
http://site.com/test/?replytocom=32 => Allow
Should be:
http://site.com/page2 => Allow
http://site.com/?replytocom=32 => Disallow
http://site.com/test/?replytocom=32 => Disallow
Result with Google robots.txt Tester:
http://site.com/page2 => Allow
http://site.com/?replytocom=32 => Disallow
http://site.com/test/?replytocom=32 => Disallow
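The error seems to be in RobotsTxtValidator::getRelativeUrl(), which ends with:
return parse_url($url, PHP_URL_PATH);
PHP_URL_PATH returns only the path component, so the query string (?replytocom=32) appears to be dropped before the rules are matched.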
I am testing a patch. If it works, I will open a PR.
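To illustrate the kind of change this points to, here is a minimal sketch that keeps the query string when building the relative URL. It is not the actual patch and not the library's code: the function name relativeUrlWithQuery() is hypothetical, and it assumes the validator matches rules against a "path?query" string.

```php
<?php
// Hypothetical helper (not part of the library): build the string that
// robots.txt rules would be matched against from the path AND the query.
function relativeUrlWithQuery(string $url): string
{
    $path  = parse_url($url, PHP_URL_PATH);
    $query = parse_url($url, PHP_URL_QUERY);

    // parse_url() returns null for a missing component and false for a
    // seriously malformed URL; fall back to "/" in both cases.
    $relative = ($path === null || $path === false || $path === '') ? '/' : $path;

    // Append the query string only when one is actually present.
    if ($query !== null && $query !== false && $query !== '') {
        $relative .= '?' . $query;
    }

    return $relative;
}

echo relativeUrlWithQuery('http://site.com/page2'), PHP_EOL;               // /page2
echo relativeUrlWithQuery('http://site.com/?replytocom=32'), PHP_EOL;      // /?replytocom=32
echo relativeUrlWithQuery('http://site.com/test/?replytocom=32'), PHP_EOL; // /test/?replytocom=32
echo relativeUrlWithQuery('http://site.com'), PHP_EOL;                     // /
```

With the query string preserved, rules such as Disallow: /?replytocom=* have something to match against; whether the wildcard matching itself then behaves as Google's tester does is a separate question this sketch does not cover.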