LeMoussel closed this issue 7 years ago
Hi @LeMoussel, are you sure that the "Should be" is the expected behavior? It looks like user-agents with empty rules:
User-agent: 008
User-agent: Accoona
User-agent: aipbot
Do you have link on documentation how it should work?
Yes. I checked this in the Google Search Console. For this example, the URL '/' is disallowed for Googlebot, Accoona, aipbot & aipbot*:
User-agent: Googlebot
User-agent: Accoona
User-agent: aipbot
User-agent: aipbot*
Disallow: /
Really, thanks. Then it is a bug that should be fixed.
This code fixes the bug:
$parserRobotsTxt = RobotsDotText::withContent($robotsTxtContentTest);
$rulesRobotsTxt = $parserRobotsTxt->getRules();

// New ///////////////////////////////////////////////////////////////////////
// Assign the rules of the last user-agent of a group to every empty
// user-agent that precedes it.
$UA = array_keys(array_reverse($rulesRobotsTxt));
for ($i = 1; $i < count($UA); $i++) {
    if (empty($rulesRobotsTxt[$UA[$i]])) {
        $rulesRobotsTxt[$UA[$i]] = &$rulesRobotsTxt[$UA[$i - 1]];
    }
}
/////////////////////////////////////////////////////////////////////////////

$robotsTxtValidator = new RobotsTxtValidator($rulesRobotsTxt);
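For reference, here is a self-contained sketch of the same back-propagation loop run against a hard-coded rules array, so it can be tried without the parser itself. The array shape (lower-cased user-agent keys mapping to directive arrays) is an assumption based on the example above, and a plain copy is used instead of the `&` reference to keep the demo simple:

```php
<?php
// Simulated parser output for the group above: only the last
// user-agent of the group received the Disallow rule.
$rules = [
    'googlebot' => [],
    'accoona'   => [],
    'aipbot'    => [],
    'aipbot*'   => ['disallow' => ['/']],
];

// Walk the user-agents from last to first; every agent with empty
// rules inherits the rules of the agent that follows it in the file.
$agents = array_keys(array_reverse($rules));
for ($i = 1; $i < count($agents); $i++) {
    if (empty($rules[$agents[$i]])) {
        $rules[$agents[$i]] = $rules[$agents[$i - 1]];
    }
}

// After the loop, all four user-agents carry ['disallow' => ['/']].
```

Note that `array_reverse()` preserves string keys, so the reversed key list starts with `aipbot*` (the only agent that got rules from the parser) and the rules then flow backwards through the group.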
I didn't open a PR because I don't know how to do that.
I think I have found a solution. Indeed, I think the code belongs in prepareRules(),
like this:
/**
 * Parse rules
 *
 * @return void
 */
public function prepareRules()
{
    $contentLength = mb_strlen($this->content);
    while ($this->char_index <= $contentLength) {
        $this->step();
    }

    foreach ($this->rules as $userAgent => $directive) {
        foreach ($directive as $directiveName => $directiveValue) {
            if (is_array($directiveValue)) {
                $this->rules[$userAgent][$directiveName] = array_values(array_unique($directiveValue));
            }
        }
    }

    // New ///////////////////////////////////////////////////////////////////
    // Assign the rules of the last user-agent of a group to every empty
    // user-agent that precedes it.
    $uaRules = array_keys(array_reverse($this->rules));
    for ($i = 1; $i < count($uaRules); $i++) {
        if (empty($this->rules[$uaRules[$i]])) {
            $this->rules[$uaRules[$i]] = &$this->rules[$uaRules[$i - 1]];
        }
    }
}
If you are OK with that, I can open a PR.
User-agent: SEOkicks-Robot
User-agent: Spiderlytics
User-agent: ShopWiki
User-agent: oBot/2.3.1
Crawl-delay: 60
Hi @LeMoussel, as I understand it, RobotsTxtParser should return the same rules for these 4 user-agents here, because at the moment RobotsTxtValidator knows nothing about the first 3 user-agents...
So it seems RobotsTxtParser needs to be improved.
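To make the expected outcome for the Crawl-delay example concrete, the same back-propagation idea can be applied to it. This is a standalone sketch under the same assumptions as before (lower-cased user-agent keys, directive arrays as values, plain copies instead of references):

```php
<?php
// Simulated parser output for the Crawl-delay group: only the last
// user-agent of the group got the directive.
$rules = [
    'seokicks-robot' => [],
    'spiderlytics'   => [],
    'shopwiki'       => [],
    'obot/2.3.1'     => ['crawl-delay' => '60'],
];

// Same loop as proposed for prepareRules(): empty user-agents inherit
// the rules of the agent that follows them in the group.
$agents = array_keys(array_reverse($rules));
for ($i = 1; $i < count($agents); $i++) {
    if (empty($rules[$agents[$i]])) {
        $rules[$agents[$i]] = $rules[$agents[$i - 1]];
    }
}

// Every user-agent of the group now carries ['crawl-delay' => '60'],
// which is what RobotsTxtValidator needs to see.
```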
Fails for the User-agent rule.
Result:
Should be: