FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License
9.26k stars 1.01k forks source link

BUGFIX: doRequest() method headers setting bug and setHeaders vs request server parameters ambiguity #316

Open KrzysztofPrzygoda opened 7 years ago

KrzysztofPrzygoda commented 7 years ago

Line 149 in Client.php (explanation in comments):

        foreach ($this->headers as $name => $value) {
            // BUG: This duplicates headers with different letter cases
            //$requestOptions['headers'][$name] = $value;
            // RESOLVES to:
            //$requestOptions['headers'][strtolower($name)] = $value;

            // But now rises the question: if headers set with setHeader() method should or shouldn't be overriden by headers provided to doRequest() method?
            // IMHO, request headers should override globally set headers by setHeader():
            $name = strtolower($name);
            if (!isset($requestOptions['headers'][$name])) $requestOptions['headers'][$name] = $value;

        }
KrzysztofPrzygoda commented 7 years ago

Here goes the issue case:

Headers provided to doRequest():

string(24) "Goutte\Client::doRequest"
array(6) {
  ["user-agent"]=>
  string(120) "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"
  ["content-type"]=>
  string(48) "application/x-www-form-urlencoded; charset=UTF-8"
  ["accept"]=>
  string(16) "application/json"
  ["accept-encoding"]=>
  string(13) "gzip, deflate"
  ["referer"]=>
  string(31) "..."
  ["host"]=>
  string(14) "..."
}

Headers previously set with setHeader():

string(24) "Goutte\Client::doRequest"
array(7) {
  ["Accept"]=>
  string(85) "text/html,application/xhtml+xml,application/xml,application/json,image/webp,*/*;q=0.8"
  ["Accept-Encoding"]=>
  string(19) "gzip, deflate, sdch"
  ["Accept-Language"]=>
  string(23) "pl,en-US;q=0.7,en;q=0.3"
  ["Cache-Control"]=>
  string(62) "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
  ["Connection"]=>
  string(5) "close"
  ["Dnt"]=>
  string(1) "1"
  ["User-Agent"]=>
  string(120) "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"
}

Final headers sent to the GuzzleClient object in line 155:

string(24) "Goutte\Client::doRequest"
array(13) {
  ["user-agent"]=>
  string(120) "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"
  ["content-type"]=>
  string(48) "application/x-www-form-urlencoded; charset=UTF-8"
  ["accept"]=>
  string(16) "application/json"
  ["accept-encoding"]=>
  string(13) "gzip, deflate"
  ["referer"]=>
  string(31) "..."
  ["host"]=>
  string(14) "..."
  ["Accept"]=>
  string(85) "text/html,application/xhtml+xml,application/xml,application/json,image/webp,*/*;q=0.8"
  ["Accept-Encoding"]=>
  string(19) "gzip, deflate, sdch"
  ["Accept-Language"]=>
  string(23) "pl,en-US;q=0.7,en;q=0.3"
  ["Cache-Control"]=>
  string(62) "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
  ["Connection"]=>
  string(5) "close"
  ["Dnt"]=>
  string(1) "1"
  ["User-Agent"]=>
  string(120) "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"
}
KrzysztofPrzygoda commented 7 years ago

OK, I've found that somewhat headers name normalization issue has been resolved in the current version (in set- removeHeader() methods) but there is still an issue which should take precedence: setHeader() or request headers?