FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

Handling exceptions #285

Open 3zzy opened 8 years ago

3zzy commented 8 years ago

Although I'm using try/catch, it still ends with an error:

Error: Client error: `GET http://example.com/C42C9CA3` resulted in a `403 Forbidden` response:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "htt (truncated...)

This is what I have:

use Goutte\Client;
$HTTPconfig = [ "curl" => [
                  CURLOPT_TIMEOUT => 60,
                  CURLOPT_CONNECTTIMEOUT => 60,
                  CURLOPT_SSL_VERIFYPEER => false,
                ],
                ['http_errors' => false]
              ];
$HTTPclient = new \Goutte\Client;
$HTTPclient->setClient(new \GuzzleHttp\Client($HTTPconfig));
$HTTPclient->setHeader('user-agent', 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/20.0');

try {
  $crawler = $HTTPclient->request('GET', $url);
  $doc = $crawler->html();
} catch (Exception $e) {
  write($e->getMessage());
  continue;
}
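
One possible cause, offered as a guess rather than a confirmed diagnosis: in the `$HTTPconfig` array above, `['http_errors' => false]` is a nested array stored under integer key `0`, not a top-level `http_errors` key next to `curl`. If Guzzle only reads the top-level key, the option would be ignored and 4xx responses would still throw. The plain-PHP sketch below only inspects the array shape and does not call Goutte or Guzzle. The variable names `$original` and `$fixed` are made up for illustration, and it assumes the curl extension is loaded so the `CURLOPT_*` constants exist.

```php
<?php
// Shape used in the question: 'http_errors' is wrapped in its own array,
// so it ends up under integer key 0 instead of beside 'curl'.
$original = [
    'curl' => [CURLOPT_TIMEOUT => 60],
    ['http_errors' => false],
];

var_dump(isset($original['http_errors']));    // bool(false): no top-level key
var_dump(isset($original[0]['http_errors'])); // bool(true): nested under key 0

// Assumed fix: make 'http_errors' a sibling of 'curl'.
$fixed = [
    'curl' => [
        CURLOPT_TIMEOUT        => 60,
        CURLOPT_CONNECTTIMEOUT => 60,
        CURLOPT_SSL_VERIFYPEER => false,
    ],
    'http_errors' => false,
];

var_dump($fixed['http_errors']); // bool(false)
```

Separately, if the snippet lives in a namespaced file, `catch (Exception $e)` resolves to an `Exception` class in that namespace rather than the global one; writing `catch (\Exception $e)` avoids that. Whether either point applies here depends on code not shown in the question.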