FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

Not pulling all html from page? #417

Open JPeterson2015 opened 4 years ago

JPeterson2015 commented 4 years ago

I'm having an issue. With the code below I'm trying to get the pagination links (there are 5 of them), but when I filter for them the result is empty. This is very strange, because in Chrome DevTools I can see the links, yet when I dd($crawler) the markup for the page is incomplete.

What's the deal here?

```php
$url = 'https://www.indeed.com/cmp/Lockheed-Martin/jobs';

// `Goutte::` here is a Laravel facade around Goutte\Client; dd() dumps and dies.
$crawler = Goutte::request('GET', $url);
$data = $crawler->filter('.cmp-JobDisplay-pagination > .cmp-Pagination > a')->each(function ($node) {
    return $node->attr('href');
});

dd($data);
```
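One plausible explanation, not confirmed in this thread, is that Chrome DevTools shows the DOM after JavaScript has run, while Goutte only sees the raw HTTP response and executes no JavaScript (and a site may also send different markup to non-browser clients). A quick way to check is to count the pagination links in the raw HTML. The sketch below uses PHP's built-in DOM extension instead of Goutte; `countPaginationLinks()` is a hypothetical helper name, and the XPath is a hand translation of the CSS selector used above.

```php
<?php
// Hypothetical diagnostic helper: counts <a> elements matching the CSS selector
// ".cmp-JobDisplay-pagination > .cmp-Pagination > a" in a raw HTML string.
function countPaginationLinks(string $html): int
{
    if ($html === '') {
        return 0; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Class matching pads @class with spaces so "cmp-Pagination" does not also
    // match e.g. "cmp-PaginationItem"; "/" is the child axis, like CSS ">".
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' cmp-JobDisplay-pagination ')]"
        . "/*[contains(concat(' ', normalize-space(@class), ' '), ' cmp-Pagination ')]/a";

    $xpath = new DOMXPath($doc);
    return $xpath->query($query)->length;
}
```

With Goutte, the raw body of the last response should be available from a `Goutte\Client` instance as `$client->getResponse()->getContent()` (Symfony BrowserKit). If the count is 0 there while DevTools shows 5 links, the links are most likely added client-side or the server sent the scraper different markup.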
sharmadhiraj commented 4 years ago

This is working for me.

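```php
<?php
require 'vendor/autoload.php'; // Composer autoloader

use Goutte\Client;

$url = 'https://www.indeed.com/cmp/Lockheed-Martin/jobs';

$client = new Client();
$crawler = $client->request('GET', $url);

// Collect the href of every <a> that is a direct child of the pagination block.
$data = $crawler->filter('.cmp-JobDisplay-pagination > .cmp-Pagination > a')
    ->each(function ($node) {
        return $node->attr('href');
    });

print_r($data);
```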

Output:

```
Array
(
    [0] => /cmp/Lockheed-Martin/jobs?start=150
    [1] => /cmp/Lockheed-Martin/jobs?start=300
    [2] => /cmp/Lockheed-Martin/jobs?start=450
    [3] => /cmp/Lockheed-Martin/jobs?start=600
    [4] => /cmp/Lockheed-Martin/jobs?start=150
)
```
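The hrefs in that output are root-relative, and `start=150` appears twice (index 4 is probably a "next" link). If the goal is a list of pages to visit, a small post-processing step can make them absolute and drop repeats. This is a minimal sketch, not part of Goutte; `normalizePaginationLinks()` is a hypothetical name, and it only rewrites paths starting with a single `/` and ignores any port in the page URL.

```php
<?php
// Hypothetical helper: turn root-relative hrefs into absolute URLs
// (scheme + host taken from the page URL) and remove duplicates.
function normalizePaginationLinks(array $hrefs, string $pageUrl): array
{
    $parts = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    $urls = [];
    foreach ($hrefs as $href) {
        // Rewrite "/path" but leave "//host/path" and absolute URLs unchanged.
        if (substr($href, 0, 1) === '/' && substr($href, 0, 2) !== '//') {
            $href = $origin . $href;
        }
        $urls[] = $href;
    }

    // array_unique keeps the first occurrence; array_values re-indexes 0..n-1.
    return array_values(array_unique($urls));
}
```

Called as `print_r(normalizePaginationLinks($data, $url));` on the five hrefs above, it returns four absolute URLs (start=150, 300, 450, 600). Alternatively, Symfony DomCrawler can resolve each link against the page URI inside the `each()` callback via `$node->link()->getUri()`.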