FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

How to Crawl Ajax Site Data? #216

Closed rashedshaon closed 9 years ago

rashedshaon commented 9 years ago

I want to crawl a site whose pagination is loaded by Ajax. Is it possible to do this with Goutte? I tried $link = $crawler->selectLink('Next>')->link(); $crawler = $client->click($link); but it is not working.

stof commented 9 years ago

You can't if the site is not usable without JavaScript, as Goutte is not able to run JavaScript.

ssanders commented 9 years ago

Use Firebug or a similar tool to see which URL the Ajax call requests, then fetch that URL with Goutte.
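A rough sketch of that approach, assuming the browser's network panel shows the pagination being fetched from a URL that takes a page number. The endpoint https://example.com/ajax/items, the "page" query parameter, the ".item-title" selector, and the buildPageUrl() helper are all made up for illustration; substitute whatever the real site uses.

```php
<?php
// Sketch only: the endpoint, the "page" parameter, and the CSS selector
// are hypothetical. Replace them with what the network panel shows.

require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

/**
 * Build the URL for one page of results of the (assumed) Ajax endpoint.
 * Hypothetical helper, not part of Goutte.
 */
function buildPageUrl(string $endpoint, int $page): string
{
    return $endpoint . '?' . http_build_query(['page' => $page]);
}

$client = new Client();
$endpoint = 'https://example.com/ajax/items'; // hypothetical endpoint

for ($page = 1; $page <= 3; $page++) {
    // Request the same URL the page's JavaScript would request.
    $crawler = $client->request('GET', buildPageUrl($endpoint, $page));

    // Assumes the endpoint answers with an HTML fragment.
    $titles = $crawler->filter('.item-title')->each(function ($node) {
        return trim($node->text());
    });

    // Stop as soon as a page comes back without any items.
    if ($titles === []) {
        break;
    }

    foreach ($titles as $title) {
        echo $title, PHP_EOL;
    }
}
```

If the endpoint returns JSON rather than HTML, the raw body should be available through $client->getResponse()->getContent() and can be passed to json_decode() instead of being filtered with the crawler.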

stof commented 9 years ago

Well, if the goal is to crawl the site, the right solution would be to disable JS in your browser and browse the site to see how it handles the case without JS.

dunglas commented 9 years ago

You can also try JS-enabled libraries or (sort of) headless browsers such as PhantomJS, SlimerJS or ZombieJS.