FriendsOfPHP / Goutte

Goutte, a simple PHP Web Scraper
MIT License

How to get lazy-loaded data? #442

Closed: mcunha98 closed this issue 2 years ago

mcunha98 commented 3 years ago

If I run the crawler against https://labiexames.com.br/testes, I receive an initial DOM (a kind of wireframe), but the real content is delivered later via React/JavaScript, and tags such as title, for example, are not part of the original DOM.

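Any idea how to solve this? This is what I run:

    <?php
    require 'vendor/autoload.php';

    use Goutte\Client;
    use Symfony\Component\HttpClient\HttpClient;

    $urlFinal = "https://labiexames.com.br/testes";
    // Goutte client on top of Symfony HttpClient, with a 60-second timeout
    $client = new Client(HttpClient::create(['timeout' => 60]));
    $client->setServerParameter('HTTP_USER_AGENT', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0');
    $crawler = $client->request('GET', $urlFinal);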

If you compare it with the page as the browser shows it, you'll notice a big difference between the crawler's DOM and the browser's DOM.
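The difference comes from client-side rendering: the server sends an HTML shell, and the browser's JavaScript fills it in afterwards. Goutte parses only that shell and never runs the scripts. A minimal sketch using PHP's built-in DOM extension (not Goutte's API) makes this concrete. It assumes a React-style page that mounts into `<div id="root">`; the function name `inspectStaticHtml` and both HTML snippets are hypothetical, not the real markup of labiexames.com.br.

```php
<?php
// Minimal sketch: inspect static HTML the way a client without JavaScript sees it.
// ASSUMPTION: the page mounts its app into <div id="root"> (a common React convention).
// Both HTML snippets below are hypothetical stand-ins, not the real site's markup.

/**
 * Returns the <title> text (or null if there is none) and the number of
 * element children inside the mount point.
 */
function inspectStaticHtml(string $html, string $mountId = 'root'): array
{
    $doc = new DOMDocument();
    // libxml warns about some HTML5 markup; collect those warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = $xpath->query('//title');
    $children = $xpath->query("//*[@id='$mountId']/*");

    return [
        'title' => $titles->length > 0 ? trim($titles->item(0)->textContent) : null,
        'mountChildren' => $children->length,
    ];
}

// What a plain HTTP client receives from a client-side rendered page:
// an empty mount point plus a script bundle that has not been executed.
$shellHtml = '<!DOCTYPE html><html><head><meta charset="utf-8">'
    . '<script src="/static/js/main.js"></script></head>'
    . '<body><div id="root"></div></body></html>';

// Roughly what the browser's DOM looks like after the JavaScript has run.
$renderedHtml = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Example page</title>'
    . '<script src="/static/js/main.js"></script></head>'
    . '<body><div id="root"><h1>Example page</h1><ul><li>Item</li></ul></div></body></html>';

var_export(inspectStaticHtml($shellHtml));    // title is NULL, mountChildren is 0
echo PHP_EOL;
var_export(inspectStaticHtml($renderedHtml)); // title is 'Example page', mountChildren is 2
echo PHP_EOL;
```

If what Goutte receives looks like the shell (no title, empty mount point), waiting or retrying on the Goutte side will not help, because nothing executes the scripts. That is the situation where a JavaScript-capable browser is required.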

larowlan commented 3 years ago

You can't use Goutte for that; you need something like WebDriver, where you have an engine that understands JavaScript.

mcunha98 commented 3 years ago

@larowlan thanks for the reply. I'll look into WebDriver (in fact, I've already done some tests with PHPPanther to get started...)