amerkurev / scrapper

Web scraper with a simple REST API, running in Docker and using a headless browser and Readability.js for parsing.
https://scrapper.dev
Apache License 2.0

Dynamic DOM Support? #11

Closed: dr0id123 closed this issue 7 months ago

dr0id123 commented 7 months ago

I've noticed that this application does not process (output) the dynamic DOM, so it is not compatible with sites that build their content in the browser. For example:

https://angular.io/about?group=Angular

Raw HTML -> the word "puppies" cannot be found.
Fully generated DOM (the DOM built in the browser, e.g., as shown in Chrome DevTools) -> searching finds the word "puppies".
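The check above can be repeated programmatically on two HTML snapshots. A minimal sketch using only Python's standard library: it collects the text a reader would see (skipping `<script>` and `<style>` contents) and searches it for a word. The two sample documents are hypothetical stand-ins for a raw SPA shell and a fully rendered DOM, not the actual angular.io markup.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not part of a script or stylesheet.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text_contains(html: str, word: str) -> bool:
    """Case-insensitive search for `word` in the visible text of `html`."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return word.lower() in " ".join(parser.chunks).lower()


# Hypothetical samples: an empty app shell vs. the DOM after JavaScript ran.
raw_shell = '<html><body><app-root></app-root><script src="main.js"></script></body></html>'
rendered = "<html><body><app-root><p>Our team loves puppies.</p></app-root></body></html>"

print(visible_text_contains(raw_shell, "puppies"))  # False
print(visible_text_contains(rendered, "puppies"))   # True
```

If the word only shows up in the rendered snapshot, the content is being generated by JavaScript, and whatever captures the HTML has to wait for that script to finish.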

Headless browsers should be able to output the dynamic DOM as HTML (e.g., Selenium does this).

Am I missing something? It should be possible, given that a real browser is being used.

dr0id123 commented 7 months ago

Never mind, I got this sorted: you need to use the correct wait function so that all the JavaScript has finished running.

edwardsmoses commented 3 weeks ago

> Nevermind, got this sorted -- need to use the correct wait function for all the javascript to process.

Could you please share which wait function you used? I'm trying to do the same. Thanks.
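The thread never says which wait function was used. As a hedged sketch only: assuming a local instance listening on port 3000 that exposes a `GET /api/article` endpoint with a `wait-until` option (taking Playwright-style values such as `networkidle`) and a `sleep` delay in milliseconds, a request that gives the page's JavaScript time to finish could be built as below. The endpoint, port, and both parameter names are assumptions; check them against the project's README before relying on them.

```python
from urllib.parse import urlencode

# Assumed address of a local Docker deployment; verify against the README.
BASE_URL = "http://localhost:3000/api/article"

# Values accepted by Playwright's page.goto(wait_until=...); the assumption is
# that the service passes its `wait-until` option through to Playwright.
WAIT_UNTIL_VALUES = {"load", "domcontentloaded", "networkidle", "commit"}


def build_article_request(target_url: str, wait_until: str = "networkidle",
                          sleep_ms: int = 0) -> str:
    """Return a request URL asking the (assumed) API to wait before parsing."""
    if wait_until not in WAIT_UNTIL_VALUES:
        raise ValueError(f"unsupported wait-until value: {wait_until!r}")
    if sleep_ms < 0:
        raise ValueError("sleep_ms must be non-negative")
    params = {"url": target_url, "wait-until": wait_until}
    if sleep_ms:
        # Hypothetical extra fixed delay applied after the wait condition is met.
        params["sleep"] = str(sleep_ms)
    # urlencode percent-escapes the target URL so its own "?" and "=" survive.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_article_request("https://angular.io/about?group=Angular", sleep_ms=2000))
```

`networkidle` waits until network activity has settled, which usually covers client-side rendering; a short fixed `sleep` on top is a blunt fallback for pages that keep rendering after the network goes quiet.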