ZFC-Digital / cf-clearance-scraper

This library was created for testing and training purposes: it retrieves the page source of websites, creates Cloudflare Turnstile tokens, and creates Cloudflare WAF sessions.

Request timeout, wait for navigation timeout #11

Closed · cuongk9 closed this 4 months ago

cuongk9 commented 4 months ago

This time it doesn't work with any site, even though I also use a proxy. It rarely works, maybe 1 in 100 requests.

mdervisaygan commented 4 months ago

https://github.com/zfcsoftware/puppeteer-real-browser/issues/83#issuecomment-2215030551 Support for Cloudflare turning off the shadow root has been added, and the bug has been fixed. Please test by updating to the latest version: zfcsoftware/cf-clearance-scraper:latest
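
A minimal sketch of that update, assuming the image is run as a standalone container; the container name `cf-clearance-scraper` and the port 3000 mapping are illustrative assumptions, not values from this thread:

```bash
# Pull the updated image (image name taken from the comment above)
docker pull zfcsoftware/cf-clearance-scraper:latest

# Replace the running container; the name and port mapping are assumptions
docker rm -f cf-clearance-scraper
docker run -d --name cf-clearance-scraper -p 3000:3000 zfcsoftware/cf-clearance-scraper:latest
```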

cuongk9 commented 4 months ago

> zfcsoftware/puppeteer-real-browser#83 (comment) Support for Cloudflare turning off the shadow root has been added, and the bug has been fixed. Please test by updating to the latest version: zfcsoftware/cf-clearance-scraper:latest

Thank you, it worked. But is there any way to improve performance? With this running in headful mode, I can't run many browser instances at once due to memory shortage.

mdervisaygan commented 4 months ago

> zfcsoftware/puppeteer-real-browser#83 (comment) Support for Cloudflare turning off the shadow root has been added, and the bug has been fixed. Please test by updating to the latest version: zfcsoftware/cf-clearance-scraper:latest
>
> Thank you, it worked. But is there any way to improve performance? With this running in headful mode, I can't run many browser instances at once due to memory shortage.

https://www.reddit.com/r/webscraping/comments/1dy2yvc/how_datadome_detects_puppeteer_extra_stealth/ You're welcome. There are ways, but in open-source projects those methods get patched as soon as they're released. Even if I ship an update, Cloudflare will counter it with an update of its own. For now, this approach is the more reliable one. It has a few problems, and I will fix those too.

Chrome tries to use as much of the available RAM as it can, and Cloudflare's captcha solving can consume a lot of CPU. When the two are combined, even a headless-shell browser can hit a 70% CPU limit. I suggest creating one container per browser and setting CPU and RAM limits for each. https://www.browserless.io/blog/observations-running-more-than-5-million-headless-sessions-a-week I will implement some of the suggestions from this post soon.
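
A minimal sketch of the one-container-per-browser suggestion, using standard Docker flags; the limit values, container names, and the port 3000 the service is assumed to listen on are illustrative, not taken from this thread:

```bash
# Each instance gets its own container with hard CPU and RAM caps,
# so one busy captcha solve cannot starve the others.
docker run -d --name cf-scraper-1 --cpus="1.0" --memory="1g" \
  -p 3000:3000 zfcsoftware/cf-clearance-scraper:latest
docker run -d --name cf-scraper-2 --cpus="1.0" --memory="1g" \
  -p 3001:3000 zfcsoftware/cf-clearance-scraper:latest
```

Requests can then be spread across the instances (for example with a simple round-robin on the client side), so each browser stays within its own limits.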