ZFC-Digital / cf-clearance-scraper

This library was created for testing and training purposes: it retrieves the page source of websites, creates Cloudflare Turnstile tokens, and creates Cloudflare WAF sessions.

Request timeout, wait for navigation timeout #11

Closed · cuongk9 closed this 4 months ago

cuongk9 commented 4 months ago

This time it doesn't work with any site, even though I also use a proxy. It rarely works, maybe 1 in 100 requests.

mdervisaygan commented 4 months ago

https://github.com/zfcsoftware/puppeteer-real-browser/issues/83#issuecomment-2215030551 Support for Cloudflare turning off the shadow root has been added, and the bug has been fixed. Please test by updating to the latest version: zfcsoftware/cf-clearance-scraper:latest
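
A minimal sketch of that update, assuming the image is run as a standalone container; the container name `cf-clearance-scraper` and the port 3000 mapping are illustrative assumptions, not values from this thread:

```bash
# Pull the updated image (image name taken from the comment above)
docker pull zfcsoftware/cf-clearance-scraper:latest

# Replace the running container; the name and port mapping are assumptions
docker rm -f cf-clearance-scraper
docker run -d --name cf-clearance-scraper -p 3000:3000 zfcsoftware/cf-clearance-scraper:latest
```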

cuongk9 commented 4 months ago

> zfcsoftware/puppeteer-real-browser#83 (comment) Support for Cloudflare turning off the shadow root has been added, and the bug has been fixed. Please test by updating to the latest version: zfcsoftware/cf-clearance-scraper:latest

Thank you, it worked. But is there any way to improve performance? With this running in headful mode, I can't run many browser instances at once due to memory shortage.

mdervisaygan commented 4 months ago

> zfcsoftware/puppeteer-real-browser#83 (comment) Support for Cloudflare turning off the shadow root has been added, and the bug has been fixed. Please test by updating to the latest version: zfcsoftware/cf-clearance-scraper:latest
>
> Thank you, it worked. But is there any way to improve performance? With this running in headful mode, I can't run many browser instances at once due to memory shortage.

https://www.reddit.com/r/webscraping/comments/1dy2yvc/how_datadome_detects_puppeteer_extra_stealth/ You're welcome. There are ways, but in open-source projects those methods get patched as soon as they're released. Even if I ship an update, Cloudflare will counter it with an update of its own. For now, this approach is the more reliable one. It has a few problems, and I will fix those too.

Chrome tries to use as much of the available RAM as it can, and Cloudflare's captcha solving can consume a lot of CPU. When the two are combined, even a headless-shell browser can hit a 70% CPU limit. I suggest creating one container per browser and setting CPU and RAM limits for each. https://www.browserless.io/blog/observations-running-more-than-5-million-headless-sessions-a-week I will implement some of the suggestions from this post soon.
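
A minimal sketch of the one-container-per-browser suggestion, using standard Docker flags; the limit values, container names, and the port 3000 the service is assumed to listen on are illustrative, not taken from this thread:

```bash
# Each instance gets its own container with hard CPU and RAM caps,
# so one busy captcha solve cannot starve the others.
docker run -d --name cf-scraper-1 --cpus="1.0" --memory="1g" \
  -p 3000:3000 zfcsoftware/cf-clearance-scraper:latest
docker run -d --name cf-scraper-2 --cpus="1.0" --memory="1g" \
  -p 3001:3000 zfcsoftware/cf-clearance-scraper:latest
```

Requests can then be spread across the instances (for example with a simple round-robin on the client side), so each browser stays within its own limits.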