JimmyLaurent / cloudflare-scraper

A package to bypass Cloudflare's protection
MIT License
286 stars 30 forks

Support remote Chrome #8

Closed: K5HV closed this issue 4 years ago

K5HV commented 4 years ago

Hi, is it possible to add support for remote Chrome? (e.g. running in Docker)

I'm running cloudflare-scraper on a headless machine (no X, no Chrome installed), and I'd like to solve the Cloudflare puzzle, retrieve the cookie, and continue my scraping with a different tool. I was able to do it, but I had to do some dirty code patching first.

JimmyLaurent commented 4 years ago

Can you tell me more? What did you have to patch? If you could override the puppeteer options, would that be enough?

K5HV commented 4 years ago

Hi, sorry, I should have been more specific. This is what I'm doing:

I have Chrome running with the remote debugger enabled in Docker (I'm using selenium-standalone-chromium-debug). I can then discover the WebSocket URL from http://127.0.0.1:21222/json/version (21222 is mapped from the container).

Next (I was still on commit 42308d8f2d180eaff86d5798d6f8faf756d68371), I changed puppeteer.launch(puppeteerOptions) to puppeteer.connect({browserWSEndpoint: 'ws://127.0.0.1:21222/devtools/browser/...'}) in fillCookiesJar.js. With that change I was able to connect to Chrome and interact with the website I wanted. After solving the Cloudflare puzzle, I can retrieve the cookie and user agent.
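The manual steps above (read /json/version, copy the WebSocket URL, pass it to puppeteer.connect) could be sketched roughly as follows. This is not the code from fillCookiesJar.js; the helper names extractWsEndpoint and connectToRemoteChrome are made up for illustration, and the sketch assumes Node 18+ (for the global fetch) and that puppeteer is installed.

```javascript
// Sketch only: helper names are hypothetical, not part of cloudflare-scraper.

// Pull the browser-level WebSocket URL out of a parsed /json/version payload.
// Chrome's remote debugger reports it in the "webSocketDebuggerUrl" field.
function extractWsEndpoint(versionInfo) {
  const url = versionInfo && versionInfo.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !url.startsWith('ws')) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return url;
}

// Fetch /json/version from the debugger's HTTP address and attach Puppeteer.
// debuggerBaseUrl would be e.g. 'http://127.0.0.1:21222' in the setup above.
async function connectToRemoteChrome(debuggerBaseUrl) {
  const res = await fetch(new URL('/json/version', debuggerBaseUrl));
  if (!res.ok) {
    throw new Error(`Debugger endpoint returned HTTP ${res.status}`);
  }
  const browserWSEndpoint = extractWsEndpoint(await res.json());

  // Required lazily so the pure helper above works without puppeteer installed.
  const puppeteer = require('puppeteer');
  return puppeteer.connect({ browserWSEndpoint });
}
```

When finished, calling browser.disconnect() rather than browser.close() should leave the remote Chrome running. Puppeteer's connect also accepts a browserURL option that performs this lookup itself, which may be simpler if the installed puppeteer version supports it.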

I haven't used puppeteer much, so I don't know whether there are any options that could be exposed here. But being able to create cloudflareScraper with a URL parameter pointing to the remote debugger would be a nice feature.

JimmyLaurent commented 4 years ago

I made the changes; you should be able to connect to your remote Chrome in the latest version (1.0.5).

let response = await cloudflareScraper.get('https://your-url.com', { browserWSEndpoint: 'http://127.0.0.1:21222/json/version'});
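For the hand-off mentioned earlier in the thread (reusing the cookie and user agent in a different tool), a minimal sketch might look like the one below. It assumes the cookies come from Puppeteer's page.cookies(), which returns objects with name and value fields; toCookieHeader and buildHandoffHeaders are hypothetical helper names, not part of cloudflare-scraper.

```javascript
// Sketch only: helper names are hypothetical, not part of cloudflare-scraper.

// Join Puppeteer-style cookie objects into a single Cookie header value.
function toCookieHeader(cookies) {
  return cookies.map(({ name, value }) => `${name}=${value}`).join('; ');
}

// Bundle the cookie string and user agent as request headers for another tool.
function buildHandoffHeaders(cookies, userAgent) {
  return {
    Cookie: toCookieHeader(cookies),
    'User-Agent': userAgent,
  };
}

// With a connected Puppeteer browser and an open page, usage could be:
//   const headers = buildHandoffHeaders(
//     await page.cookies(),
//     await browser.userAgent()
//   );
```

The thread retrieves both values, presumably because the other tool has to present the same user agent as the browser that solved the puzzle.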