codemanki / cloudscraper

--DEPRECATED -- 🛑 🛑 Node.js library to bypass cloudflare's anti-ddos page
MIT License

Can't make it work #229

Closed Eastkap closed 5 years ago

Eastkap commented 5 years ago

I'm struggling to make the library work: I get a CAPTCHA every time when using a proxy, but not on localhost. Since I noticed Python's cfscrape works properly, I tried extracting its ciphers and forcing them when using this library, but no luck.

Code snippet

```js
const cloudscraper = require('cloudscraper')

const proof = async () => {
  var headers = {
    "Host": "www.sneakersnstuff.com",
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    "Accept-Encoding": "gzip, deflate,br",
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Cookie': '__cfduid=daac4ce9acab3578ba44cff16e93999b01560202117; AntiCsrfToken=23d975ee81674fab9752795f5fe177c2; png.state=A0psztL6KeuNCd0yWjC8AtwAGC08s9PrldorNDW3iZbWOhPqnxR7Nd0uLzFECE6YVh6kjoRMn4HJ0rlefA6ojqckSyaEqHigQa1n7VZ6KJMgr+Ht; __cf_bm=7b878363d0c6704cf13894fdb9019f277e15db67-1560202117-1800-AQYCmd0hF37DcKBsgQhy17DPqOuLoEQzdWCG7M0fSo423qyUMSkcGw2SITuatC2ORFlNZvwQKLH9lnm8fOmQNVQ=; _gcl_au=1.1.1531330661.1560202123; _ga=GA1.2.764629557.1560202123; _gid=GA1.2.1750232963.1560202123; _gat=1; _gat_UA-1918066-1=1; _dc_gtm_UA-1918066-1=1'
  };
  var options = {
    url: 'https://www.sneakersnstuff.com/en/product/37690/nike-dbreak-undercover',
    //proxy:true,
    proxy: 'http://46.32.228.122:3128',
    headers,
    //jar:true,
    gzip: true,
    simple: false,
    resolveWithFullResponse: true
  }
  let a = await cloudscraper(options)
  console.log(a.statusCode)
}

proof()
```

ghost commented 5 years ago

Hi @Eastkap,

I was initially unable to reproduce it, which isn't unusual with this particular problem. Edit: reproduced.

The cipher list control string that would be used by cfscrape is exactly as follows:

const ciphers = 'ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM:DHE+CHACHA20:ECDH+AESGCM:DH+AESGCM:ECDH+AES:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!eNULL:!MD5:!DSS:!ECDHE+SHA:!AES128-SHA';

Note that both libraries include !ECDHE+SHA:!AES128-SHA, since SHA1 in the TLS signature algorithms extension tends to trigger a CAPTCHA.

If you want to test your theory:

const cloudscraper = require('cloudscraper').defaults({ agentOptions: { ciphers } });
// Enable debug output
cloudscraper.debug = true;
ghost commented 5 years ago

When I test via ssllabs, SHA1 is in the signature algorithms. Removing all of the SHA1 ciphers works for the cipher list, but it does not remove SHA1 from the signature algorithms. The problem is that we don't have a way to set those directly. :(

Only specifying a single cipher works: ~/Downloads/node-v11.5.0-linux-x64/bin/node --tls-cipher-list='ECDHE-RSA-AES128-GCM-SHA256' issue-229.js

So perhaps eliminating certain ciphers will help avoid the CAPTCHA once again...
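
Before pinning a single cipher like this, it can help to confirm that the local Node.js build actually knows it. A minimal sketch using only Node's built-in tls module; the helper name `isCipherSupported` is made up for illustration and is not part of cloudscraper:

```js
const tls = require('tls');

// Hypothetical helper: true if this Node.js build lists the given OpenSSL cipher name.
// tls.getCiphers() returns lowercase names, so normalize before comparing.
function isCipherSupported(name) {
  return tls.getCiphers().includes(name.toLowerCase());
}

const cipher = 'ECDHE-RSA-AES128-GCM-SHA256';
console.log(cipher, 'supported:', isCipherSupported(cipher));

// Building a secure context with the same string is a second sanity check;
// it is expected to throw if the string selects no usable cipher.
tls.createSecureContext({ ciphers: cipher });
```

This only tells you what your own build supports. It says nothing about what Cloudflare will accept.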

However, this is only a side effect of indirectly setting the ciphers: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_cipher_list.html We can't set the signature algorithms directly because that API isn't exposed by Node.js: https://www.openssl.org/docs/man1.1.0/man3/SSL_CTX_set1_client_sigalgs.html
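
For later readers: the statement above describes the Node.js versions discussed in this thread. Newer Node.js releases document a `sigalgs` option for `tls.createSecureContext()`. The following is a rough sketch under that assumption, not something tested against Cloudflare, and the algorithm list is only an illustration of a list without SHA1:

```js
const tls = require('tls');

// Assumption: the running Node.js release supports the `sigalgs` option,
// which did not exist in the versions discussed in this thread.
// Illustrative list only: SHA-256 based signature algorithms, no SHA1.
const sigalgs = 'ECDSA+SHA256:RSA+SHA256';

const secureContext = tls.createSecureContext({
  ciphers: 'ECDHE-RSA-AES128-GCM-SHA256',
  sigalgs,
});

console.log('secure context created:', typeof secureContext === 'object');
```

Whether this option reaches the underlying agent when passed through cloudscraper's agentOptions is an assumption you would need to verify yourself.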

You could create your own list instead of using node's: https://www.openssl.org/docs/manmaster/man1/ciphers.html

I recommend using a version of Node.js that doesn't have that problem. Node.js is compiled with its own copy of OpenSSL, meaning you're locked to the version of OpenSSL it was compiled with. That is more than likely why you're not experiencing this issue when using cfscrape.
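
To see which OpenSSL version a given Node.js binary was built with, you can read `process.versions`. A small sketch; the helper name `describeRuntime` is made up for illustration:

```js
// Hypothetical helper: report the Node.js version and the OpenSSL version
// this binary was compiled with.
function describeRuntime() {
  return {
    node: process.versions.node,
    openssl: process.versions.openssl,
  };
}

console.log(describeRuntime());
```

Comparing this output between two Node.js installs is a quick way to check whether a different OpenSSL build might explain different behavior.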

ghost commented 5 years ago

I'm going to ask you to upgrade to a Node.js v8 or later release that was built in 2019.

Latest versions with backports:

It's better to use even number releases instead of odd: https://github.com/nodejs/Release#release-schedule
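
If you want a script to warn about odd-numbered release lines, a check on the major version is enough. A minimal sketch; `isEvenRelease` is a made-up helper name:

```js
// Hypothetical helper: true when a Node.js version string ("v12.4.0" or "12.4.0")
// has an even major number.
function isEvenRelease(version) {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return major % 2 === 0;
}

console.log(process.version, 'even release line:', isEvenRelease(process.version));
```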

Dates, downloads, and changelogs: https://nodejs.org/en/download/releases/

If you want to stick with v11, then use the latest minor version. Changelog: https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V11.md

I'm closing this now.

Eastkap commented 5 years ago

I just updated to 12.4.0 and still no luck. I'll keep you posted.

Update: specifying --tls-cipher-list='ECDHE-RSA-AES128-GCM-SHA256' deals with the problem.

Update2: const cloudscraper = require('cloudscraper').defaults({ ciphers:'ECDHE-RSA-AES128-GCM-SHA256' }) does it too. Thank you @pro-src
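
For comparison, the same single-cipher restriction can be sketched with Node's built-in https module alone, without the CLI flag. This is only an illustration of the idea and not a description of what cloudscraper does internally. The host below is a placeholder and no request is actually sent:

```js
const https = require('https');

// Restrict every connection made through this agent to a single cipher.
const agent = new https.Agent({ ciphers: 'ECDHE-RSA-AES128-GCM-SHA256' });

// Placeholder request options; pass them to https.request() to use the agent.
const requestOptions = { host: 'example.com', path: '/', agent };

console.log('agent ciphers:', agent.options.ciphers);
```

Unlike --tls-cipher-list, which changes the default for the whole process, an agent-level setting only affects requests that use that agent.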

ghost commented 5 years ago

Thanks for letting us know @Eastkap

Here is an explanation for why Cloudflare has introduced these changes and why SHA1 being a problem makes sense:

Quoting the draft linked above:

  1. Signature Algorithms

    Clients SHOULD NOT include MD5 and SHA-1 in signature_algorithms extension. If a client does not send a signature_algorithms extension, then the server MUST abort the handshake and send a handshake_failure alert

Cloudflare is doing us a rather annoying favor.

Eastkap commented 5 years ago

Hopefully it's all for security sake ofc. I had read your pinned issue but couldn't make anything out of it. Well I guess it's good to go now.

ghost commented 5 years ago

> Hopefully it's all for security sake ofc.

I think that was their general idea but the behavior definitely improves their anti-scrape shield. So two birds one stone...

> I had read your pinned issue but couldn't make anything out of it.

Yeah, unfortunately, that issue does nothing to solve the current problem. The relevance is that Cloudflare is sending a CAPTCHA when the client connects in an insecure (deprecated) way. Since all major browsers will soon disable those insecure TLS versions, we can expect Cloudflare to flag clients based on their support for them. Essentially, that is what they're already doing, and they're likely to improve it. Thus, we should disable the deprecated TLS versions for security now and for better browser spoofing later.
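
One way to disable the deprecated TLS versions in plain Node.js is the `minVersion` option. A minimal sketch, assuming a Node.js release that supports `minVersion` and `tls.DEFAULT_MIN_VERSION` (older releases do not have them):

```js
const https = require('https');
const tls = require('tls');

// Assumption: this Node.js release supports `minVersion`.
// Refuse TLS 1.0 and 1.1 for every connection made through this agent.
const agent = new https.Agent({ minVersion: 'TLSv1.2' });

console.log('process default minimum:', tls.DEFAULT_MIN_VERSION);
console.log('agent minimum:', agent.options.minVersion);
```

Passing the same option through cloudscraper's agentOptions should in principle have the same effect, but that is an assumption I have not verified.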

> Well I guess it's good to go now.

Cloudflare is holding back and that's a fact. They have at least one bot-filter that is much more advanced and they're not even using it... I can only speculate as to why. I think it's clear that Cloudflare doesn't see tools like this as a threat. It's not a Cloudscraper versus Cloudflare thing. It's a Cloudflare tolerates Cloudscraper thing. All's well that ends well.

Cheers

Eastkap commented 5 years ago

This cipher appears to be clipped now. I haven't been able to find a replacement yet. The Python module has the same issue.

ghost commented 5 years ago

The problem doesn't occur for me on Node.js v12.4.0 or v11.5.0 with the URL from the OP when using the default configuration or the --tls-cipher-list flag.

ghost commented 5 years ago

Anyway, feel free to see the cipher list that your web browser sends by visiting https://howsmyssl.com or https://www.ssllabs.com/ssltest/viewMyClient.html

Then head over to https://www.openssl.org/docs/manmaster/man1/ciphers.html to create a list.
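
Once you have candidate names, a short script can drop the ones your Node.js build doesn't support and join the rest into a cipher string. A rough sketch: the candidate list is illustrative only, and `buildCipherList` is a made-up helper name. Note that browsers and test sites usually report IANA names (for example TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256), which you need to translate to OpenSSL names first using the ciphers manual page.

```js
const tls = require('tls');

// Illustrative candidates in OpenSSL naming; replace them with the suites
// your own browser reports.
const candidates = [
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-CHACHA20-POLY1305',
  'ECDHE-RSA-CHACHA20-POLY1305',
];

// Hypothetical helper: keep only ciphers this Node.js build supports,
// preserving order, and join them into an OpenSSL cipher string.
function buildCipherList(names) {
  const supported = new Set(tls.getCiphers());
  return names.filter((name) => supported.has(name.toLowerCase())).join(':');
}

const ciphers = buildCipherList(candidates);
console.log(ciphers);
```

The resulting string can be passed wherever a cipher list is accepted, for example in the agentOptions shown earlier in this thread.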

Let me know if I can help.

Eastkap commented 5 years ago

The issue is, once again, on my end. Sorry, @pro-src. Cloudflare seems to have CAPTCHA-flagged my proxies.