Closed OSVFreeDom closed 5 years ago
Hi @OSVFreeDom,
Thanks for taking the time to create a proper issue. It really helps. This seems to be the same problem as https://github.com/codemanki/cloudscraper/issues/229. Please read over that issue and let me know if I can be of more assistance.
As I mention in that issue, I'd really like to fix the problem but I don't currently have a concrete way of doing so that works for everybody.
Cheers
Thanks for your answer. I've tried some ciphers but nothing changed... Do you have a temporary fix to force a cipher that doesn't trigger the CAPTCHA?
EDIT: The script ran for more than a week with exactly the same version of Node/Cloudscraper, and it stopped working today (no code change in the script or on the server).
If the problem was the cipher list, I could recommend something but that's not the exact problem. I've gone into more detail in the linked issue.
FWIW, I don't get a CAPTCHA when using https://nodejs.org/download/release/v10.16.0/node-v10.16.0-linux-x64.tar.gz
const cloudscraper = require('cloudscraper')
cloudscraper.get('https://pro-src.com').then(console.log)
OpenSSL may behave differently depending on the CPU. I've never had much problem with the TLS related Cloudflare updates, others have not fared as well. :(
Recompiling to expose the necessary OpenSSL functions would be one way of addressing this issue. I might get around to providing some instruction on that, no promises.
I currently have this build of node and I get CAPTCHA... Thanks anyway
Yw and Sorry friend :(, please do check out the suggestions in that issue: https://github.com/codemanki/cloudscraper/issues/229#issuecomment-502481770
The OP did manage to get it working.
I don't understand why Python with the same version of OpenSSL doesn't have this issue.
Depending on the OS, Python is more than likely using the system's version of OpenSSL. Node.js is compiled with OpenSSL, meaning it uses its own version. So try a different build of Node.js.
Check `node -p process.versions`
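For comparing builds side by side, the same information can be printed from a script. This is only a minimal sketch using Node's built-in `process.versions` and `tls.getCiphers()`; the `describeTls` helper name is made up for illustration.

```javascript
const tls = require('tls');

// Hypothetical helper: summarize the TLS-relevant details of the running build.
function describeTls() {
  return {
    node: process.versions.node,
    // OpenSSL version this Node.js binary was compiled with
    openssl: process.versions.openssl,
    // Number of cipher names this build reports as supported
    cipherCount: tls.getCiphers().length
  };
}

console.log(describeTls());
```

Running it under two different Node.js builds shows quickly whether they really ship the same OpenSSL version.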
That's what I did; same bug with v12.6.0
{
node: '12.6.0',
v8: '7.5.288.22-node.14',
uv: '1.30.1',
zlib: '1.2.11',
brotli: '1.0.7',
ares: '1.15.0',
modules: '72',
nghttp2: '1.38.0',
napi: '4',
llhttp: '1.1.4',
http_parser: '2.8.0',
openssl: '1.1.1c',
cldr: '35.1',
icu: '64.2',
tz: '2019a',
unicode: '12.1'
}
It only happens when using a proxy. With the same proxies, the success rate is 100% for Python and 0% for Node.
Ah, so when not using a proxy everything works? If that's the case, this isn't likely to be a TLS problem. Well... Could you adjust the title? :smile:
Give this a go: https://github.com/codemanki/cloudscraper/issues/233#issuecomment-510342262
I've read that issue, and adjusting the headers doesn't work for me.
Ah, you're overriding the headers... duh
Here is your fixed snippet:
var cloudscraper = require('cloudscraper');
// Copy the defaults into a new object so the library's default headers are not mutated
var headers = Object.assign({}, cloudscraper.defaultParams.headers, { /* modify UA here */ });
cloudscraper.defaults({proxy:"http://163.172.171.125:80", headers })
.get('https://pro-src.com').then(function(body) {
console.log(body);
}).catch(function(err) {
console.log(err);
});
I apologize for not running your snippet initially :/
Same result, it still triggers the CAPTCHA (I tried several proxies)... Python with the same proxies doesn't trigger it...
The reason you were experiencing a CAPTCHA was because of the way you were overriding the default headers that are needed to imitate the web browser. Please see the code snippet above.
Sorry for being short but this stuff is covered in the README and the examples. Good luck
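To illustrate the difference between replacing and merging headers, here is a small sketch with plain objects. The `defaultHeaders` object is a stand-in for `cloudscraper.defaultParams.headers`, and its values are invented for the example.

```javascript
// Stand-in for cloudscraper.defaultParams.headers (values are illustrative only).
const defaultHeaders = {
  'User-Agent': 'default-browser-like-UA',
  'Accept': 'text/html',
  'Accept-Language': 'en-US,en;q=0.9'
};

const custom = { 'User-Agent': 'my-custom-UA' };

// Passing only `custom` replaces the whole headers object: the other defaults are lost.
const replaced = custom;

// Copying the defaults first keeps them and overrides just the User-Agent.
const merged = Object.assign({}, defaultHeaders, custom);

console.log(Object.keys(replaced)); // [ 'User-Agent' ]
console.log(Object.keys(merged));   // [ 'User-Agent', 'Accept', 'Accept-Language' ]
```

The merged object is what should be passed as `headers`, so the request still carries the browser-like defaults.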
I think you don't understand... Your snippet doesn't work with a lot of proxies (in Python it does). The big problem is: yesterday everything was working fine; today nothing has changed (code, server, etc.) and it doesn't work with any proxy (in Python it still works).
var target = "https://WEBSITE.COM"
var proxyURL = "http://IP:PORT"
var cloudscraper = require('cloudscraper').defaults({ resolveWithFullResponse: true, proxy: proxyURL });
cloudscraper(target).then(function(resp) {
//some code
}).catch(function(err) {
console.log(err.message) // always captcha
})
I tried this code with more than 10k proxies => 100% failed (5% dead proxies, 95% CAPTCHA). The same in Python => 93% success (5% dead proxies, 2% CAPTCHA).
PS: I don't override any headers (the snippet in the issue was only an example; I shouldn't have included it, sorry). My real code is the snippet above.
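For anyone trying to reproduce figures like these, a rough way to tally outcomes might look like the sketch below. It assumes that CAPTCHA failures can be recognized by `err.name === 'CaptchaError'` and, as a simplification, counts every other error as a dead proxy. The helper names and the sample outcomes are made up.

```javascript
// Assumption: CAPTCHA failures are identified by err.name === 'CaptchaError'.
// Any other error is counted as a dead proxy, which is a simplification.
function classify(err) {
  if (!err) return 'success';
  return err.name === 'CaptchaError' ? 'captcha' : 'dead';
}

// Count each outcome and express it as a rounded percentage of all attempts.
function tally(outcomes) {
  const counts = { success: 0, captcha: 0, dead: 0 };
  for (const outcome of outcomes) counts[outcome] += 1;
  const total = outcomes.length || 1;
  const percent = {};
  for (const key of Object.keys(counts)) {
    percent[key] = Math.round((counts[key] / total) * 100);
  }
  return { counts, percent };
}

// Example with made-up outcomes (not real measurements).
const fakeCaptcha = Object.assign(new Error('captcha'), { name: 'CaptchaError' });
const fakeTimeout = new Error('ETIMEDOUT');
const outcomes = [null, fakeCaptcha, fakeCaptcha, fakeTimeout].map(classify);
console.log(tally(outcomes));
```

In a real run, `classify` would be called with `null` in the `.then` handler and with the rejection in the `.catch` handler of each request.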
I can't reproduce the statistics you've given. I can't reproduce the problem with the proxy you shared. If you're not overriding the headers, I'm not quite sure what the problem could be, but we're getting a lot of reports about proxies not working, so I don't think this is an isolated issue. Something has changed...
Yes, something has changed, but nothing on my side; perhaps on Cloudflare's side. The script runs many times a day, and it stopped working for no apparent reason (I don't have the exact hour, sorry).
@GoogleSites, @dpalade06, @izidan, @brunogaspar
Let's move all of the proxy related conversation here.
Does anybody else get the same results as @OSVFreeDom when running the script from https://github.com/codemanki/cloudscraper/issues/233#issuecomment-510342262
In the console.log output you can see that all requests trigger a CAPTCHA, except those through dead proxies.
Node version: v11.12.0
I guess I have the exact same behaviour.
Haven't spent much time figuring things out yet, had other stuff to work on.
wget https://nodejs.org/download/release/v11.12.0/node-v11.12.0-linux-x64.tar.gz
tar xzvf node-v11.12.0-linux-x64.tar.gz
./node-v11.12.0-linux-x64/bin/node issue-233.js
The only thing that I imagine to be different between our systems is the TLS because OpenSSL defaults might vary depending on CPU. It's the best explanation that I have at the moment given everything that CF has been doing with TLS lately. It only really makes sense if CF is performing additional checks when they encounter a known proxy. Because you guys don't get CAPTCHA when not using proxies, right???
Interesting though, I do seem to get more CAPTCHA responses on this version of node. Shooting in the dark, would you guys try a couple of different ciphers as mentioned here: https://github.com/codemanki/cloudscraper/issues/229#issuecomment-502483820
> Because you guys don't get CAPTCHA when not using proxies, right???

Yup, I rarely get a CAPTCHA when I don't use proxies.

> Interesting though, I do seem to get more CAPTCHA responses on this version of node.

I can try different versions of Node later today and see where it lands me.

> Shooting in the dark, would you guys try a couple of different ciphers as mentioned here: https://github.com/codemanki/cloudscraper/issues/229#issuecomment-502483820

I tried that, kinda the same result unfortunately.
I'm not having much luck on my end... Reading https://tools.ietf.org/html/rfc8446#section-8.3 through https://tools.ietf.org/html/rfc8446#section-9.3 and wondering if CF is timing the TLS negotiation as a way to spot proxies.
Each of the following does yield passing results:
~/Downloads/node-v11.12.0-linux-x64/bin/node --tls-cipher-list='ECDHE-ECDSA-AES128-GCM-SHA256' issue-233.js
~/Downloads/node-v11.12.0-linux-x64/bin/node --tls-cipher-list='ECDHE-ECDSA-AES256-GCM-SHA384' issue-233.js
~/Downloads/node-v11.12.0-linux-x64/bin/node --tls-cipher-list='ECDHE-ECDSA-CHACHA20-POLY1305' issue-233.js
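The same single-cipher restriction can also be expressed in code instead of on the command line. The sketch below uses only Node's built-in `tls` and `https` modules: it checks whether each of the three ciphers is known to the running build and creates an `https.Agent` limited to one cipher. The helper names are hypothetical, and whether cloudscraper forwards such an agent unchanged is an assumption to verify.

```javascript
const https = require('https');
const tls = require('tls');

// The three ciphers reported as passing in this thread.
const CANDIDATES = [
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-AES256-GCM-SHA384',
  'ECDHE-ECDSA-CHACHA20-POLY1305'
];

// tls.getCiphers() returns lowercase names, so compare in lowercase.
function isSupported(cipher) {
  return tls.getCiphers().includes(cipher.toLowerCase());
}

// Hypothetical helper: an https.Agent restricted to a single cipher.
function agentFor(cipher) {
  return new https.Agent({ ciphers: cipher });
}

for (const cipher of CANDIDATES) {
  console.log(cipher, isSupported(cipher) ? 'supported' : 'not in this build');
}
```

An agent created this way can be passed as the `agent` option of `https.request`, which keeps the restriction scoped to those requests rather than the whole process.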
It works for me! Thanks 👍
These are my results
node --tls-cipher-list='ECDHE-ECDSA-AES128-GCM-SHA256' issue-233.js > results.txt
node --tls-cipher-list='ECDHE-ECDSA-AES256-GCM-SHA384' issue-233.js > results.txt
node --tls-cipher-list='ECDHE-ECDSA-CHACHA20-POLY1305' issue-233.js > results.txt
Without any proxy (I just stopped passing the proxy to cloudscraper)
So I guess this is trickier than it looks, heh
Those are the results that I'd expect to see from random public proxies if everything is working properly. What was confusing about this is that I jumped to the conclusion of it being TLS related but then dismissed it because normally, you'd always be getting a CAPTCHA regardless of whether or not you're using a proxy. The only reason I revisited it was because nothing else made sense.
What changed? It seems that Cloudflare is only performing some TLS-based checks when they detect a proxy; otherwise you'd get a CAPTCHA without the proxy. The reason I've come to that conclusion is that the TLS should be the same even when using a proxy.
So the problem is the cipher list? No, I don't think so. I'm using the same cipher list on the same version of node, same OpenSSL version, etc. I don't get the CAPTCHA when using a proxy or otherwise.
What is the problem? The signature algorithms extension is different on your systems. Changing the cipher list has the side effect of modifying the signature algorithms. It's a side effect. We don't have a way to directly tell OpenSSL which signature algorithms to use because that function isn't exposed.
Solution? Find which ciphers we need to omit from the cipher list and/or only specify a single cipher to rule everything else out.
Ahem, update the library? We're already filtering some ciphers out of the list. Removing good ciphers isn't a good idea since this problem doesn't affect everybody. Removing them would affect compatibility with at least non-Cloudflare sites and is very hacky since the cipher list isn't directly responsible.
Who is affected? Not everybody. I'm not really sure what conditions are exactly responsible for causing OpenSSL to behave differently. It's probably CPU features. For those who don't have this problem with cfscrape (Python), you might try to find a version of node that was compiled with the same version of OpenSSL as Python is using.
If the suggestions above don't work for any proxies, lmk in a new issue.
This is the relevant Node.js issue: https://github.com/nodejs/node/issues/24818
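This thread predates it, but newer Node.js releases accept a `sigalgs` option in `tls.createSecureContext()`, which allows the signature algorithms to be set directly instead of as a side effect of the cipher list. The sketch below assumes a runtime that supports that option. The algorithm list is only an example, not a known-good value for avoiding the CAPTCHA, and the helper name is made up.

```javascript
const tls = require('tls');

// Example list only; which algorithms avoid the CAPTCHA is not established here.
const SIGALGS = 'ECDSA+SHA256:RSA+SHA256';

// Hypothetical helper: build a secure context with an explicit
// signature algorithm list (assumes the runtime supports `sigalgs`).
function contextWithSigalgs(sigalgs) {
  return tls.createSecureContext({ sigalgs });
}

const context = contextWithSigalgs(SIGALGS);
console.log('secure context created:', Boolean(context));
```

On Node.js versions from the time of this thread the option is not available, so the cipher list workaround above remains the only lever there.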
Please attempt to answer the following questions before submitting a new issue:
Cloudscraper version: 4.1.2
Node.js version: 10.16
Problem started: Today (2019/07/11)
Frequency: All the time
Many URLs, including https://pro-src.com. When I try to get a URL through a proxy, I always get a CAPTCHA. I've tried the Python cloudflare-scrape library, and it returns the body content without any problems. With Python I use exactly the same proxy, User-Agent, and destination URL. I made 50 attempts with each (Node/Python): Python => 50 successes; Node => 0 successes (50 CAPTCHAs).
Please share a minimal working code snippet that reproduces the problem.
Code snippet
```js
var headers = {
  'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36",
};
var cloudscraper = require('cloudscraper');
cloudscraper.defaults({proxy: "http://163.172.171.125:80", headers: headers})
  .get('https://pro-src.com').then(function(resp) {
    console.log(resp.body.toString('utf-8'));
  }).catch(function(err) {
    console.log(err);
  });
```
```python
import cfscrape

scraper = cfscrape.create_scraper()
print(scraper.get(
    "https://pro-src.com",
    proxies={'https': 'http://163.172.171.125:80'},
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36'},
).content)
```
EDIT: I've tested with several different proxies and User-Agents, each time with both scripts (Node/Python).
Python returns the body every time and Node never does (always CAPTCHA).