Closed Anorov closed 5 years ago
@pro-src, @Anorov PR https://github.com/Anorov/cloudflare-scrape/pull/242 is up and should hopefully fix the captcha problem. Would love some feedback. Also if you have a better idea for the adapter name, I'm all ears.
I actually think there are a couple of TLS-related problems causing the CAPTCHA: the absence of TLSv1.3 ciphers (i.e. old versions of openssl) and the current issue with some SHA1 ciphers. I think we should address both. Addressing the former increases security and avoids a CAPTCHA in some instances; the latter resolves a CAPTCHA in some instances even when using the latest openssl.
The problem here is that "some instances" is not very conclusive. We need more feedback. Also, it would be great if we could identify the individual problematic ciphers. We've been unable to do that in a way that makes sense thus far.
Until then, I think `!SHA1` is our best candidate since it doesn't appear to cause any problems. I'll continue to look into this.
@lukele I do think that we've narrowed it down to https://github.com/Anorov/cloudflare-scrape/issues/235#issuecomment-491641027 between ourselves. I wonder if this could be as simple as some SSLv3 option.
Do you mind trying to remove these TLSv1.0 ciphers?
('ECDHE-ECDSA-AES256-SHA', 'TLSv1.0', 256)
('ECDHE-RSA-AES256-SHA', 'TLSv1.0', 256)
('ECDHE-ECDSA-AES128-SHA', 'TLSv1.0', 128)
('ECDHE-RSA-AES128-SHA', 'TLSv1.0', 128)
DEFAULT_CIPHERS += ':!ECDHE-ECDSA-AES256-SHA:!ECDHE-RSA-AES256-SHA:!ECDHE-ECDSA-AES128-SHA:!ECDHE-RSA-AES128-SHA'
I'll fiddle with it in a moment, as I'm not sure whether or not to include the `TLS` prefix.
@pro-src Sure, will try. With what versions of python are you seeing the problem and what sites are you testing against?
The ciphers listed are also covered with `:!SHA1`, right? Or does that still include too many options?
I've only tested against my domain and https://ssllabs.com/ssltest/viewMyClient.html The latter has the handshake error on python 2 and 3.
The ciphers listed are also covered with :!SHA1, right? Or does that still include too many options?
Yes to both of those questions. Hopefully, we can be more specific without causing problems with SSLv3 but for right now, it works.
@lukele
The following works to remove the TLSv1.0 ciphers of recent discussion and is more specific: `DEFAULT_CIPHERS += ':!ECDHE+SHA'`. The easiest way to test that is to modify report.py.
Here's mine: http://dpaste.com/1NJSY4B
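To see locally what `!ECDHE+SHA` actually removes, the enabled cipher names can be compared before and after appending it. This is only a sketch using the standard `ssl` module, assuming Python 3.6+ (for `get_ciphers()`) and OpenSSL's own `DEFAULT` list rather than urllib3's `DEFAULT_CIPHERS`; `enabled_cipher_names` and `TARGETS` are names made up for the example.

```python
import ssl

# The four TLSv1.0 cipher names quoted earlier in this thread.
TARGETS = {
    "ECDHE-ECDSA-AES256-SHA",
    "ECDHE-RSA-AES256-SHA",
    "ECDHE-ECDSA-AES128-SHA",
    "ECDHE-RSA-AES128-SHA",
}


def enabled_cipher_names(cipher_string):
    """Return the cipher names a client context enables for cipher_string."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)
    return {cipher["name"] for cipher in ctx.get_ciphers()}


# Assumption: OpenSSL's "DEFAULT" stands in for urllib3's DEFAULT_CIPHERS.
before = enabled_cipher_names("DEFAULT")
after = enabled_cipher_names("DEFAULT:!ECDHE+SHA")

print("targeted ciphers enabled before:", sorted(before & TARGETS))
print("targeted ciphers enabled after: ", sorted(after & TARGETS))
```

Depending on the OpenSSL build and its security level, some of the four ciphers may already be missing from the "before" set, so the interesting part is that none of them survive in the "after" set.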
Do you have a working proxy by any chance to test my changes? The problem is that we can't enable/disable the adapter on demand if we're relying on `init_poolmanager` and `proxy_manager_for`, as these are invoked when the adapter is mounted.
Instead I've moved the logic into def get_connection(self, url, proxies=None):
def get_connection(self, url, proxies=None):
    conn = super(CaptchaProvokingCiphersRemover, self).get_connection(url, proxies)
    if self.is_enabled:
        print("Insert custom SSL context")
        conn.conn_kw['ssl_context'] = self.context_without_problematic_ciphers()
    else:
        print("Use default SSL context.")
    return conn
It does work for non-proxied connections, but I have yet to test it with a proxy.
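The snippet above calls `self.context_without_problematic_ciphers()` without showing it. A minimal sketch of what such a helper might look like follows, written as plain functions with only the standard `ssl` module. The base cipher string `"DEFAULT"` and the `choose_context` wrapper are assumptions for illustration, not the code from the pull request.

```python
import ssl

# Cipher-string fragment discussed in this thread.
REMOVED_CIPHERS = "!ECDHE+SHA"


def context_without_problematic_ciphers(base_ciphers="DEFAULT"):
    """Build a verifying client context with the ECDHE+SHA ciphers removed.

    Assumption: OpenSSL's "DEFAULT" is used as the base list here; the
    pull request may start from urllib3's DEFAULT_CIPHERS instead.
    """
    ctx = ssl.create_default_context()
    ctx.set_ciphers(base_ciphers + ":" + REMOVED_CIPHERS)
    return ctx


def choose_context(is_enabled):
    """Mirror the is_enabled switch: custom context when on, stock one when off."""
    if is_enabled:
        return context_without_problematic_ciphers()
    return ssl.create_default_context()


custom_names = {c["name"] for c in choose_context(True).get_ciphers()}
print("ECDHE-RSA-AES128-SHA still enabled:", "ECDHE-RSA-AES128-SHA" in custom_names)
```

Building the context per connection keeps the switch cheap to flip, which is the property the `get_connection` approach is after.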
Re. your report.py script: I just noticed that you modify the DEFAULT_CIPHERS list at the very beginning, and I'm now seeing "No CAPTCHA encountered under normal conditions.", which technically is not correct. Does the report still contain the important information, though?
This is my report: http://dpaste.com/22Y6P4Q
I find that to be an odd side effect, it should still contain the important information. If you look at lines around L113: https://gist.github.com/pro-src/17654ec3f949b0b17bd1a4aa1b4136b9 You'll see that the adapters aren't being mounted. I went ahead and modified it a tiny bit to avoid confusion.
@lukele No CAPTCHA when using `DEFAULT_CIPHERS += ':!ECDHE+SHA'`? Just to be clear on what was tested.
I sent you an email with instructions and proxy credentials.
You can also use proxychains for a better test, just configure proxychains to use 127.0.0.1:1080 instead of the default.
brew install proxychains-ng
proxychains4 -q python testing.py
If you modify the DEFAULT_CIPHERS directly, any instances of ssl_context will be affected, due to the nature of python variables
The report was created using this version of report.py https://gist.github.com/pro-src/17654ec3f949b0b17bd1a4aa1b4136b9/4d7ba5c8593ef23ea6f4405cb855670d9d1a3d1d which modified the DEFAULT_CIPHERS directly. So even with no adapter in place, the SHA1 ciphers would have been eliminated (http://dpaste.com/22Y6P4Q). I just realized they were also created with a patched version of cfscrape, so never mind.
Newest report with your latest version non-patched cfscrape: http://dpaste.com/3EZ6VVE
Ah, I see, but the DEFAULT_CIPHERS were never being modified directly, since strings in python are immutable and no assignment to `urllib3.util.ssl_` takes place.
import urllib3
from urllib3.util.ssl_ import DEFAULT_CIPHERS
DEFAULT_CIPHERS += 'foobar';
print(urllib3.util.ssl_.DEFAULT_CIPHERS == DEFAULT_CIPHERS) # prints False
Did it trigger a CAPTCHA? I probably should modify it to include those details in the saved report rather than just in the shell. Sorry about that.
My bad, you are absolutely correct. I've mistaken it with the use of
urllib3.util.ssl_.DEFAULT_CIPHERS += 'foobar'
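The difference between the two forms can be reproduced without urllib3. The sketch below registers a stand-in module (`fake_ssl_`, a made-up name) so that both the `from ... import` form and the attribute form can be run side by side; the cipher string is arbitrary.

```python
import sys
import types

# Stand-in for urllib3.util.ssl_ so the example needs no third-party package.
fake_ssl_ = types.ModuleType("fake_ssl_")
fake_ssl_.DEFAULT_CIPHERS = "ECDH+AESGCM:DH+AESGCM"
sys.modules["fake_ssl_"] = fake_ssl_

from fake_ssl_ import DEFAULT_CIPHERS  # binds a local name to the same string

# Strings are immutable, so += builds a new string and rebinds only the
# local name; the module attribute still points at the old string.
DEFAULT_CIPHERS += ":!ECDHE+SHA"
module_saw_local_change = fake_ssl_.DEFAULT_CIPHERS == DEFAULT_CIPHERS

# Assigning through the module rebinds the attribute itself, so every
# later reader of fake_ssl_.DEFAULT_CIPHERS sees the new value.
fake_ssl_.DEFAULT_CIPHERS += ":!ECDHE+SHA"
module_saw_attribute_change = fake_ssl_.DEFAULT_CIPHERS == DEFAULT_CIPHERS

print(module_saw_local_change, module_saw_attribute_change)
```

Only the second form would affect contexts that read the module-level default later on.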
Yes, it did trigger a captcha
```
Checking GET request for https://pro-src.com DEFAULT
Checking GET request for https://pro-src.com DEFAULT
Checking GET request for https://pro-src.com DEFAULT
Checking GET request for https://pro-src.com DEFAULT
Cloudflare responded with CAPTCHA under normal conditions
Checking to see which ciphers are shared as reported by https://howsmyssl.com
Nothing unique was reported by https://howsmyssl.com
Checking to see which ciphers are shared as reported by ssllabs
The shared ciphers reported by ssllabs are not unique.
The protocols details reported by ssllabs are not unique.
Unique signature algorithms were detected by ssllabs.
The named groups reported by ssllabs are not unique.
Checking GET request for https://pro-src.com TLSv1.1
Checking GET request for https://pro-src.com TLSv1.1
No CAPTCHA encountered when using TLSv1.1
Checking GET request for https://pro-src.com DEFAULT !SHA1
Checking GET request for https://pro-src.com DEFAULT !SHA1
No CAPTCHA encountered when using !SHA1
The report was saved locally as "report.md"
The report is valid Github flavored markdown, you may copy and paste it.
The dpaste link (Expires in 10 days): http://dpaste.com/3EZ6VVE
```
:thinking: But you modified it to replace `!SHA1` with `!ECDHE+SHA`, correct?
Checking GET request for https://pro-src.com DEFAULT !SHA1
Checking GET request for https://pro-src.com DEFAULT !SHA1
No CAPTCHA encountered when using !SHA1
So it didn't trigger a CAPTCHA when using `!ECDHE+SHA` after all, since `!SHA1` really means `!ECDHE+SHA` in this case?
Ah no, my bad. This was based on your original version.
Following report is with !ECDHE+SHA
MODIFIED_CIPHERS = DEFAULT_CIPHERS + ':!ECDHE+SHA'
Report: http://dpaste.com/2Z2DRXK
```
Checking GET request for https://pro-src.com DEFAULT
Checking GET request for https://pro-src.com DEFAULT
Cloudflare responded with CAPTCHA under normal conditions
Checking to see which ciphers are shared as reported by https://howsmyssl.com
Checking to see which ciphers are shared as reported by ssllabs
The protocols details reported by ssllabs are not unique.
Unique signature algorithms were detected by ssllabs.
The named groups reported by ssllabs are not unique.
Checking GET request for https://pro-src.com TLSv1.1
Checking GET request for https://pro-src.com TLSv1.1
No CAPTCHA encountered when using TLSv1.1
Checking GET request for https://pro-src.com DEFAULT !SHA1
Unique cipher list was shared with the server.
Checking GET request for https://pro-src.com DEFAULT !SHA1
No CAPTCHA encountered when using !SHA1
The report was saved locally as "report.md"
The report is valid Github flavored markdown, you may copy and paste it.
The dpaste link (Expires in 10 days): http://dpaste.com/2Z2DRXK
```
So that means it's confirmed: no captcha with `!ECDHE+SHA`.
Cool, so this seems to be the most specific we've had it yet. Cloudflare doesn't seem to like the absence of TLSv1.3 in some cases and the inclusion of TLSv1.0 in others. Seems like it just prefers secure settings, which isn't a bad thing.
Hehe, yeah. Certainly a trade off we should be able to live with :) Considering that you never saw the captcha in the first place (or did you?), how did you figure out the "inclusion of TLSv1.0" part?
Yeah, I never received a CAPTCHA.
how did you figure out the "inclusion of TLSv1.0" part?
`!ECDHE+SHA` is a shorthand for removing the TLSv1.0 ciphers in this https://github.com/Anorov/cloudflare-scrape/issues/235#issuecomment-491876966
Aah, the SSLv3 error, now I recall.
So finally. Newest version is up. Seems to even work with openssl < 1.1.1 and python2.7
Tested with the following versions:
Python 2.7.15
OpenSSL 1.0.2r 26 Feb 2019
Python 3.7.3
OpenSSL 1.1.1b 26 Feb 2019
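A version listing like the one above can be produced from inside the interpreter under test, which matters because the OpenSSL that Python was built against may differ from the `openssl` binary on the PATH. A small sketch with the standard library only; `describe_environment` is a made-up helper name.

```python
import platform
import ssl


def describe_environment():
    """Return the Python version and the OpenSSL build this interpreter uses."""
    return "Python {}\n{}".format(platform.python_version(), ssl.OPENSSL_VERSION)


print(describe_environment())
```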
This addresses all presently known issues and potentially prevents CAPTCHA with openssl <= 1.1.0.
from urllib3.util.ssl_ import DEFAULT_CIPHERS
import ssl

TLS13_CIPHERS = ":".join([
    "TLS13-AES-256-GCM-SHA384",
    "TLS13-CHACHA20-POLY1305-SHA256",
    "TLS13-AES-128-GCM-SHA256"
])

# Adjust the defaults to match those of more recent openssl versions
if ssl.OPENSSL_VERSION_NUMBER < 0x10101000 and "TLS13" not in DEFAULT_CIPHERS:
    DEFAULT_CIPHERS = TLS13_CIPHERS + ":" + DEFAULT_CIPHERS

# This removes a few problematic TLSv1.0 ciphers
DEFAULT_CIPHERS += ":!ECDHE+SHA"

# This is how a user could disable it
import cfscrape
cfscrape.DEFAULT_CIPHERS = None
But I still have a couple of things that I want to look at before recommending this. Recommended.
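The hex comparison against `0x10101000` in the snippet above is a check for "older than OpenSSL 1.1.1". The same check can be written against the version tuple, which some may find easier to read. A sketch under that reading; `openssl_at_least` is a made-up helper name.

```python
import ssl


def openssl_at_least(major, minor, fix):
    """True if the OpenSSL linked into this Python is at least major.minor.fix."""
    return ssl.OPENSSL_VERSION_INFO[:3] >= (major, minor, fix)


has_111 = openssl_at_least(1, 1, 1)
print(ssl.OPENSSL_VERSION, "->", "1.1.1 or newer" if has_111 else "older than 1.1.1")
```

`ssl.OPENSSL_VERSION_INFO` is a five-element tuple; only the first three elements are compared here.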
Hmm... what's the reasoning behind adding the TLSv1.3 ciphers? For older versions of urllib3?
Do openssl versions < 1.1.1 support TLSv1.3 ciphers?
@lukele I believe so, just the default configuration improved. https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_cipher_list.html https://github.com/codemanki/cloudscraper/pull/212
Ah ok, just found this: https://wiki.openssl.org/index.php/TLS1.3
With my latest commits, a user could disable the custom ciphers using
from requests.adapters import HTTPAdapter
import cfscrape
scraper = cfscrape.create_scraper()
scraper.mount("https://", HTTPAdapter())
or we could make this configurable via create_scraper()
I'm currently -1 on adding keywords arguments for this purpose.
Actually, I still need to determine if the `TLS13` prefix is understood only by openssl >= 1.1.1. If so, then the prefix should be (what I think is) the more universal way of specifying the same: `TLS`.
AKA `TLS13` prefix vs. `TLS` prefix.
I don't get what you are saying about TLSv1.3.
The API to set ciphers for TLSv1.3 is different (`SSL_CTX_set_ciphersuites()`) than for TLS <= v1.2 (`SSL_CTX_set_cipher_list()`). By restricting TLS <= v1.2 ciphers via `SSL_CTX_set_cipher_list()`, there is no impact whatsoever on TLSv1.3; it will keep using OpenSSL defaults.
There are no `TLS13` prefixed ciphers and there is no such thing as `TLS13-AES-256-GCM-SHA384` in OpenSSL.
As for urllib3: https://github.com/urllib3/urllib3/blob/master/src/urllib3/util/ssl_.py#L94
NOTE: TLS 1.3 cipher suites are managed through a different interface not exposed by CPython (yet!) and are enabled by default if they're available.
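The separation described above can be observed from Python by listing the TLSv1.3 suites a context reports under different cipher strings. This is a sketch assuming Python 3.6+ and the standard `ssl` module; `tls13_suite_names` is a made-up helper, and what gets printed depends on the local OpenSSL build.

```python
import ssl


def tls13_suite_names(cipher_string="DEFAULT"):
    """Names of the TLSv1.3 suites enabled on a client context.

    set_ciphers() feeds the TLS <= 1.2 cipher list only, so on builds
    with TLSv1.3 the suites reported here should not depend on it.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)
    return [c["name"] for c in ctx.get_ciphers() if c["protocol"] == "TLSv1.3"]


# HAS_TLSv1_3 is missing on older Pythons, hence the getattr fallback.
print("TLS 1.3 available:", getattr(ssl, "HAS_TLSv1_3", False))
print("with DEFAULT:      ", tls13_suite_names("DEFAULT"))
print("with ECDHE+AESGCM: ", tls13_suite_names("ECDHE+AESGCM"))
```

On a build without TLSv1.3 both lists come back empty, which is itself a quick answer to whether that build can offer TLSv1.3 at all.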
The prefix is taken from the DEFAULT_CIPHERS found in urllib3 and it seems that you get exactly what I'm saying as you seem to have just clarified exactly what I needed to determine. :smiley:
I didn't find that prefix in the openssl source either but since it ignores unknown entries in the cipher list control string, I haven't been exactly sure.
The question still kinda remains though. If cpython prefers that prefix, should we use it?
There is no point. Those bogus TLS13 ciphers have been removed from urllib3:
I'd have to check the cpython code base to determine that. The point is to enable the use of TLSv1.3 in versions of openssl prior to v1.1.1. I don't think those ciphers were ever bogus. They may have become redundant with the latest version of openssl, but not bogus. It should be determined how the prefix affects or doesn't affect the usage of TLSv1.3 in openssl. Regardless, the `TLS` prefix is good to use here unless cpython doesn't handle the cipher list control string in the way I would assume. For example, Node.js handles the list and calls the appropriate function to handle TLSv1.3 ciphers or other TLS ciphers respectively. AKA I think you're making too many assumptions.
Here's the cpython change:
OpenSSL supports TLSv1.3 since 1.1.1, not before. The TLS13 prefixed ciphers are a relict of the OpenSSL development within the 1.1.1 development tree, before TLSv1.3 ever hit a stable OpenSSL release.
OpenSSL supports TLSv1.3 since 1.1.1, not before. The TLS13 prefixed ciphers are a relict of the OpenSSL development within the 1.1.1 development tree, before TLSv1.3 ever hit a stable OpenSSL release.
That would make sense, do you mind sharing the source of that information?
I get what you're saying but I've observed that adding the TLSv1.3 ciphers to the control string has some effect in versions prior to v1.1.1. The effect was observed in Node.js, with the ciphers added, a user no longer received a CAPTCHA. See https://github.com/codemanki/cloudscraper/issues/211
So unless the older openssl source has been reviewed, I want to say it's an assumption.
I've glanced over the links that you shared. I don't see how that proves that there is no support for TLSv1.3 prior to v1.1.1. I appreciate you sharing the information either way!
So I do believe that we've gone full circle and landed back at the original question. The openssl source will have to be checked (again) unless somebody can provide the information.
From https://wiki.openssl.org/index.php/TLS1.3
The OpenSSL git master branch (and the 1.1.1-pre9 beta version) contain our development TLSv1.3 code which is based on the final version of RFC8446 and can be used for testing purposes (i.e. it is not for production use). Earlier beta versions of OpenSSL 1.1.1 implemented draft versions of the standard.
This at least sounds like it
I didn't get that you meant the former part of my statement; I thought you were doubting the second part of it.
This is not an assumption.
Here are some links: https://www.openssl.org/blog/blog/2018/09/11/release111/
The headline new feature is TLSv1.3.
https://wiki.openssl.org/index.php/TLS1.3
The OpenSSL 1.1.1 release includes support for TLSv1.3. The release is binary and API compatible with OpenSSL 1.1.0. In theory, if your application supports OpenSSL 1.1.0, then all you need to do to upgrade is to drop in the new version of OpenSSL and you will automatically start being able to use TLSv1.3.
https://www.openssl.org/blog/blog/2017/05/04/tlsv1.3/
The forthcoming OpenSSL 1.1.1 release will include support for TLSv1.3.
RFC8446 was released in August 2018. OpenSSL 1.1.0 was released in August 2016, 2 years before. In 2018 different TLSv1.3 drafts were in the wild. This is not something that would be backported to OpenSSL 1.1.0 stable.
There is an easier way besides googling to determine this... Just add the ciphers and test with openssl 1.1.0 using ssllabs...
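A cheaper first check than a full ssllabs run is to ask the local OpenSSL whether a cipher string selects anything at all. The sketch below relies on `SSLContext.set_ciphers()` raising `ssl.SSLError` when no cipher can be selected; `cipher_string_is_accepted` is a made-up helper, and the result only describes the local build, not what a server ends up seeing.

```python
import ssl


def cipher_string_is_accepted(cipher_string):
    """True if the local OpenSSL can select at least one cipher from the string."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        ctx.set_ciphers(cipher_string)
    except ssl.SSLError:
        return False
    return True


# Candidates taken from this discussion; results vary by OpenSSL build.
for candidate in ("TLS13-AES-256-GCM-SHA384", "TLS_AES_256_GCM_SHA384", "ECDHE+AESGCM"):
    print(candidate, "->", cipher_string_is_accepted(candidate))
```

Each name is tested on its own because an unrecognised entry mixed into a longer string does not raise an error as long as something else in the string matches.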
There is no doubt here at all.
Okay, personally, I still want to work a few things out. I do have some doubt. If you'd read this issue: https://github.com/codemanki/cloudscraper/issues/211#issuecomment-488061663 Maybe you'll better understand my point of view. I'll settle for an actual test or reviewing the codebase of openssl. Thanks for your contribution to this issue.
Since it appears that OpenSSL 1.1.0 did have some kind of partial TLS1.3 support, I wonder what's to be gained in specifically trying to enable TLS1.3 with such versions? Why not leave default ciphers/cipher suite as it is, and only remove the ones causing actual problems?
@lukastribus I have a question about this https://github.com/Anorov/cloudflare-scrape/issues/235#issuecomment-492020494
Since that change is only about a unit test, was that the only TLSv1.3 change in cpython?
Since it appears that OpenSSL 1.1.0 did have some kind of partial TLS1.3 support, I wonder what's to be gained in specifically trying to enable TLS1.3 with such versions? Why not leave default ciphers/cipher suite as it is, and only remove the ones causing actual problems?
Avoiding the CAPTCHA in some cases when using openssl < 1.1.1
Do we know of any such case for cfscrape at the moment? Wouldn't it be possible that in Node's case, Cloudflare also checks for a specific cipher list which is known to be used in Node (as we briefly suspected for urllib3), and that's why a slight modification of the cipher list helps?
1.0.2r also didn't show me a captcha.
Do we know of any such case for cfscrape at the moment?
Not particularly but then we haven't had anybody to test this yet.
As is known, I haven't been able to reproduce this at all. What do I think about this?
This addresses all presently known issues and potentially prevents CAPTCHA with openssl <= 1.1.0.
Potentially prevents is all I've actually ever said about adding those ciphers. I do agree that there could be some other explanation. I think we need more feedback, specifically somebody to test with openssl prior to v1.1.1 who can normally reproduce the CAPTCHA.
1.0.2r also didn't show me a captcha.
When using https://github.com/Anorov/cloudflare-scrape/issues/235#issuecomment-491997405 or how?
Scratch that. Just realized I do see a captcha in python2 with openssl 1.0.2r
In addition, there's a typo in my current pull request so the ciphers are not really removed (they are removed despite the typo), yet I don't see a captcha on pro-src.com
The latest version of cfscrape should not encounter captchas, unless you're using Tor or another IP that Cloudflare has blacklisted. If you're getting a captcha error, first please run
pip install -U cfscrape
and try again. If you're still getting an error, please leave a comment. Please put all captcha challenge-related issues here.
Please run the following to determine the OpenSSL version compiled with your Python binary and include the output in your comment:
(Or `python` instead of `python3` if running on Python 2.)