Closed: Anorov closed this issue 5 years ago
$ python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)'
OpenSSL 1.1.1b 26 Feb 2019
@pro-src Despite having OpenSSL 1.1.1, I'm getting a captcha every time I try a scraper.get("https://pro-src.com"). I had the same issue with an older version of OpenSSL. I'm on a normal residential connection, and I never experienced any captcha issues using cloudflare-scrape in the past.
It appears Cloudflare's recently started doing much more aggressive anomaly and bot detection. They may be checking for discrepancies between the network traffic (TCP, SSL) fingerprints and the ostensible user-agent's known legitimate fingerprints, among other things.
When I have some more free time, I'll dig into this as well.
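For anyone comparing notes on the TLS side of the fingerprint, here is a quick stdlib-only sketch (my own, not part of any report script) that lists the cipher suites the local Python build would offer in its ClientHello, which is one of the inputs such fingerprinting could look at:

```python
import ssl

# List the cipher suites a default client context would offer.
# The ordered list is part of the TLS ClientHello that a CDN can observe.
ctx = ssl.create_default_context()
for cipher in ctx.get_ciphers():
    print(cipher['name'], cipher['protocol'], cipher['strength_bits'])
```

Comparing this output across machines (and against a browser's ClientHello) would show whether two clients present the same cipher list and ordering.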
@Anorov @lukele and anybody else who is experiencing this issue.
I'm unable to reproduce this so I'm going to need you guys to generate some reports with this script.
This must be run from within the cloned cloudflare-scrape repo if you haven't pip-installed cfscrape:
git clone https://gist.github.com/pro-src/17654ec3f949b0b17bd1a4aa1b4136b9 temp
cp temp/report.py report.py
python report.py
My report: http://dpaste.com/38GSGJM Updated: http://dpaste.com/0RAENRJ
@pro-src thanks, just ran it. Seeing
$ python3 report.py
Already reported ID detected.
If I disable mounting of the CustomAdapter I'm seeing the CaptchaError again.
Unfortunately, it means your report is exactly the same as mine and nothing unique was identified. The CustomAdapter checks whether the problem occurs when using TLSv1.1, which it doesn't. I'll need to update the script to use ssllabs since it has better inspections. The ClientHello extensions are still the prime suspect.
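For reference, the "only TLSv1.1" condition that the CustomAdapter exercises can be sketched with the stdlib alone (this is my own illustration; the actual report script may build its context differently). A context like this could be passed as ssl_context into an HTTPAdapter's pool manager:

```python
import ssl

# Build a client context that will only negotiate TLSv1.1,
# mirroring the condition the report's CustomAdapter tests.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_1
ctx.maximum_version = ssl.TLSVersion.TLSv1_1
```

If the captcha still appears with this context, the protocol version itself is unlikely to be the trigger.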
Hmm... seeing that the user-agent is still set randomly, we should use a single one for these tests to eliminate any differences. It might not be related to that, but it can't hurt.
Just realized, of course you thought of that.
@lukele I've updated the script to address the issues that I mentioned but I expect the same result. If we get the same result, I'll update it to use ssllabs since it has better inspections.
Typical output if not unique: https://gist.github.com/pro-src/5e603aee2fc8d183624be6d3fda2b7eb
Sure looks like same result. Only difference is
Cloudflare responded with CAPTCHA under normal conditions
which is to be expected.
@pro-src I noticed that my latest Chrome version, which supports TLS 1.3, also uses HTTP/2 for some connections (Cloudflare ESNI Checker - https://73af10a0-12b8-44bb-a685-6814f3c71e76.encryptedsni.com/cdn-cgi/trace). So in order to mimic Chrome as closely as possible, I've mounted an HTTP/2 adapter (from hyper) for requests in a test script, and lo and behold, no more captcha challenge. The cipher list, however, seems to remain the same:
import cfscrape
from collections import OrderedDict
from hyper.contrib import HTTP20Adapter

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'

# Ordered headers mimicking Chrome
headers = OrderedDict(
    (
        ("Host", None),
        ("Connection", "keep-alive"),
        ("Upgrade-Insecure-Requests", "1"),
        ("User-Agent", user_agent),
        (
            "Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        ),
        ("Accept-Encoding", "gzip, deflate"),
        ("Accept-Language", "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"),
    )
)

scraper = cfscrape.create_scraper()
scraper.mount('https://', HTTP20Adapter())  # serve HTTPS requests over HTTP/2 via hyper
scraper.get("https://pro-src.com", headers=headers)
Not sure what to make of it yet.
:thinking: Me neither. If that were solely the problem, wouldn't we expect everybody to be getting a CAPTCHA? It could be a reasonable workaround for anybody who does have this problem, though. Does the adapter fall back to HTTP/1.1 when HTTP/2 isn't supported? Does switching the adapter on and off while solving the challenge work to bypass the CAPTCHA?
For example:
import cfscrape
from requests.adapters import HTTPAdapter

scraper = cfscrape.create_scraper()
custom = HTTPAdapter()
original = scraper.get_adapter('https://')
# These should be the same by default
assert scraper.get_adapter('http://') is original
scraper.mount('https://', custom)
scraper.mount('http://', custom)
assert scraper.get_adapter('https://') is custom
assert scraper.get_adapter('http://') is custom
# Switch back
# scraper.mount('https://', original)
# scraper.mount('http://', original)
# I'm only showing an alternative to calling `mount` here
scraper.adapters.update({'https://': original, 'http://': original})
Before looking into that, two questions:
1.) Do you have pyopenssl installed?
2.) What is the output of the following code?
import urllib3.contrib.pyopenssl
print(urllib3.contrib.pyopenssl)
Python 2 and 3
>>> import urllib3.contrib.pyopenssl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 46, in <module>
import OpenSSL.SSL
ModuleNotFoundError: No module named 'OpenSSL'
Ah ok, thanks. So that's the same as well.
The script has been updated to check ssllabs which provides protocol details such as the signature algorithms that are being used. I've updated my report for comparison: http://dpaste.com/0RAENRJ
While I could be wrong, I think that if the test yields the same results for ssllabs then it's likely not the TLS/SSL causing the problem or the IP's that I'm using are exempt from the checks. If the results are the same and it is something to do with the TLS/SSL, IDK that we can narrow it down without recompiling openssl. So, I'm really hoping that something shows...
Also if you edit the script to remove the known hashes, you'll generate a full report even if it's not unique.
I am in fact suspecting that you are whitelisted somehow. But you mentioned you performed tests from different hosts, do I recall that correctly?
hyper uses the default ciphers from the ssl module. If I replace urllib3's ciphers with those from hyper (i.e. the default ciphers from the ssl module), I'm not seeing the captcha.
If I remove ECDHE-RSA-AES256-GCM-SHA384 from urllib3's cipher list, I don't see the captcha any more... :man_shrugging:
Ran your report script. There's a very subtle difference: SHA1 in signature algorithms: SHA1/ECDSA, SHA1/RSA, SHA1/DSA
Yep, I'm tunneling through heroku to test as well.
If I remove ECDHE-RSA-AES256-GCM-SHA384 from urllib3's cipher list, I don't see the captcha any more... :man_shrugging:
A single cipher removed from the list fixes this. :thinking: Maybe it isn't the particular cipher but the list as a whole that Cloudflare is flagging?
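If it is the list as a whole, one crude way to compare lists across machines without pasting them in full would be to hash the ordered list (a stdlib-only sketch of my own; whatever fingerprint Cloudflare actually computes, if any, is unknown to us):

```python
import hashlib
import ssl

# Hash the ordered cipher-suite names offered by the default context.
# Two machines with the same hash offer an identical list, in identical order.
ctx = ssl.create_default_context()
cipher_list = ':'.join(c['name'] for c in ctx.get_ciphers())
fingerprint = hashlib.md5(cipher_list.encode()).hexdigest()
print(fingerprint)
```

If removing one suite changes the hash but not the captcha behavior on some machines, that would point away from a simple whole-list match.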
ECDHE-RSA-AES256-GCM-SHA384 is used in Chrome as well. You'll see TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 in this screenshot:
I think maybe we should just brute-force it. I mean, that would identify how this is working, wouldn't it? lol
@lukele I'll whip up something later on if you want to run it for us? :P
A single cipher removed from the list fixes this. 🤔 Maybe it isn't the particular cipher but the list as a whole that Cloudflare is flagging?
That's what I've been thinking before: that urllib3 has a signature in that sense. But I can't remove just any cipher. For example, removing any of the first 4 (TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256:ECDHE-ECDSA-AES256-GCM-SHA384) doesn't change a thing.
ECDHE-RSA-AES256-GCM-SHA384 is used in Chrome as well.
Haha, I checked the same thing when I found the cipher :)
I'll try to remove ciphers from the end of the list, which are weaker ciphers anyway. Still would love to know why, a) you are not seeing the same results and b) what the real reason is.
Because Cloudflare loves me :rofl: Seriously though, I want to know as well.
Ran your report script. There's a very subtle difference: SHA1 in signature algorithms: SHA1/ECDSA, SHA1/RSA, SHA1/DSA
I just saw this, thanks.
Alright, *** it. Removing the cipher AES128-SHA from the list is enough on my system. Should we just add an adapter that removes this cipher and see how many people still report captchas?
@lukele Let's see if this solves it; SHA1 is insecure anyway.
import cfscrape
import urllib3
from requests.adapters import HTTPAdapter
from urllib3.util.ssl_ import create_urllib3_context, DEFAULT_CIPHERS

# Exclude every cipher suite that uses a SHA1 MAC
DEFAULT_CIPHERS += ':!SHA1'
# urllib3.util.ssl_.DEFAULT_CIPHERS = DEFAULT_CIPHERS

class CustomAdapter(HTTPAdapter):
    def init_poolmanager(self, *args, **kwargs):
        ctx = create_urllib3_context(ciphers=DEFAULT_CIPHERS)
        super(CustomAdapter, self).init_poolmanager(*args, ssl_context=ctx, **kwargs)

scraper = cfscrape.create_scraper()
scraper.mount('https://', CustomAdapter())
print(scraper.get('https://pro-src.com').content)
Sure does!
Alright, *** it. Removing the cipher AES128-SHA from the list is enough on my system. Should we just add an adapter that removes this cipher and see how many people still report captchas?
If nothing else works, I think we should make the switch. I'm not sure about having it on/off by default, though...
@lukele I'm going to modify the script so we can see exactly how this affects the report.
Could you share your cipher list once again, i.e. the output of the following script?
import ssl
from urllib3.util.ssl_ import create_urllib3_context

ctx = create_urllib3_context(ssl.PROTOCOL_SSLv23)
print([c['name'] for c in ctx.get_ciphers()])
I just used pprint.pprint instead of print. I think the difference in the signature extension is based on the CPU. It would be nice if we could identify the CPUs that cause this problem as well.
Interesting, for some reason it appears that my system is sending the SHA1 options and yours is not, even though the same cipher set is given.
I think openssl defaults are based on CPU, don't quote me though. :smiley:
Ah interesting, I have a Core i5 8th generation (Amber Lake-Y)
It looks like OpenSSL can be built without SHA1. That would explain it as well. Are you on a Linux system?
Yup, always on *nix; I never use anything else. I updated the script and it changed my cipher list: http://dpaste.com/3BHAPM7 I'm not sure what the diff is yet, but it was only reported by the socket.
What are you seeing for:
openssl version
openssl ciphers
Version should be the same (OpenSSL 1.1.1b 26 Feb 2019), but I reckon the ciphers might be different.
Ok, so ciphers are the same as mine.
The report also shows no difference, but disabling SHA1 with !SHA1 solves the captcha problem (for me). Certainly a workaround we should be able to live with, even if it modifies the cipher suite.
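For anyone wanting to verify locally, here is a stdlib-only sketch (mine, assuming an OpenSSL build that includes the SHA1 suites) showing what !SHA1 strips from the offered list:

```python
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.set_ciphers('DEFAULT:!SHA1')  # exclude every suite using a SHA1 MAC
names = [c['name'] for c in ctx.get_ciphers()]
# AES128-SHA and friends should be gone; SHA256/SHA384 suites remain.
print(names)
```

Note that the TLS 1.3 suites (e.g. TLS_AES_256_GCM_SHA384) are unaffected, since SHA256/SHA384 are not matched by the SHA1 keyword.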
Alright, so the diff is that all of the SHA1 cipher suites are removed, as reported by the socket only?
('ECDHE-ECDSA-AES256-SHA', 'TLSv1.0', 256)
('ECDHE-RSA-AES256-SHA', 'TLSv1.0', 256)
('DHE-RSA-AES256-SHA', 'SSLv3', 256)
('DHE-DSS-AES256-SHA', 'SSLv3', 256)
('ECDHE-ECDSA-AES128-SHA', 'TLSv1.0', 128)
('ECDHE-RSA-AES128-SHA', 'TLSv1.0', 128)
('DHE-RSA-AES128-SHA', 'SSLv3', 128)
('DHE-DSS-AES128-SHA', 'SSLv3', 128)
('AES256-SHA', 'SSLv3', 256)
('AES128-SHA', 'SSLv3', 128)
This doesn't affect the ciphers that are being shared when using OpenSSL 1.1.1b, so what to do? Remove only AES128-SHA, or all SHA1?
@lukele The ssllabs report didn't change when using the updated script? Nvm, I failed to update it properly, give me one sec... lol
Only the most recent version includes the "Shared ciphers as reported by the socket" information, which the previous versions unfortunately didn't. "Signature algorithms as reported by ssllabs" are identical in all reports.
Okay, I fixed the script. Sorry about that.
Here is my updated report: http://dpaste.com/0D2047F
You can copy and paste my hashes into the script's known list if you want to confirm a match.
@lukele Actually, !AES128-SHA will match multiple cipher suites, but a few fewer than !SHA1.
@Anorov Thoughts?
I'd go with the least invasive option and only remove AES128-SHA at first. We can still add to the list later on.
So basically such an adapter should do:
import cfscrape
from requests.adapters import HTTPAdapter
from urllib3.util.ssl_ import create_urllib3_context

class CustomCiphers(HTTPAdapter):
    def __init__(self, ciphers_to_remove, *args, **kwargs):
        self.ciphers_to_remove = ciphers_to_remove
        super(CustomCiphers, self).__init__(*args, **kwargs)

    def init_poolmanager(self, *args, **kwargs):
        ctx = create_urllib3_context()
        # Drop the unwanted suites from the default list
        ciphers = [cipher['name'] for cipher in ctx.get_ciphers()
                   if cipher['name'] not in self.ciphers_to_remove]
        ctx.set_ciphers(":".join(ciphers))
        super(CustomCiphers, self).init_poolmanager(*args, ssl_context=ctx, **kwargs)

scraper = cfscrape.create_scraper()
scraper.mount('https://', CustomCiphers(['AES128-SHA']))
This is my latest report: http://dpaste.com/1ST48R1.txt The only difference is still in the signature algorithms. I have added your hashes.
Well that's odd. Same hash: 402014b899136c3fed09cd745dc01355 :man_shrugging:
Ah ok, that hash wasn't included in your latest report. That is indeed odd...
My sigs hash didn't change. I was expecting yours to change though... Neither of them changed?
Do you want to send a PR? It might be a 100% fix.
Working on it at the moment. It might be overkill, but how about mounting the adapter on initialization, but unmounting it once we have the cookie?
Sounds great to me!
Alright, great. I'll send you a request for feedback once I have the pull request up. Looking forward to seeing if it solves the captcha issue for @Anorov too.
The latest version of cfscrape should not encounter captchas, unless you're using Tor or another IP that Cloudflare has blacklisted. If you're getting a captcha error, first please run
pip install -U cfscrape
and try again. If you're still getting an error, please leave a comment. Please put all captcha challenge-related issues here.
Please run the following to determine the OpenSSL version compiled with your Python binary and include the output in your comment:
python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)'
(Or python instead of python3 if running on Python 2.)