ghost closed this issue 5 years ago
@pro-src very good idea!
@codemanki, have you started work on this or can I take it?
To what benefit? Perhaps it would cause Cloudflare to identify you as a suspicious actor (and trigger reCAPTCHA).
@Revadike, are you assuming this is done on a per-request basis? As this package grows in popularity, using the same user agent for every user could cause Cloudflare to identify our users as suspicious actors just because they're using this package's user agent. It's not uncommon to have a few different devices behind the same IP address. Anybody who wants a specific user agent string can always set the header themselves. Simply put, there aren't any drawbacks to adding this, besides reminding users to keep the same user agent if they intend to persist cookies for later reuse.
Could we maybe ship this feature off by default? I imagine Cloudflare receives so many requests per second that they won't block the UA we currently send by default. And sending multiple UAs from the same IP might cause trouble, especially if this behaviour is only described in the readme, which no one reads. I sort of agree with @Revadike that it might be dangerous to rotate the UA for each session.
I definitely agree with holding off on it until we get v3.0.0 sorted. This is a library. If you don't want to read the docs then you shouldn't do a major version upgrade. It's the very reason why you perform a major version bump in the first place. I don't have any sympathy for people who neglect to read the docs in this particular case.
As for whether this would cause problems: this is a fork of Anorov/cloudflare-scrape, which is more popular. They introduced this feature 3 years ago in this commit and there have been zero issues regarding it. What potential problems are there? I can't think of any, whether it's on or off by default. I'm absolutely for this and can't think of a single reason not to be.
@codemanki, so that's a +1 for off by default, but that's only because I don't want to delay v3.0.0.
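To make the proposal concrete, here is a minimal sketch of a per-session (not per-request) user agent choice that is off by default and never overrides a header the caller set. Everything in it is an assumption for illustration: `createSessionHeaders`, the `randomUserAgent` option, the injectable `random` function, and the UA pool are hypothetical and are not part of cloudscraper's API.

```javascript
// Hypothetical pool of user agents; illustrative strings, not cloudscraper's list.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4',
  'Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0'
];

// Used when random selection is not requested (the "off by default" case).
const DEFAULT_USER_AGENT = USER_AGENTS[0];

// Hypothetical helper: builds the headers for one session.
// Call it once and reuse the result for every request in that session.
function createSessionHeaders(options = {}) {
  const { randomUserAgent = false, headers = {}, random = Math.random } = options;

  // A user-supplied User-Agent always wins (header names are case-insensitive).
  const userKey = Object.keys(headers).find(
    (name) => name.toLowerCase() === 'user-agent'
  );
  if (userKey !== undefined) {
    return { ...headers };
  }

  // Pick from the pool only when the caller opted in.
  const userAgent = randomUserAgent
    ? USER_AGENTS[Math.floor(random() * USER_AGENTS.length)]
    : DEFAULT_USER_AGENT;

  return { ...headers, 'User-Agent': userAgent };
}

// Opt in once per session, then keep these headers alongside any persisted cookies.
const sessionHeaders = createSessionHeaders({ randomUserAgent: true });
```

Because the choice happens once per session, persisted cookies stay paired with the user agent that earned them, which is the one caveat raised above.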
@Anorov, do you have any thoughts on this?
@codemanki, UA blocking is a configuration option for Cloudflare site owners. It's only done on a per-site basis by the site owner. If Cloudflare detects an actual DDoS attack, things might be different.
That said, if I suspected that people were using cloudscraper on my Cloudflare-enabled site (I own one) without my permission, I'd personally configure Cloudflare to block the user agent. Now cloudscraper is completely unusable on my site, but the minority of my legitimate users who share that user agent can't access the site either. So rather than blocking them completely, I'd turn the captcha option on for that particular agent. However, I wouldn't bother trying to block cloudflare-scrape, as I'd have to block more than a few popular user agents. Wait a second... I could use Cloudflare's API to serve captchas to those UAs whenever my server load is unbalanced.
It's highly unlikely that I would ever do any of that, because all I really want is the fast CDN + DDoS protection. Then again, maybe I'm cheap and need to cut down on server bills. Who knows... The point is that this isn't going to be a real problem for 99% of the people who are going to use cloudscraper, and if user-agent-related blocking is enabled, it's going to be worse if everyone is tied to a single UA.
curl -X POST "https://api.cloudflare.com/client/v4/zones/023e105f4ecef8ad9ca31a8372d0c353/firewall/ua_rules" \
-H "X-Auth-Email: user@example.com" \
-H "X-Auth-Key: c2547eb745079dac9320b638f5e225cf483cc5cfdda41" \
-H "Content-Type: application/json" \
--data '{"id":"372e67954025e0ba6aaa6d586b9e0b59","paused":false,"description":"Prevent access from abusive clients identified by this UserAgent to mitigate DDoS attack","mode":"js_challenge","configuration":{"target":"ua","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4"}}'
:point_up: Replace mode `js_challenge` with `challenge` for captcha.
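For anyone scripting this rather than using curl, the JSON body could be assembled along these lines. This is only a sketch: `buildUaRule` and its `captcha` and `description` options are hypothetical names, the two mode strings are the ones mentioned in this thread, and the `id` field from the curl example is left out.

```javascript
// Hypothetical helper mirroring the JSON body of the curl request above.
function buildUaRule(userAgent, { captcha = false, description = '' } = {}) {
  return {
    paused: false,
    description,
    // 'js_challenge' and 'challenge' are the two mode values named in this thread.
    mode: captcha ? 'challenge' : 'js_challenge',
    configuration: { target: 'ua', value: userAgent }
  };
}

// Example: a captcha rule for the Safari user agent used in the curl example.
const exampleRule = buildUaRule(
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4',
  { captcha: true, description: 'Challenge clients identified by this UserAgent' }
);

// This string would be passed as the --data payload (or the request body).
const exampleBody = JSON.stringify(exampleRule);
```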
https://github.com/Anorov/cloudflare-scrape/blob/master/cfscrape/__init__.py#L17-L27
A note about adding this: https://github.com/Anorov/cloudflare-scrape/commit/2c9934ad071ffff3333db6153b3083227c65b01c