codemanki / cloudscraper

--DEPRECATED -- 🛑 🛑 Node.js library to bypass cloudflare's anti-ddos page
MIT License
600 stars 139 forks

Randomize default useragent #77

Closed ghost closed 5 years ago

ghost commented 5 years ago

https://github.com/Anorov/cloudflare-scrape/blob/master/cfscrape/__init__.py#L17-L27

import random  # needed by random.choice below

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

A note about adding this: https://github.com/Anorov/cloudflare-scrape/commit/2c9934ad071ffff3333db6153b3083227c65b01c

codemanki commented 5 years ago

@pro-src very good idea!

ghost commented 5 years ago

@codemanki, have you started work on this or can I take it?

Revadike commented 5 years ago

To what benefit? Perhaps it would cause cloudflare to identify you as a suspicious actor (and trigger recaptcha)

ghost commented 5 years ago

@Revadike, are you assuming this is done on a per-request basis? As this package grows in popularity, using the same user agent for every user could cause Cloudflare to identify our users as suspicious actors just because they're using this package's user agent. It's not uncommon to have a few different devices behind the same IP address. If anybody wanted to use a specific user agent string, they could always set the header themselves. Simply put, there aren't any drawbacks to adding this, besides having to remind users to keep the same user agent if they intend to persist cookies for later reuse.
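To make the per-session idea concrete, here is a minimal Python sketch in the style of the cfscrape snippet above. It is not cloudscraper's or cfscrape's actual code: the `Session` class, its `dump`/`load` helpers, and the shortened `USER_AGENTS` pool are hypothetical names used only for illustration. It picks one user agent when the session is created, lets a caller-supplied header win, and stores the user agent next to the cookies so both can be restored together.

```python
import json
import random

# Shortened pool for illustration; two entries taken from the list quoted above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
]


class Session:
    """Hypothetical session: one User-Agent for its whole lifetime."""

    def __init__(self, headers=None, cookies=None):
        # Chosen once per session, not once per request.
        self.headers = {"User-Agent": random.choice(USER_AGENTS)}
        # A caller-supplied header overrides the random default.
        # (Assumes the caller spells the key exactly "User-Agent".)
        self.headers.update(headers or {})
        self.cookies = dict(cookies or {})

    def dump(self):
        """Serialize cookies together with the User-Agent that earned them."""
        return json.dumps(
            {"user_agent": self.headers["User-Agent"], "cookies": self.cookies}
        )

    @classmethod
    def load(cls, blob):
        """Restore a session with the same User-Agent it was saved with."""
        state = json.loads(blob)
        return cls(
            headers={"User-Agent": state["user_agent"]},
            cookies=state["cookies"],
        )
```

A caller who persists cookies would save `session.dump()` and later call `Session.load(...)`, so the restored cookies are sent with the same user agent they were issued for.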

codemanki commented 5 years ago

Could we maybe make this feature off by default? I imagine that Cloudflare receives so many requests per second that they won't be blocking the current default UA. And sending multiple UAs from the same IP might cause some trouble, especially if this behaviour is not explicitly described in the readme, which no one would read anyway. I sort of agree with @Revadike that it might be dangerous to rotate the UA for each session.
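For illustration, an off-by-default switch could look roughly like the Python sketch below. The function `pick_user_agent`, its `randomize` parameter, and the `FIXED_USER_AGENT` constant are hypothetical; nothing here reflects an option that cloudscraper actually exposes.

```python
import random

# Hypothetical fixed default, standing in for whatever UA the library sends today.
FIXED_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36"
)

# Hypothetical pool used only when randomization is requested.
USER_AGENT_POOL = [
    FIXED_USER_AGENT,
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
]


def pick_user_agent(randomize=False, pool=USER_AGENT_POOL):
    """Return the fixed default unless the caller explicitly opts in."""
    if not randomize:
        return FIXED_USER_AGENT
    return random.choice(pool)
```

With this shape, existing callers keep the current behaviour, and only code that passes `randomize=True` gets a randomly chosen user agent.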

ghost commented 5 years ago

I definitely agree with holding off on it until we get v3.0.0 sorted. This is a library. If you don't want to read the docs then you shouldn't do a major version upgrade. It's the very reason why you perform a major version bump in the first place. I don't have any sympathy for people who neglect to read the docs in this particular case.

As for whether this would cause problems: this is a fork of Anorov/cloudflare-scrape, which is more popular. They introduced this feature 3 years ago in this commit, and there have been zero issues regarding it. What potential problems are there? I can't think of any, whether it's on or off by default. I'm absolutely for this and can't think of a single reason not to be.

ghost commented 5 years ago

@codemanki, so that's a +1 for off by default, but that's only because I don't want to delay v3.0.0.

ghost commented 5 years ago

@Anorov, do you have any thoughts on this?

ghost commented 5 years ago

@codemanki, the UA blocking is a configuration option for Cloudflare site owners. It's only done on a per-site basis by the site owner. If Cloudflare detects an actual DDoS attack, things might be different.

That said, if I happened to suspect that people were using cloudscraper on my Cloudflare-enabled site (I own one) without my permission, I'd personally configure Cloudflare to block the user agent. Now cloudscraper is completely unusable on my site, but my legitimate users with that user agent (a minority of them) can't access the site either. So rather than blocking them completely, I'd turn on the captcha option for that particular agent. However, I wouldn't bother trying to block cloudflare-scrape, as I'd have to block more than a few popular user agents. Wait a second... I could use Cloudflare's API to send those UAs captchas whenever my server load is unbalanced.

It's highly unlikely that I would ever do any of that, because all I really want is the fast CDN + DDoS protection. Then again, maybe I'm cheap and need to cut down on server bills. Who knows... The point is that this isn't going to be a real problem for 99% of the people who are going to use cloudscraper, and if user-agent-related blocking is enabled, it's going to be worse if tied to a single UA.

curl -X POST "https://api.cloudflare.com/client/v4/zones/023e105f4ecef8ad9ca31a8372d0c353/firewall/ua_rules" \
     -H "X-Auth-Email: user@example.com" \
     -H "X-Auth-Key: c2547eb745079dac9320b638f5e225cf483cc5cfdda41" \
     -H "Content-Type: application/json" \
     --data '{"id":"372e67954025e0ba6aaa6d586b9e0b59","paused":false,"description":"Prevent access from abusive clients identified by this UserAgent to mitigate DDoS attack","mode":"js_challenge","configuration":{"target":"ua","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4"}}'

:point_up: In the request above, replace the mode js_challenge with challenge to serve a captcha instead.
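The same request can be sketched in Python with only the standard library. This is an illustration, not an official client: `build_ua_rule_request` and its `captcha` parameter are hypothetical names, the endpoint and header names are copied from the curl command above, and the `id` field from that example is left out on the assumption that the server assigns it. The function only builds the request object; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_ua_rule_request(zone_id, email, api_key, user_agent, captcha=False):
    """Build (but do not send) a POST request that creates a UA rule."""
    payload = {
        "paused": False,
        "description": "Challenge clients identified by this User-Agent",
        # "challenge" serves a captcha, "js_challenge" the JavaScript check.
        "mode": "challenge" if captcha else "js_challenge",
        "configuration": {"target": "ua", "value": user_agent},
    }
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/firewall/ua_rules",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Auth-Email": email,
            "X-Auth-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Switching between the two modes then comes down to the `captcha` flag, for example `build_ua_rule_request(zone, email, key, ua, captcha=True)`.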

Revadike commented 5 years ago

https://github.com/codemanki/cloudscraper/issues/62

ghost commented 5 years ago

This comment by @Anorov is relevant, though he seems to have unsubscribed... Maybe he is jelly? :laughing: