Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License
3.35k stars 458 forks

Getting 403 on all requests. CF might have pushed an update. #338

Open azerpas opened 4 years ago

azerpas commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.7.5

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/local/lib/python3.7/site-packages
Requires: requests
Required-by: 

Code snippet involved with the issue

>>> import cfscrape
>>> scraper = cfscrape.CloudflareScraper()
>>> r = scraper.get("https://www.nakedcph.com/")
>>> r
<Response [403]>

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

URL of the Cloudflare-protected page

https://www.nakedcph.com

URL of Pastebin/Gist with HTML source of protected page

https://hastebin.com/iwedudaheh.xml

I am getting a 403 error on almost every Cloudflare-protected site.

bakugo commented 4 years ago

Pretty sure cfscrape cannot bypass forced captchas.

azerpas commented 4 years ago

Pretty sure cfscrape cannot bypass forced captchas.

It was... a few days ago.

bakugo commented 4 years ago

Probably because the website wasn't forcing captchas for every request 3 days ago.

Right now you will get a captcha even when using a normal browser.

azerpas commented 4 years ago

Probably because the website wasn't forcing captchas for every request 3 days ago.

Right now you will get a captcha even when using a normal browser.

The website was always forcing a captcha, even in a browser.

Try this other one, which has no forced captcha: https://caliroots.com/

Still 403.

Sarfroz commented 4 years ago

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

azerpas commented 4 years ago

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Same as requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)
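For readers following along, here is a minimal sketch of the requests-style proxies mapping with inline credentials. The helper name build_proxies and the host proxy.example.com are hypothetical, and the sketch assumes the proxy accepts basic credentials embedded in the URL. Credentials are percent-encoded so that characters such as "@" or ":" do not break the URL.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port, scheme="http"):
    """Build a requests-style proxies dict with credentials in the proxy URL.

    Hypothetical helper for illustration; not part of cfscrape or requests.
    """
    # Percent-encode credentials so reserved characters stay unambiguous.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{auth}@{host}:{port}"
    # The same proxy is used for both plain HTTP and HTTPS targets.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("user", "p@ss:word", "proxy.example.com", 3128)
print(proxies["https"])
```

The resulting dict can be passed as proxies=proxies to a requests-style get() call, or assigned to a session's proxies attribute. Whether the proxy URL itself should start with http:// or https:// depends on how the proxy is reached, so treat the default scheme here as an assumption.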

Sarfroz commented 4 years ago

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Same as requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)

Not working!

import cfscrape

proxies = {'https': 'https://jm:ma52@192.80.10.01:3883'}
cookie_value, user_agent = cfscrape.get_cookie_string("https://www.asaad.com/", proxies=proxies)

It does not work: the request goes out from my original IP, not the proxy IP.

wsch-wa commented 4 years ago

Same issue for "https://www.curseforge.com"

    import cfscrape
    scraper = cfscrape.create_scraper()  # assumed setup; not shown in the original comment
    URL = "https://www.curseforge.com"
    tokens, user_agent = scraper.get_tokens(URL)
    print('Tokens: %s\nUser-Agent: %s\nHeader: %s\n' % (tokens, user_agent, scraper.headers))

Additional note: I experimented inside get_tokens and commented out the resp.raise_for_status() call.

With that change, the cookies were returned. I assume Cloudflare sends a 403 Forbidden status and still delivers content. Inside solve_cf_challenge I can see the challenge-form string. The parsing of submit_url has a problem, but I lack the knowledge to say exactly what needs fixing.

Hope that helps.
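The observation above, that a response can carry a 403 status and still deliver cookies and a body, can be reproduced with the standard library alone. This is a minimal sketch against a local stand-in server; ChallengeHandler, fetch_despite_403 and the cookie name demo_cookie are hypothetical and do not reflect Cloudflare's actual responses or cfscrape's internals.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChallengeHandler(BaseHTTPRequestHandler):
    """Hypothetical server that answers 403 but still sends a cookie and a body."""

    def do_GET(self):
        body = b"<form id='challenge-form'></form>"
        self.send_response(403)
        self.send_header("Set-Cookie", "demo_cookie=abc123; Path=/")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_despite_403(url):
    """Return (status, Set-Cookie headers, body) even for an error status."""
    try:
        resp = urllib.request.urlopen(url, timeout=5)
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: headers and body stay readable.
        resp = err
    with resp:
        cookies = resp.headers.get_all("Set-Cookie") or []
        return resp.code, cookies, resp.read().decode()


# Start the stand-in server on a free local port, fetch once, then stop it.
server = HTTPServer(("127.0.0.1", 0), ChallengeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, cookies, body = fetch_despite_403(f"http://127.0.0.1:{server.server_port}/")
server.shutdown()
server.server_close()

print(status, cookies, body)
```

Raising on the status code before looking at the response discards this information, which is consistent with cookies appearing once raise_for_status() is skipped.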

Sarfroz commented 4 years ago

I am getting the cookies, but they do not come through the proxy; they come from my real IP. That is the issue: the proxy is being ignored.


wsch-wa commented 4 years ago

I am getting the cookies, but they do not come through the proxy; they come from my real IP. That is the issue: the proxy is being ignored.

My use case to be super clear on the issue:

  • Browser (FF, IE, Chrome) shows the site without a captcha
  • I am not using a proxy
  • cfscrape returns a 403 error code, which does not seem to reflect reality.

Since I have no clue what information I need to provide to fix the problem, here are the request sent by the browser and the response it received.

---- Browser Sent Request
{"Anfragekopfzeilen (386 B)":{"headers":[{"name":"Accept","value":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"},{"name":"Accept-Encoding","value":"gzip, deflate, br"},{"name":"Accept-Language","value":"de,en-US;q=0.7,en;q=0.3"},{"name":"Connection","value":"keep-alive"},{"name":"DNT","value":"1"},{"name":"Host","value":"www.curseforge.com"},{"name":"Upgrade-Insecure-Requests","value":"1"},{"name":"User-Agent","value":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}]}}

---- Received Answer
{"Antwortkopfzeilen (1,459 KB)":{"headers":[{"name":"cache-control","value":"no-cache"},{"name":"cf-cache-status","value":"DYNAMIC"},{"name":"cf-ray","value":"57e1b846faa6cba0-VIE"},{"name":"content-encoding","value":"gzip"},{"name":"content-type","value":"text/html; charset=utf-8"},{"name":"date","value":"Fri, 03 Apr 2020 09:21:31 GMT"},{"name":"expect-ct","value":"max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""},{"name":"expires","value":"-1"},{"name":"pragma","value":"no-cache"},{"name":"server","value":"cloudflare"},{"name":"set-cookie","value":"__cfduid=d0ba11b1bc5d1c04021503cd305a7ee481585905690; expires=Sun, 03-May-20 09:21:30 GMT; path=/; domain=.curseforge.com; HttpOnly; SameSite=Lax"},{"name":"set-cookie","value":"AWSALB=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/"},{"name":"set-cookie","value":"AWSALBCORS=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/; SameSite=None; Secure"},{"name":"set-cookie","value":"Unique_ID_v2=0dcb9c0455fc4d7589e75c024408615e; domain=.curseforge.com; expires=Wed, 03-Apr-2030 09:21:31 GMT; path=/"},{"name":"set-cookie","value":"__cf_bm=3cd18abfb0a930880bac2cc3a829276760cb4fba-1585905691-1800-ASOZlhSsbwiJ+ImNKM4F5d1gy9QkDueAfXcagsYDKar7m817Ju2aCCXZOKdVAISFWbyo4XQJshOFSWWsyGT2bFg=; path=/; expires=Fri, 03-Apr-20 09:51:31 GMT; domain=.curseforge.com; HttpOnly; Secure; SameSite=None"},{"name":"strict-transport-security","value":"max-age=15768000"},{"name":"x-aspnet-version","value":"4.0.30319"},{"name":"x-aspnetmvc-version","value":"5.2"},{"name":"X-Firefox-Spdy","value":"h2"},{"name":"x-frame-options","value":"SAMEORIGIN"},{"name":"x-frame-options","value":"SAMEORIGIN"},{"name":"x-mvc-supplant-cachable","value":"true"},{"name":"x-ua-compatible","value":"IE=edge,chrome=1"}]}}

KebabLord commented 3 years ago

bump.

restyler commented 2 years ago

My use case to be super clear on the issue:

  • Browser (FF, IE, Chrome) shows the site without a captcha
  • I am not using a proxy
  • cfscrape returns a 403 error code, which does not seem to reflect reality.

In that case, I think this is probably not Cloudflare's "under attack" protection but TLS fingerprint protection. Try this solution to confirm: https://pixeljets.com/blog/scrape-ninja-bypassing-cloudflare-403-code-1020-errors/
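As background for the TLS fingerprint idea: fingerprinting schemes such as JA3 derive a hash from the client's TLS handshake, including the list of cipher suites it offers, so a Python client can look different from a browser even when it sends identical HTTP headers. The sketch below only lists the cipher suites of Python's default client context. Whether this explains the 403s in this thread is an assumption, and HTTP libraries may configure their own context differently.

```python
import ssl

# Python's default client-side TLS context (an HTTP library may build its own).
context = ssl.create_default_context()

# Enabled cipher suites, listed in order of cipher priority.
cipher_names = [cipher["name"] for cipher in context.get_ciphers()]

print(f"{len(cipher_names)} cipher suites offered; first few:")
for name in cipher_names[:5]:
    print("  ", name)
```

Comparing this list with what a browser offers (for example in a packet capture) is one way to see why the two clients can be told apart before any HTTP request is sent.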