Getting 403 on all requests. CF might have pushed an update.

azerpas commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

[x] I've upgraded cfscrape with pip install -U cfscrape
[x] I'm using Node version 10 or higher
[x] The site protection I'm having issues with is from Cloudflare
[x] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.7.5

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/local/lib/python3.7/site-packages
Requires: requests
Required-by:

Code snippet involved with the issue

>>> import cfscrape
>>> scraper = cfscrape.CloudflareScraper()
>>> r = scraper.get("https://www.nakedcph.com/")
>>> r
<Response [403]>

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

URL of the Cloudflare-protected page

https://www.nakedcph.com

URL of Pastebin/Gist with HTML source of protected page

https://hastebin.com/iwedudaheh.xml

Getting a 403 error on almost every Cloudflare-protected site.

bakugo commented 4 years ago

Pretty sure cfscrape cannot bypass forced captchas.

azerpas commented 4 years ago

Pretty sure cfscrape cannot bypass forced captchas.

It could... until a few days ago.

bakugo commented 4 years ago

Probably because the website wasn't forcing captchas for every request 3 days ago.

Right now you will get a captcha even when using a normal browser.

azerpas commented 4 years ago

Probably because the website wasn't forcing captchas for every request 3 days ago.

Right now you will get a captcha even when using a normal browser.

The website was always forcing a captcha, even in a browser.

Try this other one, which has no forced captcha: https://caliroots.com/

Still 403.

Sarfroz commented 4 years ago

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

azerpas commented 4 years ago

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Same as requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)
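
As a rough illustration of the "same as a requests session" answer, the sketch below builds a requests-style proxies mapping with the credentials embedded in the proxy URL. The helper name build_proxies and the host, port, and credentials are hypothetical placeholders, and percent-encoding the credentials is an assumption about what a given proxy needs, not something taken from cfscrape itself.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port, scheme="http"):
    """Return a requests-style proxies mapping with credentials in the URL.

    Hypothetical helper for illustration; not part of cfscrape or requests.
    """
    # Percent-encode credentials so characters like '@' or ':' cannot
    # be mistaken for URL delimiters.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{auth}@{host}:{port}"
    # Send both plain HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}
```

Since the scraper returned by cfscrape.create_scraper() is a requests session, the mapping can be applied with scraper.proxies.update(build_proxies("user", "secret", "proxy.example.com", 3128)), or passed per request as proxies=... to scraper.get().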

Sarfroz commented 4 years ago

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Same as requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)

Not working!

    import cfscrape

    proxies = {'https': 'https://jm:ma52@192.80.10.01:3883'}
    cookie_value, user_agent = cfscrape.get_cookie_string("https://www.asaad.com/", proxies=proxies)

But it doesn't work: the request still goes out from my original IP, not the proxy IP.

wsch-wa commented 4 years ago

Same issue for "https://www.curseforge.com"

    import cfscrape

    scrapper = cfscrape.create_scraper()
    URL = "https://www.curseforge.com"
    tokens, user_agent = scrapper.get_tokens(URL)
    print('Tokens: %s\nUser-Agent: %s\nHeader: %s\n' % (tokens, user_agent, scrapper.headers))

Additional note: I experimented inside the get_tokens procedure and commented out the resp.raise_for_status() call.

With that change the cookies were returned. I assume Cloudflare responds with 403 Forbidden but still delivers the content. Within solve_cf_challenge I can see the challenge-form string. The parsing for submit_url has an issue, but I lack the knowledge to say exactly what needs to be fixed.

Hope that helps.
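
To make the "403 but still delivers content" observation easier to check, here is a minimal sketch that labels a response from its status code, Server header, and body rather than raising on the status alone. The function name classify_response is hypothetical, and the marker strings (jschl_vc, jschl_answer, chk_captcha) are assumptions about what Cloudflare's challenge pages contained at the time, not a documented API.

```python
def classify_response(status_code, server_header, body):
    """Roughly label a response as a challenge, a captcha, or something else.

    Hypothetical helper; the byte markers are assumed, not guaranteed.
    """
    from_cloudflare = server_header.lower().startswith("cloudflare")
    # JavaScript challenge pages were served with 503 or 429 and a hidden form.
    if (from_cloudflare and status_code in (429, 503)
            and b"jschl_vc" in body and b"jschl_answer" in body):
        return "js-challenge"
    # Captcha pages were served with 403 and a captcha form action.
    if from_cloudflare and status_code == 403 and b"chk_captcha" in body:
        return "captcha"
    # A 403 with neither marker matches the case described above.
    if status_code == 403:
        return "403-unrecognized"
    return "ok" if status_code < 400 else "http-error"
```

With a requests-style response object this could be called as classify_response(resp.status_code, resp.headers.get("Server", ""), resp.content).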

Sarfroz commented 4 years ago

I am getting the cookies, but they are obtained from my real IP rather than through the proxy. That is the issue: the proxy is being ignored.

wsch-wa commented 4 years ago

I am getting the cookies, but they are obtained from my real IP rather than through the proxy. That is the issue: the proxy is being ignored. …

My use case to be super clear on the issue:

Browsers (FF, IE, Chrome) show the site without a captcha.
I am not using a proxy.
cfscrape returns a 403 status code, which does not seem to reflect reality: the body text shows the normal content.
In the browsers I get status 200 when inspecting the traffic with the F12 developer tools.
To me, the headers sent by the browser and by cfscrape look similar; the only difference is status 200 vs. 403.

Since I have no clue what information is needed to fix the problem, here are the request the browser sent and the response it received.

---- Browser Sent Request

{"Request headers (386 B)":{"headers":[{"name":"Accept","value":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"},{"name":"Accept-Encoding","value":"gzip, deflate, br"},{"name":"Accept-Language","value":"de,en-US;q=0.7,en;q=0.3"},{"name":"Connection","value":"keep-alive"},{"name":"DNT","value":"1"},{"name":"Host","value":"www.curseforge.com"},{"name":"Upgrade-Insecure-Requests","value":"1"},{"name":"User-Agent","value":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}]}}

---- Received Answer

{"Response headers (1.459 KB)":{"headers":[{"name":"cache-control","value":"no-cache"},{"name":"cf-cache-status","value":"DYNAMIC"},{"name":"cf-ray","value":"57e1b846faa6cba0-VIE"},{"name":"content-encoding","value":"gzip"},{"name":"content-type","value":"text/html; charset=utf-8"},{"name":"date","value":"Fri, 03 Apr 2020 09:21:31 GMT"},{"name":"expect-ct","value":"max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""},{"name":"expires","value":"-1"},{"name":"pragma","value":"no-cache"},{"name":"server","value":"cloudflare"},{"name":"set-cookie","value":"__cfduid=d0ba11b1bc5d1c04021503cd305a7ee481585905690; expires=Sun, 03-May-20 09:21:30 GMT; path=/; domain=.curseforge.com; HttpOnly; SameSite=Lax"},{"name":"set-cookie","value":"AWSALB=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/"},{"name":"set-cookie","value":"AWSALBCORS=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/; SameSite=None; Secure"},{"name":"set-cookie","value":"Unique_ID_v2=0dcb9c0455fc4d7589e75c024408615e; domain=.curseforge.com; expires=Wed, 03-Apr-2030 09:21:31 GMT; path=/"},{"name":"set-cookie","value":"__cf_bm=3cd18abfb0a930880bac2cc3a829276760cb4fba-1585905691-1800-ASOZlhSsbwiJ+ImNKM4F5d1gy9QkDueAfXcagsYDKar7m817Ju2aCCXZOKdVAISFWbyo4XQJshOFSWWsyGT2bFg=; path=/; expires=Fri, 03-Apr-20 09:51:31 GMT; domain=.curseforge.com; HttpOnly; Secure; SameSite=None"},{"name":"strict-transport-security","value":"max-age=15768000"},{"name":"x-aspnet-version","value":"4.0.30319"},{"name":"x-aspnetmvc-version","value":"5.2"},{"name":"X-Firefox-Spdy","value":"h2"},{"name":"x-frame-options","value":"SAMEORIGIN"},{"name":"x-frame-options","value":"SAMEORIGIN"},{"name":"x-mvc-supplant-cachable","value":"true"},{"name":"x-ua-compatible","value":"IE=edge,chrome=1"}]}}
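
For comparing what the browser sent with what a script sent, a small sketch like the following could flatten a devtools header export of the shape pasted above and list header names that appear on only one side. The function names are hypothetical, and the export layout (one top-level key holding a "headers" list of name/value pairs) is assumed from the paste in this comment.

```python
import json


def headers_from_export(raw):
    """Flatten a devtools header export into {lowercased name: [values]}."""
    flat = {}
    for section in json.loads(raw).values():
        for header in section["headers"]:
            # Repeated headers such as set-cookie are collected in order.
            flat.setdefault(header["name"].lower(), []).append(header["value"])
    return flat


def diff_header_names(left, right):
    """Return (names only in left, names only in right), each sorted."""
    return sorted(set(left) - set(right)), sorted(set(right) - set(left))
```

The second argument to diff_header_names can be built from a session's headers, for example {name.lower(): [value] for name, value in scraper.headers.items()}.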

KebabLord commented 3 years ago

bump.

restyler commented 2 years ago

My use case to be super clear on the issue:

Browsers (FF, IE, Chrome) show the site without a captcha.

I am not using a proxy.

cfscrape returns a 403 status code, which does not seem to reflect reality.

I think this is probably not Cloudflare's "under attack" protection but TLS fingerprint protection. Try this solution to confirm: https://pixeljets.com/blog/scrape-ninja-bypassing-cloudflare-403-code-1020-errors/
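
As background for the TLS fingerprint idea: a fingerprint of this kind is derived from the TLS handshake (for example the list of cipher suites the client offers), so two clients sending identical HTTP headers can still look different. The sketch below only prints what Python's default TLS context has enabled so it can be compared by eye with a browser; the function name is hypothetical, and whether Cloudflare actually keyed on this for the sites in this thread is an assumption.

```python
import ssl


def offered_cipher_names():
    """List the cipher suite names enabled in Python's default TLS context."""
    context = ssl.create_default_context()
    # get_ciphers() returns one dict per enabled cipher suite.
    return [cipher["name"] for cipher in context.get_ciphers()]


if __name__ == "__main__":
    for name in offered_cipher_names():
        print(name)
```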
