s4028600 commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

[ ✓] I've upgraded cfscrape with pip install -U cfscrape
[ ✓] I'm using Node version 10 or higher
[ ✓] The site protection I'm having issues with is from Cloudflare
[ ✓] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.6.8

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: c:\users\long\appdata\local\programs\python\python36\lib\site-packages
Requires: requests
Required-by:

Code snippet involved with the issue

import cfscrape

scraper = cfscrape.CloudflareScraper(delay=5)
url="https://masiro.moe"
res=scraper.get(url)

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

Traceback (most recent call last):
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 255, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\long\Desktop\tes.py", line 13, in <module>
    res=scraper.get(url)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

URL of the Cloudflare-protected page

[https://masiro.moe]

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

JustMachiavelli commented 4 years ago

cfscrape worked normally yesterday, but it is not working today. The source of Cloudflare's challenge page has evidently changed. http://www.m45e.com/

serk7 commented 4 years ago

Same problem here.

edarbieto commented 4 years ago

Yes, yesterday everything was fine. And now I noticed this:

challenge, ms = re.search(
    r"setTimeout\(function\(\){\s*(var "
    r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()

(It's at line 253 in __init__.py.)

Then I checked the html and noticed this:

(Look at the space in s,t,o,p, b,r,e,a,k,i,n,g, right after the comma that follows p.)

So the regex seems to be missing a space.

Add a space or .? at that position in the regex (again, it's near line 253 in __init__.py).

This worked for me at least (after 15 minutes of debugging).
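
A minimal sketch of the change described above, for anyone who wants to try it outside the library first. ORIGINAL is the pattern quoted in this comment; PATCHED is only one assumed way to tolerate the new space (an optional `\s?` after `s,t,o,p,`), and SAMPLE_JS is a made-up stand-in for the challenge script, not the real page. The helper also checks for a failed match before calling .groups(), since calling it on None is what raises the AttributeError in the traceback.

```python
import re

# Pattern as quoted above from cfscrape/__init__.py.
ORIGINAL = (
    r"setTimeout\(function\(\){\s*(var "
    r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?"
)

# Assumed patch: allow one optional whitespace character after "s,t,o,p,".
PATCHED = ORIGINAL.replace("s,t,o,p,b", r"s,t,o,p,\s?b")

# Hypothetical challenge snippet, written only to exercise the two patterns.
SAMPLE_JS = (
    "setTimeout(function(){\n"
    "  var s,t,o,p, b,r,e,a,k,i,n,g,f, x={\"y\":1};\n"
    "  t = document.createElement('div');\n"
    "  a.value = (+x.y).toFixed(10);\n"
    "  f.submit();\n"
    "}, 4000);\n"
)


def extract_challenge(pattern, javascript):
    """Return (challenge, ms) if the pattern matches, otherwise None."""
    match = re.search(pattern, javascript, flags=re.S)
    if match is None:
        # re.search returned None; calling .groups() here would raise
        # AttributeError: 'NoneType' object has no attribute 'groups'.
        return None
    return match.groups()


print("original:", extract_challenge(ORIGINAL, SAMPLE_JS))
print("patched: ", extract_challenge(PATCHED, SAMPLE_JS))
```

If the patched pattern still returns None against the real page source, the challenge has probably changed in more than just this one space.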

lovekrissh143 commented 4 years ago

You are a hero @edarbieto

I had also gone through the HTML of that Cloudflare-protected site and it seemed to have changed, but I never realized the change was this small. Well, I'm not that good a debugger. I was quite confused by this JavaScript challenge because it involves a lot of things that are hard to understand. But you came here like a godly hand. How did you do that? I mean, there could be any number of other things causing this problem. How the hell was it just this small space, the .? in the regex? Hahaha. Really, you are phenomenal. :)

spyderbibek commented 4 years ago

I made the changes for the new challenge, but I am still having issues.

Traceback (most recent call last):
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 251, in solve_challenge   
    challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "animeultima.py", line 8, in <module>
    html_content=scraper.get(uri).content
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 290, in solve_challenge
    raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

edarbieto commented 4 years ago

@spyderbibek mmm... can you share the piece of your code?

spyderbibek commented 4 years ago

here you go

import cfscrape
from bs4 import BeautifulSoup  # imported in the original script; unused in this snippet

uri = "https://www1.animeultima.to/"
scraper = cfscrape.create_scraper()
html_content = scraper.get(uri).content
print(html_content)

lovekrissh143 commented 4 years ago

@spyderbibek Did you update your cfscrape module, and do you have Node.js 10 or higher?

Check these criteria

[ ✓] I've upgraded cfscrape with pip install -U cfscrape

[ ✓] I'm using Node version 10 or higher

[ ✓] The site protection I'm having issues with is from Cloudflare

[ ✓] I'm not using Tor, a VPN, or an anonymizing proxy

Check the Node version: (node --version or nodejs --version)

And one more thing, SpyderBibek: when you made the changes in __init__.py, did they actually persist? Make sure of that!
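
A rough sketch of how those two checks could be scripted, using only the standard library. The helper names (module_source_path, source_contains, node_version) are made up for illustration, and the needle string is an assumption about what the patched regex text looks like; adjust it to whatever you actually typed into __init__.py. The printed path is also the file you would edit, on Windows or on a Linux machine.

```python
import importlib.util
import shutil
import subprocess


def module_source_path(module_name):
    """Return the file Python would import module_name from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


def source_contains(module_name, needle):
    """Return True if the module's source file contains the given text."""
    path = module_source_path(module_name)
    if path is None:
        return False
    with open(path, encoding="utf-8", errors="replace") as handle:
        return needle in handle.read()


def node_version():
    """Return the output of `node --version` (or `nodejs --version`), or None."""
    for exe in ("node", "nodejs"):
        if shutil.which(exe):
            result = subprocess.run(
                [exe, "--version"],
                stdout=subprocess.PIPE,
                universal_newlines=True,
            )
            return result.stdout.strip()
    return None


# Which __init__.py is really being imported, and does it contain the edit?
print(module_source_path("cfscrape"))
print(source_contains("cfscrape", r"s,t,o,p,\s?b"))  # assumed patched text
print(node_version())
```

If the printed path is not the file you edited (for example, a different Python installation or a virtualenv), the change will look like it "didn't stick".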

spyderbibek commented 4 years ago

Yes, I have checked every criterion and I am 100% positive all are fulfilled, and yes, the changes in __init__.py persist.

edarbieto commented 4 years ago

@spyderbibek Well... I got this JS challenge from Cloudflare (a completely different challenge, I think), which is very different from the expected one. I don't know much about Cloudflare, but I'll keep investigating.

spyderbibek commented 4 years ago

well for me i am getting this challenge

edarbieto commented 4 years ago

@spyderbibek Yes, me too (testing your code above). But there are more challenges after that one. Next, you'll get this: I think that's because the previous challenges were not successfully solved, so CF sends that (the JS above). But trust me, in my case, I solved this as I said. That was for my page request (a university one). It's possible that for your page request (an anime series one) CF is stricter. I dunno :/

zn3x commented 4 years ago

It seems like Cloudflare changed their methods. I'm literally sending the same POST request from my Perl script as my browser does, but I keep getting a 502 and keep getting redirected to another challenge.

danjdewhurst commented 4 years ago

I'd guess they are doing fingerprinting or similar checks. Using Selenium with either Chrome or Firefox, I was still able to bypass Cloudflare.

axil commented 4 years ago

I used to use PhantomJS/CasperJS, but they seem to be unmaintained now.

Siebe3271 commented 4 years ago

How do you change the __init__.py file when running on a Linux virtual machine?

fblgit commented 4 years ago

Useless. This project is always poorly maintained...

pip3 install cloudscraper

import cloudscraper as cfscrape

and that was all..
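
A small sketch of the same drop-in idea that does not hard-fail when one of the packages is missing. The helper name load_scraper_module is made up for illustration; it only assumes that whichever module is found is used the same way afterwards.

```python
import importlib


def load_scraper_module(candidates=("cloudscraper", "cfscrape")):
    """Return the first importable module from candidates, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


module = load_scraper_module()
if module is None:
    print("Neither cloudscraper nor cfscrape is installed.")
else:
    print("Using", module.__name__)
```

Both packages expose create_scraper(), so after this you can call module.create_scraper() and use the result like the cfscrape scraper in the snippets above.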

Anorov / cloudflare-scrape

Sudden error #350
