Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License
3.35k stars 458 forks

Sudden error #350

Open s4028600 opened 4 years ago

s4028600 commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.6.8

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: c:\users\long\appdata\local\programs\python\python36\lib\site-packages
Requires: requests
Required-by:

Code snippet involved with the issue

import cfscrape

# Create a scraper session with a 5-second delay before answering the challenge
scraper = cfscrape.CloudflareScraper(delay=5)
url = "https://masiro.moe"
res = scraper.get(url)

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

Traceback (most recent call last):
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 255, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\long\Desktop\tes.py", line 13, in <module>
    res=scraper.get(url)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "C:\Users\long\AppData\Local\Programs\Python\Python36\lib\site-packages\cfscrape\__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

URL of the Cloudflare-protected page

[https://masiro.moe]

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]
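
The first traceback in the report above (AttributeError: 'NoneType' object has no attribute 'groups') is what Python raises when re.search finds no match and .groups() is then called on the None result. A minimal sketch of guarding against that case follows. The function name groups_or_error and its error message are hypothetical and are not part of cfscrape.

```python
import re


def groups_or_error(pattern, text):
    """Return the captured groups, or raise a descriptive error on no match.

    Hypothetical helper for illustration only; cfscrape does not define it.
    """
    match = re.search(pattern, text, flags=re.S)
    if match is None:
        # re.search returns None when nothing matches, so calling
        # .groups() directly would raise AttributeError instead.
        raise ValueError("pattern did not match; the page markup may have changed")
    return match.groups()
```

Checking for None explicitly makes the failure say which step broke, rather than surfacing as an AttributeError deep inside the library.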

JustMachiavelli commented 4 years ago

cfscrape worked fine yesterday, but it is not working today. The source of Cloudflare's challenge page has evidently changed. Example site: http://www.m45e.com/

serk7 commented 4 years ago

Same problem here.

edarbieto commented 4 years ago

Yes, yesterday everything was fine. And now I noticed this:

challenge, ms = re.search(
    r"setTimeout\(function\(\){\s*(var "
    r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()

(It's at line 253 in __init__.py)

Then I checked the HTML and noticed this:

[image] (Look at the variable list: there is now a space after "p," as in s,t,o,p, b,r,e,a,k,i,n,g)

So the regex is missing a space.

Add a space or .? after "p," in the regex (again, it's near line 253 in __init__.py). [image]

This worked for me at least (after 15 minutes of debugging).
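
The fix described above can be sketched as follows. The pattern is the one quoted in this comment; the only change is an optional space after "p,". SAMPLE is a hypothetical, heavily simplified stand-in for the challenge script (the real page is far more complex), used only to show that the original pattern stops matching once the space appears.

```python
import re

# Pattern quoted in the comment above, as adjacent raw strings.
ORIGINAL = (
    r"setTimeout\(function\(\){\s*(var "
    r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?"
)

# Same pattern, but tolerating an optional space after "p,".
PATCHED = ORIGINAL.replace("s,t,o,p,b", "s,t,o,p, ?b")

# Hypothetical sample, not real Cloudflare output.
SAMPLE = (
    "setTimeout(function(){\n"
    "  var s,t,o,p, b,r,e,a,k,i,n,g,f, x=1;\n"
    "  t = document.createElement('div');\n"
    "  a.value = x + 10;\n"
    "  f.submit();\n"
    "}, 4000);\n"
)

old_match = re.search(ORIGINAL, SAMPLE, flags=re.S)
new_match = re.search(PATCHED, SAMPLE, flags=re.S)

print("original matches:", old_match is not None)
print("patched matches:", new_match is not None)
if new_match is not None:
    challenge, ms = new_match.groups()
    print("delay (ms):", ms)
```

Using " ?" rather than a literal space keeps the pattern compatible with pages that still serve the older, unspaced declaration.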

lovekrissh143 commented 4 years ago

You are a hero @edarbieto

I also went through the HTML of that Cloudflare-protected site and saw that it had changed, but I never realized the change was this small. Well, I'm not that good a debugger. I was quite confused by this JavaScript challenge because it involves lots of things that are hard to understand. But you came here like a godly hand. How did you do that? I mean, any number of other things could have caused this problem. How the hell was it just this small space, .? in the regex. Haahahahahahahahhahahaha Really, you are phenomenal. :)

spyderbibek commented 4 years ago

I made the change for the new challenge, but I am still having issues.

Traceback (most recent call last):
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 251, in solve_challenge   
    challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "animeultima.py", line 8, in <module>
    html_content=scraper.get(uri).content
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "C:\Users\Acer\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 290, in solve_challenge
    raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

edarbieto commented 4 years ago

@spyderbibek Hmm... can you share the relevant piece of your code?

spyderbibek commented 4 years ago

Here you go:

import cfscrape
from bs4 import BeautifulSoup

uri = "https://www1.animeultima.to/"
scraper = cfscrape.create_scraper()
# Fetch the page through the scraper session and print the raw bytes
html_content = scraper.get(uri).content
print(html_content)

lovekrissh143 commented 4 years ago

@spyderbibek Did you update your cfscrape module, and do you have Node.js version 10 or higher?

Check these criteria

[ ✓] I've upgraded cfscrape with pip install -U cfscrape

[ ✓] I'm using Node version 10 or higher

[ ✓] The site protection I'm having issues with is from Cloudflare

[ ✓] I'm not using Tor, a VPN, or an anonymizing proxy

Check your Node version (node --version or nodejs --version).

One more thing, @spyderbibek: after you made the changes in __init__.py, did they persist? Make sure they did!
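
The Node version check above can also be scripted. This is a minimal sketch, assuming node --version prints a string such as "v12.16.1". The helper names node_major and installed_node_major are hypothetical and are not part of cfscrape.

```python
import shutil
import subprocess


def node_major(version_output):
    """Extract the major version from output such as 'v12.16.1'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def installed_node_major():
    """Return the installed Node major version, or None if Node is not found.

    Tries the 'node' executable first, then 'nodejs'.
    """
    exe = shutil.which("node") or shutil.which("nodejs")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    )
    return node_major(result.stdout)
```

Calling installed_node_major() and comparing the result against 10 gives the same answer as reading the command output by hand.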

spyderbibek commented 4 years ago

Yes, I have checked every criterion and I am 100% positive all are fulfilled, and yes, the changes in __init__.py persist.

edarbieto commented 4 years ago

@spyderbibek Well... I got this JS challenge from Cloudflare: [image] (I think it's a completely different challenge), which is very different from the expected one: [image] I don't know much about Cloudflare, but I'll keep investigating.

spyderbibek commented 4 years ago

Well, for me, I am getting this challenge:

[image]

edarbieto commented 4 years ago

@spyderbibek Yes, me too (testing your code above). But there are more challenges after that one. Next, you'll get this: [image] I think that's because the previous challenges were not solved successfully, so CF sends the JS above. But trust me, in my case, the fix I described solved it. That was for my page request (a university site). It's possible that CF is stricter for your page request (an anime series site). I dunno :/

zn3x commented 4 years ago

It seems like Cloudflare changed their methods. I'm literally sending the same POST request from my Perl script as my browser does, but I keep getting a 502 and being redirected to another challenge.

danjdewhurst commented 4 years ago

I'd guess they are doing fingerprinting or similar checks. Using Selenium with either Chrome or Firefox I was still able to bypass CloudFlare.

axil commented 4 years ago

I used to use PhantomJS/CasperJS, but they seem to be unmaintained now.

Siebe3271 commented 4 years ago

How do you change the __init__.py file when running on a Linux virtual machine?
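
One possible way to find the file to edit, on Linux or any other platform, is to ask Python where the package is installed. This is a sketch only. module_file is a hypothetical helper name, and it assumes cfscrape is installed in the same interpreter that runs the script.

```python
import importlib.util


def module_file(name):
    """Return the path of an importable module's source file, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # For a package such as cfscrape, this is the path to its __init__.py;
    # it prints None if the package is not installed for this interpreter.
    print(module_file("cfscrape"))
```

The printed path can then be opened in any terminal editor available on the VM. Editing it may require elevated permissions if the package was installed system-wide.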

fblgit commented 4 years ago

Useless. This project is always the same: poorly maintained...

pip3 install cloudscraper

import cloudscraper as cfscrape

And that was all.