Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

issue found #363

Open alienkidmj12 opened 4 years ago

alienkidmj12 commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
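One rough way to check this is to look at the response headers. The sketch below is a heuristic only: it assumes Cloudflare-served responses carry a `Server: cloudflare` header or a `CF-RAY` header. The helper name `looks_like_cloudflare` is made up for illustration and is not part of cfscrape.

```python
def looks_like_cloudflare(headers):
    """Heuristic: True if the response headers suggest Cloudflare served the page.

    `headers` is any mapping of header names to values, for example the
    `.headers` attribute of a `requests` response.
    """
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    served_by_cf = lowered.get("server", "").startswith("cloudflare")
    has_ray_id = "cf-ray" in lowered
    return served_by_cf or has_ray_id
```

With `requests` installed, something like `looks_like_cloudflare(requests.get(url).headers)` gives a first hint. A negative result does not prove that no other anti-bot product is in front of the site.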

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

root@balder:~# python --version
Python 2.7.16
root@balder:~#

cfscrape version number

Run pip show cfscrape and paste the output below:

root@balder:~# pip show cfscrape
Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/local/lib/python2.7/dist-packages
Requires: requests
Required-by:
root@balder:~#

Code snippet involved with the issue

#!/usr/bin/env python

import csv
import os
import sys

import cfscrape

scraper = cfscrape.create_scraper()

filename = 'psn.csv'
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            if 'http' in row[0]:
                reverse = row[0][::-1]
                i = reverse.index('/')
                tmp = reverse[0:i]
                cfurl = scraper.get(row[0]).content
                if not os.path.exists("./"+tmp[::-1]):
                    # use a separate name so the CSV handle `f` is not shadowed
                    with open(tmp[::-1], 'wb') as out:
                        out.write(cfurl)
                else:
                    print("file: %s already exists" % tmp[::-1])
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
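The reverse/index/slice steps in the snippet take everything after the last `/` of the URL as the local file name. A minimal sketch of the same idea follows, with the existence check separated out so it can run before any request is made. The helper names `filename_from_url` and `needs_download` are hypothetical and are not part of cfscrape or of the reporter's script.

```python
import os


def filename_from_url(url):
    """Return the text after the last '/' in the URL (may be empty)."""
    return url.rsplit("/", 1)[-1]


def needs_download(url, directory="."):
    """True if the URL yields a file name that is not yet present in `directory`."""
    name = filename_from_url(url)
    if not name:
        # A URL ending in '/' has no usable file name.
        return False
    return not os.path.exists(os.path.join(directory, name))
```

Calling `needs_download(row[0])` before `scraper.get(row[0])` would skip files that already exist without fetching them again. It does not affect the Cloudflare challenge error itself.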

Complete exception and traceback

root@balder:~# ./grab.py
Traceback (most recent call last):
  File "./grab.py", line 20, in <module>
    cfurl = scraper.get(row[0]).content
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

(If the problem doesn't involve an exception being raised, leave this blank)

URL of the Cloudflare-protected page

Can't provide it right now; I can share it later if needed.

URL of Pastebin/Gist with HTML source of protected page

no idea what this is
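For anyone else unsure about this field: it asks for the raw HTML that the protected URL returns to a plain client, pasted into a Gist or Pastebin. Below is a minimal sketch of one way to capture it with the Python 3 standard library. It assumes the challenge page may come back with an error status such as 503, in which case the body is still readable from the raised error. The function name `fetch_raw_html` and its `opener` parameter are made up for illustration.

```python
import urllib.error
import urllib.request


def fetch_raw_html(url, opener=urllib.request.urlopen):
    """Return the raw response body for `url` as bytes.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # An HTTP error response still carries a body (here, possibly
        # the challenge page), so return that instead of failing.
        return err.read()
```

Writing the result to a file, for example `open("page.html", "wb").write(fetch_raw_html(url))`, gives something that can be pasted into a Gist. Some sites may also require browser-like request headers, which this sketch does not set.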

chibani commented 4 years ago

same here (and it's my first use of the package)