Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License
3.35k stars, 458 forks

cloudflare #371

Open rnrnstar2 opened 4 years ago

rnrnstar2 commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
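The Node version check described above can also be scripted. This is a minimal sketch, not part of cfscrape: the helper names `parse_node_major` and `find_node_major` are hypothetical, and it only assumes that the binary prints a version string such as `v12.18.0` when called with `--version`.

```python
import re
import shutil
import subprocess


def parse_node_major(version_output):
    """Return the major version from output such as 'v12.18.0', or None."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def find_node_major():
    """Return the installed Node major version, or None if Node is not found."""
    # Some distributions install the binary as `nodejs` rather than `node`.
    for name in ("node", "nodejs"):
        path = shutil.which(name)
        if path is None:
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        if result.returncode == 0:
            return parse_node_major(result.stdout)
    return None


if __name__ == "__main__":
    major = find_node_major()
    if major is None:
        print("Node was not found on PATH")
    elif major < 10:
        print(f"Node {major} is too old; version 10 or higher is required")
    else:
        print(f"Node {major} is new enough")
```

Running the script in the same environment as the spider (here, the conda environment shown in the traceback paths) checks the Node binary that cfscrape would actually find on PATH.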

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
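One rough way to check whether a response looks like a Cloudflare challenge rather than a block from another vendor is to inspect the status code and the Server header. The sketch below is a heuristic under stated assumptions, not cfscrape's own detection logic: the function name is hypothetical, the 503 status matches the log in this issue, and the default body markers (`jschl_vc`, `jschl_answer`) are field names from older challenge pages that Cloudflare may no longer use.

```python
def looks_like_cloudflare_challenge(status_code, headers, body,
                                    markers=("jschl_vc", "jschl_answer")):
    """Heuristically decide whether a response is a Cloudflare challenge page.

    status_code: HTTP status as an int
    headers:     mapping of header names to values
    body:        response body as text
    markers:     substrings expected in the challenge HTML (an assumption;
                 pass an empty tuple to skip the body check)
    """
    # Header names are case-insensitive, so normalize before comparing.
    server = ""
    for key, value in headers.items():
        if key.lower() == "server":
            server = value.lower()

    return (
        status_code in (429, 503)
        and server.startswith("cloudflare")
        and all(marker in body for marker in markers)
    )
```

With a requests-style response object this could be called as `looks_like_cloudflare_challenge(resp.status_code, resp.headers, resp.text)`. A False result does not prove the site is unprotected; it only means the response does not match these particular signals.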

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

cfscrape version number

Run pip show cfscrape and paste the output below:

Code snippet involved with the issue

2020-06-16 18:42:03 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scraping)
2020-06-16 18:42:03 [scrapy.utils.log] INFO: Versions: lxml, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.7.7 (default, May  6 2020, 04:59:01) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 2.9.2, Platform Darwin-19.5.0-x86_64-i386-64bit
2020-06-16 18:42:03 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'scraping', 'CONCURRENT_REQUESTS': 32, 'CONCURRENT_REQUESTS_PER_DOMAIN': 32, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 2, 'DOWNLOAD_TIMEOUT': 600, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'FEED_FORMAT': 'csv', 'FEED_URI': 'results/%(name)s_%(time)s.csv', 'HTTPCACHE_ENABLED': True, 'HTTPCACHE_EXPIRATION_SECS': 43200, 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage', 'NEWSPIDER_MODULE': 'scraping.spiders', 'SPIDER_MODULES': ['scraping.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}
2020-06-16 18:42:03 [scrapy.extensions.telnet] INFO: Telnet Password: e179fe629b29425b
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled extensions:
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled downloader middlewares:
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled spider middlewares:
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled item pipelines:
2020-06-16 18:42:03 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1):
2020-06-16 18:42:04 [urllib3.connectionpool] DEBUG: "GET /jp/shopping/woman HTTP/1.1" 503 None
Unhandled error in Deferred:
2020-06-16 18:42:04 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/", line 172, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/", line 176, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/Users/rnrnstar/github/Spiders/scraping/spiders/", line 41, in start_requests
    data = scraper.get("").content
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/requests/", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 207, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 299, in solve_challenge
builtins.ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read, then file a bug report at"

2020-06-16 18:42:04 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 259, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/Users/rnrnstar/github/Spiders/scraping/spiders/", line 41, in start_requests
    data = scraper.get("").content
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/requests/", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 207, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/", line 299, in solve_challenge
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read, then file a bug report at"
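Reading the two tracebacks together: the AttributeError ('NoneType' object has no attribute 'groups') suggests that a regular-expression search inside solve_challenge found no match in the page's JavaScript, and cfscrape then re-raised that as the ValueError. Because the call sits directly in start_requests, the exception escapes and Twisted reports it as an unhandled error in a Deferred. The sketch below does not make the bypass work; it only shows one way to keep the spider from crashing. The helper name `fetch_or_none` is hypothetical, and the only thing assumed about the fetch function is that it returns an object with a `.content` attribute, as `scraper.get` does in the traceback.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_or_none(fetch, url):
    """Call fetch(url) and return the response content.

    Returns None instead of raising when the call fails with ValueError,
    which is the exception type shown in the traceback above.
    """
    try:
        return fetch(url).content
    except ValueError as exc:
        logger.warning("Could not solve the challenge for %s: %s", url, exc)
        return None
```

In the spider this might be used as `data = fetch_or_none(scraper.get, url)`, followed by a check for `None` before building any requests from the data. Note that catching ValueError is broad: some requests exceptions for malformed URLs also derive from ValueError and would be swallowed here too.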

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

URL of the Cloudflare-protected page


URL of Pastebin/Gist with HTML source of protected page


Sraq-Zit commented 4 years ago

Try #373. I tested it with your link and it worked.