githubhan2016 commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

[ ] I've upgraded cfscrape with pip install -U cfscrape
[ ] I'm using Node version 10 or higher
[ ] The site protection I'm having issues with is from Cloudflare
[ ] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.7.0

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: c:\python3\venv3\lib\site-packages
Requires: requests
Required-by:

Code snippet involved with the issue

import cfscrape

proxies = {
    "http": "http://127.0.0.1:10809",
    "https": "https://127.0.0.1:10809",
}

scraper = cfscrape.create_scraper()

web_data = scraper.get("http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4", proxies=proxies).content
print(web_data)

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

C:\python3\venv3\Scripts\python.exe C:/python3/venv3/javlibrary-spider-master/cfs.py
Traceback (most recent call last):
  File "C:\python3\venv3\lib\site-packages\cfscrape\__init__.py", line 174, in solve_cf_challenge
    cloudflare_kwargs["params"].update({param.split('=')[0]:param.split('=')[1]})
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/python3/venv3/javlibrary-spider-master/cfs.py", line 12, in <module>
    web_data = scraper.get("http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4", proxies=proxies).content
  File "C:\python3\venv3\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "C:\python3\venv3\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "C:\python3\venv3\lib\site-packages\cfscrape\__init__.py", line 200, in solve_cf_challenge
    % (e, BUG_REPORT)
ValueError: Unable to parse Cloudflare anti-bot IUAM page: list index out of range Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

Process finished with exit code 1

URL of the Cloudflare-protected page

[LINK GOES HERE] http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4
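
The IndexError in the traceback comes from the expression param.split('=')[1], which fails on any query parameter that has no "=" in it. The sketch below reproduces that pattern outside cfscrape. It is an assumption, not something confirmed from the cfscrape source, that the query string being split at line 174 contains a bare key such as the "list" parameter in the URL above. The helper names naive_params and tolerant_params are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# URL taken from the report; note the bare "list" key with no "=value" part.
URL = "http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4"


def naive_params(query):
    """Hypothetical helper mimicking the split('=') pattern shown in the traceback."""
    params = {}
    for param in query.split("&"):
        # "list".split("=") returns ["list"], so index 1 does not exist.
        params.update({param.split("=")[0]: param.split("=")[1]})
    return params


def tolerant_params(query):
    """Hypothetical helper: parse the same query, keeping value-less keys as ''."""
    return dict(parse_qsl(query, keep_blank_values=True))


query = urlsplit(URL).query

try:
    naive_params(query)
    naive_error = None
except IndexError as exc:
    naive_error = str(exc)

parsed = tolerant_params(query)

print(naive_error)  # list index out of range
print(parsed)       # {'list': '', 'g': 'ki', 'mode': '2', 'page': '4'}
```

If this is indeed the trigger, a quick check would be to request the same page with the parameter written as list= and see whether the error changes. That experiment has not been run here.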

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

lovekrissh143 commented 4 years ago

I'm having the same problem.

I'm also trying to scrape an anime site protected by Cloudflare.

@Anorov, please help!

vgrimaldi848 commented 3 years ago

I have the same issue, but only on one platform: on a Windows machine the code runs without problems, while on Linux (Ubuntu 20.10, Python 3.8) the error appears. It is most likely a problem with the web requests, since I'm getting an HTTP 503 error from Cloudflare. Any idea what it could be? Perhaps the script isn't waiting long enough and gets stuck on the Cloudflare landing page?
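
One way to tell whether a 503 like the one described above is the Cloudflare challenge page rather than an ordinary server error is to inspect the response. The following is a rough heuristic sketch, not cfscrape's own detection logic. The function name looks_like_iuam_challenge is hypothetical, and the marker strings jschl_vc and jschl_answer are assumptions about what the challenge page contained at the time.

```python
def looks_like_iuam_challenge(status_code, server_header, body):
    """Rough heuristic (assumed markers): does this response look like a
    Cloudflare "I'm Under Attack Mode" challenge page?"""
    return (
        status_code in (503, 429)
        and server_header.lower().startswith("cloudflare")
        and "jschl_vc" in body        # assumed challenge form field
        and "jschl_answer" in body    # assumed challenge form field
    )


# Example with made-up response data, for illustration only.
challenge_body = '<form id="challenge-form"><input name="jschl_vc"/><input name="jschl_answer"/></form>'
print(looks_like_iuam_challenge(503, "cloudflare", challenge_body))
print(looks_like_iuam_challenge(503, "nginx", "Service Unavailable"))
```

With a requests-style response object, the three arguments would be resp.status_code, resp.headers.get("Server", ""), and resp.text. If the page does turn out to be the challenge, the cfscrape README describes a delay argument, as in cfscrape.create_scraper(delay=10), for overriding the wait time. Whether a longer delay resolves the Linux-only failure reported here is untested.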

Anorov / cloudflare-scrape

ValueError: Unable to parse Cloudflare anti-bot IUAM page: list index out of range Cloudflare may have changed their technique, or there may be a bug in the script. #348