Open githubhan2016 opened 4 years ago
I have the same issue, but only on Linux: on a Windows machine the code runs without problems, while on Linux (Ubuntu 20.10, Python 3.8) it fails. It is most likely an issue with the web requests: I'm getting an HTTP 503 error from Cloudflare. Any idea what it could be? Perhaps the scraper is not waiting long enough and gets stuck on the Cloudflare landing page?
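One way to check the "stuck on the landing page" guess is to test whether the 503 response actually looks like Cloudflare's challenge page. The following is only a rough sketch: the helper name looks_like_cloudflare_challenge is hypothetical, and the markers it checks (a 503/429 status, a Server header starting with "cloudflare", and the "jschl" token in the body) are assumptions about what the challenge page contained at the time, not a documented contract.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic check: does this response resemble Cloudflare's interstitial page?

    The markers below are assumptions, not an official specification.
    `headers` is any mapping; note that a plain dict lookup is case-sensitive,
    while requests' own headers object is case-insensitive.
    """
    server = headers.get("Server", "")
    return (
        status_code in (503, 429)
        and server.lower().startswith("cloudflare")
        and b"jschl" in body
    )


# Hand-made sample values for illustration only
challenge_like = looks_like_cloudflare_challenge(
    503, {"Server": "cloudflare"}, b"<form id='challenge-form'>jschl_vc</form>"
)
normal_page = looks_like_cloudflare_challenge(
    200, {"Server": "cloudflare"}, b"<html>ok</html>"
)
print(challenge_like, normal_page)
```

With a real response this would be called as looks_like_cloudflare_challenge(resp.status_code, resp.headers, resp.content). If it returns True on Linux but not on Windows, the wait time is worth a look; as far as I recall, the cfscrape README describes a delay argument to create_scraper for this, but please verify that against the version you have installed.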
Before creating an issue, first upgrade cfscrape with
pip install -U cfscrape
and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
Please confirm the following statements and check the boxes before creating an issue:
pip install -U cfscrape
Python version number
Run python --version and paste the output below:
Python 3.7.0
cfscrape version number
Run pip show cfscrape and paste the output below:
Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: c:\python3\venv3\lib\site-packages
Requires: requests
Required-by:
Code snippet involved with the issue
import cfscrape

proxies = {
    "http": "http://127.0.0.1:10809",
    "https": "https://127.0.0.1:10809",
}
scraper = cfscrape.create_scraper()
web_data = scraper.get("http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4", proxies=proxies).content
print(web_data)
Complete exception and traceback
(If the problem doesn't involve an exception being raised, leave this blank)
C:\python3\venv3\Scripts\python.exe C:/python3/venv3/javlibrary-spider-master/cfs.py
Traceback (most recent call last):
  File "C:\python3\venv3\lib\site-packages\cfscrape\__init__.py", line 174, in solve_cf_challenge
    cloudflare_kwargs["params"].update({param.split('=')[0]:param.split('=')[1]})
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/python3/venv3/javlibrary-spider-master/cfs.py", line 12, in
    web_data = scraper.get("http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4", proxies=proxies).content
  File "C:\python3\venv3\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "C:\python3\venv3\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "C:\python3\venv3\lib\site-packages\cfscrape\__init__.py", line 200, in solve_cf_challenge
    % (e, BUG_REPORT)
ValueError: Unable to parse Cloudflare anti-bot IUAM page: list index out of range Cloudflare may have changed their technique, or there may be a bug in the script.
Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
Process finished with exit code 1
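A possible explanation for the IndexError, offered as a guess rather than a confirmed diagnosis: the expression shown at line 174 of the traceback indexes param.split('=')[1], and the query string of the reported URL starts with a bare key, list, that has no "=". The sketch below reproduces that failure in isolation. The function names naive_params and tolerant_params are hypothetical, and the assumption that cfscrape iterates over the query split on "&" is inferred from the traceback line, not checked against the library source.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4"
query = urlsplit(url).query  # "list&g=ki&mode=2&page=4"


def naive_params(query):
    """Mimics the expression from the traceback (assumed loop over '&')."""
    params = {}
    for param in query.split("&"):
        # "list".split("=") is ["list"], so index 1 does not exist
        params.update({param.split("=")[0]: param.split("=")[1]})
    return params


def tolerant_params(query):
    """Standard-library parsing that keeps value-less keys as empty strings."""
    return dict(parse_qsl(query, keep_blank_values=True))


try:
    naive_params(query)
    naive_error = None
except IndexError as exc:
    naive_error = str(exc)

print(naive_error)
print(tolerant_params(query))
```

If this is indeed the cause, requesting the same page with the bare key written as list= (when the site accepts it) might serve as a workaround until the parsing is made tolerant of value-less keys.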
URL of the Cloudflare-protected page
http://www.javlibrary.com/cn/vl_genre.php?list&g=ki&mode=2&page=4
URL of Pastebin/Gist with HTML source of protected page
[LINK GOES HERE]