Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

Wiley journals #442

Open wanghaosjtu opened 2 years ago

wanghaosjtu commented 2 years ago

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.8.5

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: c:\miniconda3\envs\py3\lib\site-packages
Requires: requests

Code snippet involved with the issue

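    import cfscrape

    url = 'https://onlinelibrary.wiley.com/doi/10.1111/jpim.12613'

    scraper = cfscrape.create_scraper()
    print(scraper.get(url))  # this call raises the ValueError in the traceback
    # Module-level helpers that request the same page to obtain the cookies
    tokens, user_agent = cfscrape.get_tokens(url)
    cookie_value, user_agent = cfscrape.get_cookie_string(url)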

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): onlinelibrary.wiley.com:443
DEBUG:urllib3.connectionpool:https://onlinelibrary.wiley.com:443 "GET /doi/10.1111/jpim.12613 HTTP/1.1" 503 None
Traceback (most recent call last):
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 251, in solve_challenge
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "c:/Users/nuc-002/workspace/calibre/seleniumbrowser.py", line 416, in <module>
  File "c:\miniconda3\envs\py3\lib\site-packages\requests\sessions.py", line 542, in get
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 129, in request
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
  File "c:\miniconda3\envs\py3\lib\site-packages\cfscrape\__init__.py", line 290, in solve_challenge
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
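
The traceback shows the `ValueError` propagating out of `scraper.get(url)`. Until the underlying problem is resolved, a script can at least catch it and log the message instead of crashing. This is a minimal sketch: `safe_get` is a hypothetical helper, not part of cfscrape, and `get` stands for any callable with the shape of `scraper.get`.

```python
from typing import Any, Callable, Optional, Tuple


def safe_get(get: Callable[[str], Any], url: str) -> Tuple[Optional[Any], Optional[str]]:
    """Call get(url) and return (response, None) or (None, error message).

    Hypothetical helper, not part of cfscrape. It only catches ValueError,
    which is the exception type shown in the traceback; anything else
    (network errors, timeouts) still propagates to the caller.
    """
    try:
        return get(url), None
    except ValueError as exc:
        return None, str(exc)
```

With cfscrape installed, it would be called as `response, error = safe_get(scraper.get, url)`, and `error` would carry the "Unable to identify Cloudflare IUAM Javascript" text when the challenge cannot be parsed.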

URL of the Cloudflare-protected page

[From ego-systems to open innovation ecosystems: A process model of inter-firm openness]

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

wanghaosjtu commented 2 years ago

The Cloudflare page on Wiley seems to pop up even when I use a Selenium driver to open the site. Sometimes it redirects to the right page; when I'm out of luck, it stays stuck on the Cloudflare page.
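
Since the browser sometimes gets through and sometimes stays on the interstitial, one way to tell the two cases apart is to poll the page title for a while after loading. The following is only a rough sketch. `wait_until_past_challenge` is a hypothetical helper, and the default marker `"Just a moment"` is an assumption about what the challenge page's title contains; it has not been checked against the Wiley page and may need to be changed.

```python
import time
from typing import Callable


def wait_until_past_challenge(
    get_title: Callable[[], str],
    challenge_marker: str = "Just a moment",  # assumption: text found in the challenge page title
    timeout: float = 30.0,
    poll_interval: float = 1.0,
) -> bool:
    """Poll get_title() until the marker disappears or the timeout expires.

    Returns True once the title no longer contains the marker, and False
    if the marker is still present when the timeout runs out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if challenge_marker not in get_title():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With Selenium, this could be called as `wait_until_past_challenge(lambda: driver.title)` right after `driver.get(url)`. A `False` result means the browser is still on the Cloudflare page, which would be useful to report here together with the page source.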