Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

Doesn't work, it keeps running forever. #311

Closed: makovez closed this issue 4 years ago

makovez commented 4 years ago

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

Python version number

Run python --version and paste the output below:

Python 3.7.5

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.0.8
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: /usr/local/lib/python3.7/site-packages
Requires: requests
Required-by:

Code snippet involved with the issue

>>> import cfscrape
>>> scraper = cfscrape.create_scraper()
>>> res = scraper.get("https://altadefinizione01-nuovo.link")
... (runs forever and never returns a result)
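Until a fix lands, one way to keep a call like the one above from blocking your program forever is to run it in a worker thread and stop waiting after a deadline. This is a minimal standard-library sketch; call_with_deadline is a hypothetical helper, not part of cfscrape. Python cannot kill the worker thread, so a stuck call keeps running in the background and you only regain control of your own code. Since the scraper is a requests Session subclass, scraper.get() also accepts the usual timeout= argument, but that bounds each individual request and may not help if the library keeps sending new requests in a loop.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout


def call_with_deadline(fn, seconds, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and wait at most `seconds`.

    Raises concurrent.futures.TimeoutError if fn has not finished in time.
    Exceptions raised by fn itself are re-raised here.
    The worker thread is NOT killed; it keeps running in the background.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    finally:
        # wait=False: return immediately instead of blocking on a stuck call
        executor.shutdown(wait=False)


# Hypothetical usage (needs the cfscrape package and network access):
#   import cfscrape
#   scraper = cfscrape.create_scraper()
#   try:
#       res = call_with_deadline(scraper.get, 60, "https://altadefinizione01-nuovo.link")
#   except FuturesTimeout:
#       print("gave up after 60 seconds")
```

Avoid using the executor as a `with` block here: leaving the block calls shutdown(wait=True), which would block on the stuck call again.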

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

URL of the Cloudflare-protected page

https://altadefinizione01-nuovo.link

URL of Pastebin/Gist with HTML source of protected page

https://pastebin.com/mkrSkaMi

Dangeres commented 4 years ago

I have the same error, but with the site https://www.ggdrop.net/ ;((

AbdullahM0hamed commented 4 years ago

As far as I can tell, it's happening on all Cloudflare sites. Also, the first link (https://altadefinizione01-nuovo.link) seems to work fine, even with plain requests. I don't think it's using Cloudflare anymore.

makovez commented 4 years ago

I don't know what's going on, but I swear it wasn't working before. Now it seems the site isn't even using Cloudflare protection anymore, wtf...

I had been using this library without problems until today, when it stopped working. And now the site has removed Cloudflare... ok🤣

Dangeres commented 4 years ago

Please don't close this issue: my site still isn't working and I'm in the same situation. Could you check my Cloudflare-protected site with your build, please?

makovez commented 4 years ago

It gives me a 503 as well.
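A 503 on its own doesn't prove you are looking at Cloudflare's anti-bot page, which is also what the issue template asks you to confirm. A rough check combines the status code, the Server header, and the challenge form fields in the body. This is a heuristic sketch, not cfscrape's own check; looks_like_cloudflare_challenge is a hypothetical helper, and it assumes the jschl_vc and jschl_answer field names of the old challenge page, which Cloudflare may rename (as this thread suggests it already has for other fields).

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic: does this response look like Cloudflare's JS challenge page?

    status_code: int, headers: mapping of header name to value,
    body: decoded HTML as a str.
    """
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = lowered.get("server", "").lower().startswith("cloudflare")
    # Hidden input names used by the old challenge form (an assumption).
    has_challenge_fields = "jschl_vc" in body and "jschl_answer" in body
    return status_code == 503 and served_by_cloudflare and has_challenge_fields
```

With requests this could be called as looks_like_cloudflare_challenge(resp.status_code, resp.headers, resp.text). A False result with a 503 suggests a different problem (or a changed challenge page) rather than the anti-bot check cfscrape was written for.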

Dangeres commented 4 years ago

Does it return anything for you?

ghost commented 4 years ago

It is specifically failing to find name="s"; that field no longer seems to exist in the Cloudflare page source...

re.findall(r'name="(s|jschl_vc|pass)"(?: [^<>]*)? value="(.+?)"', body)
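To see what a missing field does to this pattern, here it is run against two made-up snippets of hidden-input HTML (the markup is invented for illustration, not Cloudflare's actual page). Because the pattern has two capture groups, findall returns (name, value) tuples, so when the s input disappears there is simply no s entry for later code to use. extract_fields is a hypothetical helper.

```python
import re

# The pattern quoted above: captures (field name, value) pairs.
FIELD_RE = r'name="(s|jschl_vc|pass)"(?: [^<>]*)? value="(.+?)"'

# Hypothetical markup, for illustration only.
old_page = ('<input type="hidden" name="s" value="abc"/>'
            '<input type="hidden" name="jschl_vc" value="123"/>'
            '<input type="hidden" name="pass" value="p1"/>')
new_page = ('<input type="hidden" name="jschl_vc" value="123"/>'
            '<input type="hidden" name="pass" value="p1"/>')


def extract_fields(body):
    """Return the captured hidden fields as a dict {name: value}."""
    return dict(re.findall(FIELD_RE, body))


print(sorted(extract_fields(old_page)))  # ['jschl_vc', 'pass', 's']
print("s" in extract_fields(new_page))   # False
```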

makovez commented 4 years ago

"CF have changed their endpoints for the challenge solves..." from another guy in another cloudflare bypass repo.

ghost commented 4 years ago

"CF have changed their endpoints for the challenge solves..." from another guy in another cloudflare bypass repo.

I just read the same sort of thing... a shame, because my projects depend on the bypass, ffs...

AbdullahM0hamed commented 4 years ago

I'm not really sure where that s field was supposed to be, but I figured I'd replace it with a . in the regex. It no longer hangs, but I get this instead:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/cfscrape/__init__.py", line 128, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/cfscrape/__init__.py", line 194, in solve_cf_challenge
    redirect = self.request(method, submit_url, **cloudflare_kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/cfscrape/__init__.py", line 128, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/cfscrape/__init__.py", line 194, in solve_cf_challenge
    redirect = self.request(method, submit_url, **cloudflare_kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/cfscrape/__init__.py", line 128, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/cfscrape/__init__.py", line 195, in solve_cf_challenge
    redirect_location = urlparse(redirect.headers["Location"])
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'location'
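The traceback ends in KeyError: 'location' because the code indexes redirect.headers["Location"] on a response that had no Location header. A minimal sketch of checking first instead of indexing; redirect_location is a hypothetical helper, not cfscrape's code, and it assumes any mapping of header names (requests' headers are already case-insensitive, a plain dict is not, hence the manual comparison).

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def redirect_location(request_url, status_code, headers):
    """Return the absolute redirect target, or None if there is none.

    headers: any mapping of header name to value. Names are compared
    case-insensitively, as HTTP requires.
    """
    if status_code not in REDIRECT_STATUSES:
        return None
    for name, value in headers.items():
        if name.lower() == "location":
            # Location may be relative, so resolve it against the request URL.
            return urljoin(request_url, value)
    return None
```

A caller can then stop with a clear error message when this returns None, instead of crashing with a KeyError or retrying the challenge indefinitely.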
AbdullahM0hamed commented 4 years ago

"CF have changed their endpoints for the challenge solves..." from another guy in another cloudflare bypass repo.

I just read that too, and it seems the dev will be fixing it soon.

LXMedia1 commented 4 years ago

The redirect address has changed:

regex = r"<form id=\"challenge-form\" action=\"(.*?)\""
redirect_addr = re.search(regex, body, re.MULTILINE)[1]
submit_url = "%s://%s%s" % (parsed_url.scheme, domain, redirect_addr)

And it's a POST now, but even then you don't get the cookie -.- You end up in an endless loop: from the target site to Cloudflare, back to the target site (no cf_clearance cookie), back to Cloudflare, and so on :(
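LXMedia1's snippet pulls the submit URL out of the challenge form instead of using a fixed path. Here it is as a self-contained function under a few assumptions: submit_url_from_form is a hypothetical name, parsed_url.netloc stands in for the snippet's domain variable, the action attribute is assumed to be an absolute path starting with "/", and html.unescape is added in case the attribute contains entity-encoded ampersands (&amp;). re.MULTILINE is dropped because the pattern has no ^ or $ anchors. As the comment says, even with the right URL and a POST you may still never get a cf_clearance cookie, so this only covers the URL part.

```python
import html
import re
from urllib.parse import urlparse

# Same pattern as in the comment; it expects id to come before action.
FORM_ACTION_RE = r'<form id="challenge-form" action="(.*?)"'


def submit_url_from_form(page_url, body):
    """Build the challenge submit URL from the form's action attribute."""
    match = re.search(FORM_ACTION_RE, body)
    if match is None:
        raise ValueError("challenge-form not found; the page layout may have changed")
    # Attribute values in HTML may contain entities such as &amp;
    action = html.unescape(match.group(1))
    parsed_url = urlparse(page_url)
    # Mirrors "%s://%s%s" % (scheme, domain, action) from the comment.
    return "%s://%s%s" % (parsed_url.scheme, parsed_url.netloc, action)
```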

FarBeyondDriven commented 4 years ago

Same result here, endless loop

orcololo commented 4 years ago

Still stuck in a loop. :(

emavgl commented 4 years ago

Might following these changes help? https://github.com/codemanki/cloudscraper/pull/287/files#diff-168726dbe96b3ce427e7fedce31bb0bcR353-R359

NinoSoles commented 4 years ago

I'm getting the same error; it just loops and hangs.

AbdullahM0hamed commented 4 years ago

I had the same problem when I used pip to upgrade the package, but it works fine after uninstalling and reinstalling it.

FarBeyondDriven commented 4 years ago

Reinstalling definitely does not fix this issue.

serk7 commented 4 years ago

Same problem here for the last 6 hours :P

AbdullahM0hamed commented 4 years ago

Nvm, wrong repo; I thought this was cloudscraper, my bad😅

BTW, that library has fixed it now, and it's used much like this one, so you should try it: https://github.com/VeNoMouS/cloudscraper

pl77 commented 4 years ago

For those of you who can't wait for a new version to be published to PyPI, you can patch it yourself. There is a fix from @alzamer2 in the pull requests:

https://github.com/Anorov/cloudflare-scrape/pull/315

The easiest way to find where your cfscrape module is stored is with the Python interpreter from the environment you want to fix. Type python to open the interpreter, and you'll be greeted with a prompt similar to the one below:

Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type the following two commands:

import importlib.util
importlib.util.find_spec('cfscrape').origin

...which will give you a string path location (similar to the one below):

'/home/user/anaconda/lib/python3.7/site-packages/cfscrape/__init__.py'

Open that __init__.py file in an editor and replace all the contents with the raw contents from the pull request:

raw text from pull request 315

Save it and your cfscrape should work again.

Once the fix is published to PyPI, just update as normal and the patched file will be replaced.
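The two interpreter commands above can also be wrapped in a small script, for example to double-check which environment you are about to patch. A minimal sketch; module_file is a hypothetical helper. find_spec returns None when the module is not installed for that interpreter, which the two-liner would turn into an AttributeError on .origin.

```python
import importlib.util
import sys


def module_file(name):
    """Return the path of the file that `import name` would load, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not installed in this interpreter's environment
    return spec.origin


if __name__ == "__main__":
    path = module_file("cfscrape")
    print(sys.executable)  # which Python (and hence which environment) this is
    print(path if path else "cfscrape is not installed for this interpreter")
```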

bsuire commented 4 years ago

Same problem here :) Is this library still supported?

shazamlx commented 4 years ago

Have you solved it? It doesn't work.

KurumiSerori commented 4 years ago

I applied fix #315 and it was working some days ago (about 2 weeks ago, I think?), but it seems broken now: the scraper runs forever again. Maybe Cloudflare just updated the challenge.

lord8266 commented 4 years ago

@KurumiSerori Same website? If not, open a new issue with the bug template.

KurumiSerori commented 4 years ago

Not on the same site. For security reasons I can't provide the URL, but after some attempts the problem is solved. I'm now using requests.get() with the Cloudflare tokens and User-Agent obtained from cfscrape.get_tokens(), instead of calling cfscrape.create_scraper().get() directly with a predefined User-Agent; the latter would get stuck.
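The approach described above, solving the challenge once with cfscrape.get_tokens() and then making ordinary requests calls, might look roughly like this. A minimal sketch: get_tokens(url) returns a (cookie dict, User-Agent string) pair, and the cookies are generally only accepted together with that same User-Agent (and from the same IP), so both are passed along. fetch_with_tokens is a hypothetical helper; its get_tokens and http_get parameters exist only so it can be exercised without network access.

```python
def fetch_with_tokens(url, get_tokens=None, http_get=None, timeout=30):
    """Solve the Cloudflare challenge once, then fetch `url` with plain requests.

    get_tokens / http_get default to cfscrape.get_tokens and requests.get;
    they can be replaced (e.g. with fakes in tests).
    """
    if get_tokens is None:
        import cfscrape  # third-party, imported only when actually needed
        get_tokens = cfscrape.get_tokens
    if http_get is None:
        import requests  # third-party
        http_get = requests.get

    # (cookie dict, User-Agent that was used while solving the challenge)
    tokens, user_agent = get_tokens(url)
    # Send the cookies with the same User-Agent; a different one is likely rejected.
    return http_get(url, cookies=tokens, headers={"User-Agent": user_agent}, timeout=timeout)
```

The timeout keeps the follow-up requests.get() call from hanging indefinitely; it does not bound the get_tokens() step itself.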

Anorov commented 4 years ago

@KurumiSerori If you run pip install -U cfscrape now, do you still have the same issue?

KurumiSerori commented 4 years ago

Glad to see the repository updated. As for me, the problem is already solved.