Anorov / cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.
MIT License

Failing with TOR #47

Closed by nuno-andre 8 years ago

nuno-andre commented 8 years ago

cloudflare-scrape works like a charm from my IP, but it fails when I try to use it through Tor.

Related: https://support.cloudflare.com/hc/en-us/articles/203306930-Does-CloudFlare-block-Tor-

This is an example of the page I got:

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | CloudFlare</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=
1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" t
ype="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/sty
les/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
<!--[if lte IE 9]><script type="text/javascript" src="/cdn-cgi/scripts/jquery.mi
n.js"></script><![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zep
to.min.js"></script><!--<![endif]-->
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>

</head>
<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-
translate="enable_cookies">Please enable cookies.</div>
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="challenge_headline">One more step</h1>
        <h2 class="cf-subheadline"><span data-translate="complete_sec_check">Ple
ase complete the security check to access</span> sports.betcoin.ag</h2>
      </div><!-- /.header -->

      <div class="cf-section cf-highlight cf-captcha-container">
        <div class="cf-wrapper">
          <div class="cf-columns two">
            <div class="cf-column">
              <div class="cf-highlight-inverse cf-form-stacked">
                <form class="challenge-form" id="challenge-form" action="/cdn-cg
i/l/chk_captcha" method="get">
  <script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-typ
e="normal"  data-ray="297302c2d5d52926" async data-sitekey="6LfOYgoTAAAAAInWDVTL
Sc8Yibqp-c9DaLimzNGM" data-stoken="urFaI2UjzL7Q4gf4a-aeCBTXc1axcVPUoLk1n_YSXYddI
_7UpGGFF-ImEq1D8kqkSkc1ihC4tL3e8oV4_mHATeYFu6mOUdUZxOBRKYy5b6w"></script>
  <div class="g-recaptcha"></div>
  <noscript id="cf-captcha-bookmark" class="cf-captcha-info">
    <div><div style="width: 302px">
      <div>
        <iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfOYgoTAAA
AAInWDVTLSc8Yibqp-c9DaLimzNGM&stoken=urFaI2UjzL7Q4gf4a-aeCBTXc1axcVPUoLk1n_YSXYd
dI_7UpGGFF-ImEq1D8kqkSkc1ihC4tL3e8oV4_mHATeYFu6mOUdUZxOBRKYy5b6w" frameborder="0
" scrolling="no" style="width: 302px; height:422px; border-style: none;"></ifram
e>
      </div>
      <div style="width: 300px; border-style: none; bottom: 12px; left: 25px; ma
rgin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1
c1c1; border-radius: 3px;">
        <textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g
-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c
1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>
        <input type="submit" value="Submit"></input>
      </div>
    </div></div>
  </noscript>
</form>

              </div>
            </div>

            <div class="cf-column">
              <div class="cf-screenshot-container">

                <span class="cf-no-screenshot"></span>

              </div>
            </div>
          </div><!-- /.columns -->
        </div>
      </div><!-- /.captcha-container -->

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="why_captcha_headline">Why do I have to complete
a CAPTCHA?</h2>

            <p data-translate="why_captcha_detail">Completing the CAPTCHA proves
 you are a human and gives you temporary access to the web property.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="resolve_captcha_headline">What can I do to preve
nt this in the future?</h2>

            <p data-translate="resolve_captcha_antivirus">If you are on a person
al connection, like at home, you can run an anti-virus scan on your device to ma
ke sure it is not infected with malware.</p>

            <p data-translate="resolve_captcha_network">If you are at an office
or shared network, you can ask the network administrator to run a scan across th
e network looking for misconfigured or infected devices.</p>
          </div>
        </div>
      </div><!-- /.section -->

      <div class="cf-error-footer cf-wrapper">
  <p>
    <span class="cf-footer-item">CloudFlare Ray ID: <strong>297302c2d5d52926</st
rong></span>
    <span class="cf-footer-separator">&bull;</span>
    <span class="cf-footer-item"><span data-translate="your_ip">Your IP</span>:
195.254.135.76</span>
    <span class="cf-footer-separator">&bull;</span>
    <span class="cf-footer-item"><span data-translate="performance_security_by">
Performance &amp; security by</span> <a data-orig-proto="https" data-orig-ref="w
ww.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" tar
get="_blank">CloudFlare</a></span>

  </p>
</div><!-- /.error-footer -->

    </div><!-- /#cf-error-details -->
  </div><!-- /#cf-wrapper -->

  <script type="text/javascript">
  window._cf_translation = {};

</script>

</body>
</html>

Anorov commented 8 years ago

As you can see, it's presenting you with a reCAPTCHA challenge. This can only be solved by actually completing the CAPTCHA.

Our README has a section on this:

Note: This only works when regular Cloudflare anti-bots is enabled (the "Checking your browser before accessing..." loading page). If there is a reCAPTCHA challenge, you're out of luck.
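
For anyone hitting this, a minimal sketch of how a caller might tell the reCAPTCHA page apart from the regular "Checking your browser" page before retrying. It assumes the markers visible in the page pasted above (`cf-captcha-container`, `/cdn-cgi/l/chk_captcha`, `g-recaptcha`) are representative. `is_captcha_challenge` is a hypothetical helper, not part of cloudflare-scrape.

```python
# Hypothetical helper; NOT part of cloudflare-scrape's API.
# The markers are taken from the challenge page quoted in this issue
# and are an assumption about what Cloudflare serves in general.
CAPTCHA_MARKERS = (
    "cf-captcha-container",
    "/cdn-cgi/l/chk_captcha",
    "g-recaptcha",
)


def is_captcha_challenge(html):
    """Return True if the page looks like Cloudflare's reCAPTCHA challenge."""
    return any(marker in html for marker in CAPTCHA_MARKERS)


if __name__ == "__main__":
    captcha_page = '<div class="cf-section cf-highlight cf-captcha-container">'
    js_page = "Checking your browser before accessing..."
    print(is_captcha_challenge(captcha_page))
    print(is_captcha_challenge(js_page))
```

Run the check on the raw response body rather than on terminal-wrapped output like the paste above, since hard line wraps can split a marker in two.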