DataUSA / datausa-site

The most comprehensive visualization of U.S. public data —
www.datausa.io
GNU Affero General Public License v3.0

API access blocked by CloudFlare #890

Closed kevinrobinson closed 4 years ago

kevinrobinson commented 4 years ago

Hello, thanks for sharing this awesome work!

I was trying to explore some of this data and made a Colab notebook that makes an HTTP request to grab data from a URL on datausa.io. Unfortunately, the requests came back as 403, with Cloudflare sending a challenge. I pasted the full HTML response below, but the substance is a Cloudflare "One more step" page asking me to "complete the security check to access banana.datausa.io".

The specific URL I was trying was https://banana.datausa.io/api/data?measure=Commute%20Means,Commute%20Means%20Moe&geo=16000US1901855&drilldowns=Group&year=2017, which works fine in a browser, or over curl.

I figured Cloudflare disliked requests from the Colab backend, and was wondering whether there is a different way to access the data through the HTTP API. But while fiddling around I also discovered that adding a User-Agent header to the HTTP request satisfied Cloudflare, even when it was just an empty string. 🤷
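
For anyone hitting the same thing, here is a minimal sketch of the workaround in Python, assuming the notebook uses the standard library's `urllib.request` (the report does not say which HTTP client was used). The helper names `build_request` and `fetch_json` and the default User-Agent string are made up for illustration, and whether Cloudflare accepts any particular value depends on the site's current rules.

```python
import json
import urllib.request

# URL copied from the report above.
DATA_URL = (
    "https://banana.datausa.io/api/data"
    "?measure=Commute%20Means,Commute%20Means%20Moe"
    "&geo=16000US1901855&drilldowns=Group&year=2017"
)


def build_request(url, user_agent="colab-notebook-example/0.1"):
    """Return a GET request that carries an explicit User-Agent header.

    The default user_agent is a hypothetical placeholder, not a value
    taken from the report; the report says even "" was accepted.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url, user_agent="colab-notebook-example/0.1", timeout=30):
    """Fetch url and decode the body as JSON.

    urlopen raises urllib.error.HTTPError if the server answers 403.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(DATA_URL)` performs the request. If it still raises an `HTTPError` with status 403, the challenge is being triggered by something other than the header.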

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>

<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->

</head>
<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="challenge_headline">One more step</h1>
        <h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> banana.datausa.io</h2>
      </div><!-- /.header -->

      <div class="cf-section cf-highlight cf-captcha-container">
        <div class="cf-wrapper">
          <div class="cf-columns two">
            <div class="cf-column">

              <div class="cf-highlight-inverse cf-form-stacked">
                <form class="challenge-form" id="challenge-form" action="/api/data?measure=Commute%20Means,Commute%20Means%20Moe&amp;geo=16000US1901855&amp;drilldowns=Group&amp;year=2017&amp;__cf_chl_captcha_tk__=c4b9070e357885d84c15d1d9ad05d21004890dce-1588765984-0-AWAq8JMKDc3kBVHkhiET5lbPRq8kT1whug6OQ1HXMqrCYx_l8c-arbTxiJhdeetPctwY-rxBLWpZLBMHVtLPJ_XaM8lY7InRjhluQAhWzgkIHluask9ow9nAfq4IkJIDyXYKDoaSy1Z-gVCb94Ysw3TZedHj5r9IhVbb_OhepIGgx-pTgNGmCXqm1jKJuoAsY8bAtX7MweYNUkjTepIE9jw_2wVNVaPPNJ64N3L_5rR0ctW2hN7cwZ0APBXZVDYm96qtYn8GTzoOaBbZZd_55ova3zgxmmrrlqgxsCFO5DTqxvc-Aj-tZovfaX2TehV2-z0Y1YvzIgECOruq-m_FpDssNaNxnTOXKCwpnpjnLIWdFmWqhL6De5xWieyJZDLgYz1ruqC0E-8ghDNimWXJp1Ivs-TYxnbJ9u9y31XBoPphJYpnUfTHqI708VAERkUoxDgn7zZ5uvOsE11Yf_KZcNwyQhTiiMofTvUnb4-7Vo_jxzDMf5x4XDUhUhuIw4jzL3mcKezW_SzlOaQxBs-MB0yHnsBp1QVjkn4KZ3oNsT0OrLrf6wFEI4cyUIYNJAYLVL9p97WdnhcS9PcLyj1ntZF_7OOnOiPvGNRCHVUQ8rYrTQ52qCuVdCFkUn40C5yIiHAGxzm_WwyF6i0c6kUalnFGtTFpzBbjsWMg2l1mSPMhpawArVQOeHPANfXkqUvykQ" method="POST" enctype="application/x-www-form-urlencoded">
  <input type="hidden" name="r" value="81d184cd13ca579a0543b6418f6b9e46f9205f8c-1588765984-0-AQuGun84OQDByPvKAbd9aKWQTzTuugn1O/gb6GRfn907vadRpicx6caGtUgtaxqKc4tlTHM8tOQPu04oZw4E3ZRkgiSPnAFKEHo1S7Fk8x1v7NdfY4vhrHFvsRuikYDd/TVhJn5QIcShAHi/VwnwgGLNF/mQutRgSa5kVXa2Ww35LFNPHqaOp4fjmXxXYeJbvzG5mjVge/L6wkmcMK6RbRmKo2Zsm25rMb7OpJAnUgIY06PuPIlO4e98gHjPUKBXL2Y1UV/z7EsFJGw6Gy+GyNac4Y/WOwNpD07WE7dJXPWMlBZRjcGj7FxC3Uuh8zcwtzD3TmEwc1QjSyteqMbebgBQ8VodUV4BOC91vo+WyZpGo3aL6t3ho8vYWvpaRtvnOhBmnxeJWhWnU++9162Hr8lAR2Sa48G+QtG8X8q0/IWvvsP4Q+pvyswZtf3PfBkJ6psZc80rNKL4YxNaB6Opd9C+6Pn/nEHHkLXAVVkaM74QaJGzthj0MD+IrobGevv52BIizo9MNgT/ZQk9oOKrOKch+NH+h+1qma95XRQuGv20M/dcdVa2SWA5UioRKbBZKSgpMFl20NoNbx35Byc12VJmWx+WkHr8NYv+UXYZY0VUPuoZ0rR2M1ZObh8e3kauGFc0O92MaBkAlNTLw2OIPP9rnGxD+mdn7GRadtPeXKOtZAOBKbO2fWKqDCQcYA075EQGVYSmmV6CMBs87GYBw179wJH2oJM3wqzefThpf7VxWb6gVgBfTD+x/4phKaNTnadr5hZpi4PddBZqpI5/s9V4Dy/+aEI3kkq63EwTTQuvDvae1L0YBiN7yHmuFMs0HxeD+Fy0OIET99+c03MZmyjj958UzE396sRJGsR6pFLqebzVrQjd/MFTqaXCClkvRpHd+vyav5rCrRAkkUo7nkVN+AD2AlnojQ5cbSfFNbe9TuZGlF+GwwlREk1VHTeAzFhnq5ngunpyFXx6nTB4t+ndsK0tVRIZE1Z42aLWckhoYqGMl7CAwkDz/383+5tFjvxWTBM6OVzgvG75fcn/wfY5A8Xx+Uh90jam/A40muv9WYkeNAa6BVen2T1ID4cOwzNix9AdRzBhPMqk+r6VlLO0g2COB8ERNEMo+vLo5ZmPsZPolZJNIzdVxYo5+fw63sNkn4RLbuNMYcH08JyiF5b194a8iWpRi66VxT6l2OE3mSdnW6yuwrJ3gnO22x1zX06zTWq+X93Kb8+c1Zouu0GHytYCOYVgPonJ3sL2thDd2tpeuZWDUNbuncpaNTYz75o4TIGFpaRyXIPyT3DU2uwrJHk8vjs4ds5qhcy3hLFHgdfV5Re6M++opQLR94Pjrjp19e8yY41sIxukTEMzhjvk+8PDS4qaEdWR1IyEDJ7m">
  <input type="hidden" name="cf_captcha_kind" value="h">
  <script type="text/javascript" src="/cdn-cgi/scripts/hcaptcha.challenge.js" data-type="normal"  data-ray="58f27fabb99ae77b" async data-sitekey="45fbc4de-366c-40ef-9274-9f3feca1cd6c"></script>
  <noscript id="cf-captcha-bookmark" class="cf-captcha-info">
  <h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
  </noscript>
  <div id="trk_captcha_js" style="background-image:url('/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=58f27fabb99ae77b')"></div>
</form>

              </div>
            </div>

            <div class="cf-column">
              <div class="cf-screenshot-container">

                <span class="cf-no-screenshot"></span>

              </div>
            </div>
          </div><!-- /.columns -->
        </div>
      </div><!-- /.captcha-container -->

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>

            <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>

            <p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>

            <p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>

          </div>
        </div>
      </div><!-- /.section -->

      <div class="cf-error-footer cf-wrapper">
  <p>
    <span class="cf-footer-item">Cloudflare Ray ID: <strong>58f27fabb99ae77b</strong></span>
    <span class="cf-footer-separator">&bull;</span>
    <span class="cf-footer-item"><span>Your IP</span>: 35.237.143.95</span>
    <span class="cf-footer-separator">&bull;</span>
    <span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>

  </p>
</div><!-- /.error-footer -->

    </div><!-- /#cf-error-details -->
  </div><!-- /#cf-wrapper -->

  <script type="text/javascript">
  window._cf_translation = {};

</script>

</body>
</html>

kevinrobinson commented 4 years ago

Closing since I worked around this, but sharing so y'all know about it. Thanks!