dakrone / clj-http

An idiomatic Clojure HTTP client wrapping the Apache client. Officially supported version.
http://clojars.org/clj-http
MIT License

Compatibility issue with get request #540

Open zendevil opened 4 years ago

zendevil commented 4 years ago

When I make a request like this:

(http/get "https://angel.co")

I get this response:

1. Unhandled clojure.lang.ExceptionInfo
   clj-http: status 403
   {:cached nil,
    :request-time 141,
    :repeatable? false,
    :protocol-version {:name "HTTP", :major 1, :minor 1},
    :streaming? true,
    :http-client
    #object[org.apache.http.impl.client.InternalHttpClient 0x7d5d53f8 "org.apache.http.impl.client.InternalHttpClient@7d5d53f8"],
    :chunked? true,
    :type :clj-http.client/unexceptional-status,
    :reason-phrase "Forbidden",
    :headers
    {"Server" "cloudflare",
     "Content-Type" "text/html; charset=UTF-8",
     "X-Frame-Options" "SAMEORIGIN",
     "Connection" "close",
     "cf-request-id" "02afa1c33a00000cb192b11200000001",
     "Transfer-Encoding" "chunked",
     "Set-Cookie"
     "__cfduid=d5186984dc436aac8989538666ddb21761589373348; expires=Fri, 12-Jun-20 12:35:48 GMT; path=/; domain=.angel.co; HttpOnly; SameSite=Lax",
     "Expect-CT"
     "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"",
     "CF-RAY" "592c6be52bb00cb1-EWR",
     "Date" "Wed, 13 May 2020 12:35:48 GMT",
     "Vary" "Accept-Encoding",
     "Cache-Control" "no-cache",
     "CF-Chl-Bypass" "1"},
    :orig-content-encoding nil,
    :status 403,
    :length -1,
    :body
    "<!DOCTYPE html>\n<!--[if lt IE 7]> <html class=\"no-js ie6 oldie\" lang=\"en-US\"> <![endif]-->\n<!--[if IE 7]>    <html class=\"no-js ie7 oldie\" lang=\"en-US\"> <![endif]-->\n<!--[if IE 8]>    <html class=\"no-js ie8 oldie\" lang=\"en-US\"> <![endif]-->\n<!--[if gt IE 8]><!--> <html class=\"no-js\" lang=\"en-US\"> <!--<![endif]-->\n<head>\n<title>Attention Required! | Cloudflare</title>\n<meta name=\"captcha-bypass\" id=\"captcha-bypass\" />\n<meta charset=\"UTF-8\" />\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n<meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge,chrome=1\" />\n<meta name=\"robots\" content=\"noindex, nofollow\" />\n<meta name=\"viewport\" content=\"width=device-width,initial-scale=1\" />\n<link rel=\"stylesheet\" id=\"cf_styles-css\" href=\"/cdn-cgi/styles/cf.errors.css\" type=\"text/css\" media=\"screen,projection\" />\n<!--[if lt IE 9]><link rel=\"stylesheet\" id='cf_styles-ie-css' href=\"/cdn-cgi/styles/cf.errors.ie.css\" type=\"text/css\" media=\"screen,projection\" /><![endif]-->\n<style type=\"text/css\">body{margin:0;padding:0}</style>\n\n\n<!--[if gte IE 10]><!--><script type=\"text/javascript\" src=\"/cdn-cgi/scripts/zepto.min.js\"></script><!--<![endif]-->\n<!--[if gte IE 10]><!--><script type=\"text/javascript\" src=\"/cdn-cgi/scripts/cf.common.js\"></script><!--<![endif]-->\n\n\n\n\n</head>\n<body>\n  <div id=\"cf-wrapper\">\n    <div class=\"cf-alert cf-alert-error cf-cookie-error\" id=\"cookie-alert\" data-translate=\"enable_cookies\">Please enable cookies.</div>\n    <div id=\"cf-error-details\" class=\"cf-error-details-wrapper\">\n      <div class=\"cf-wrapper cf-header cf-error-overview\">\n        <h1 data-translate=\"challenge_headline\">One more step</h1>\n        <h2 class=\"cf-subheadline\"><span data-translate=\"complete_sec_check\">Please complete the security check to access</span> angel.co</h2>\n      </div><!-- /.header -->\n      \n      <div class=\"cf-section cf-highlight 
cf-captcha-container\">\n        <div class=\"cf-wrapper\">\n          <div class=\"cf-columns two\">\n            <div class=\"cf-column\">\n            \n              <div class=\"cf-highlight-inverse cf-form-stacked\">\n                <form class=\"challenge-form\" id=\"challenge-form\" action=\"/?__cf_chl_captcha_tk__=0d94e2e21d2ef06ef34bd2b5b4667f279b690108-1589373348-0-ATT4PiY_pQI1dGVw0_sZDV32_7x4mqtO4RepyD-L4i6zBJiIuml25fVlyJaK8uXNJw5ZWnzGlb6y0jGJJ8HIdEz14sOXRUoHqs_naHtwFEQywa8qZf_rwHsBxIUD5y_FNPph6TDrcfLVnQaN9eyy5VjiznzH4y0yeK8cidNnd-qNGw4OIZbFLfv8299DGhvNnBgsbn3BiQ9bkoGOtE4wANUh5U2LTJVAWhlquAvfhjCu6jHlYRXtN5GdnNvfBbCYwWGwCX0j88J-qCjJFOrSvx1_xraYtpB_Y8PpLHZTob_t8POfE0kJpn9ZYxwjhLQhqAAcIoE8fRe7Lv_50pzummklgMLgTRT2_NJGiE-_jNEogQmoTCvGOOmhNCe28SVYkXop9Ajm-z-6xwgoKQnY7EwekXJZCs-4nwpWJ9Gh3HBgVxZRiuv_wKgcmU0sPlLXSL5G8yOVdbBKBtHhQyqadtmTSg_IC2HV7SiYqPoJMmpJGfxxUm1au7ZS9ZiLpokjI5pQDZLpT2ZG-6jVfnTKvt9w_qmMtUSBhDleXd8mG59r\" method=\"POST\" enctype=\"application/x-www-form-urlencoded\">\n  <input type=\"hidden\" name=\"r\" 
value=\"bfe8db4864c274e3ed80528a0e0ad233279c00b9-1589373348-0-AVWaRjujNq/XSmYrRyYxyBLhp5bbxA92rBX2qiiOx9PVWzas1b/usxApmblw248v1q5iUvP/V/GYHXhQF1UBviAqExhVjGW4upmNgdEf/zdFWHbgQb/s0RdZyMS+rurne8Y7aKD8ppx/WHjY8eSxVTGcHePc+qs/NdCt33voCLk2sGd0inuxibNjFXkBT62qs/JshlzaDsM58mC/jdSBRHZiOoJHmteJ0J1vwDVTVumWM97Qrc9fDyAqvDo72LCfqq0uG6hppWsi/z5jnGhTwzmJ7biqcY3BThvQAABSgD80MH4unfjys3iYhsefX0tfuAm23Rx1BCoKDRrrnWy0//Z9D0vI3petRmLSerLnJUAqCRh6ZoRqahYwNTPr39G+/WBJBsh3UDfB0+PwSmGsczRmL6DDbDu023etpAhehWcdR55ftEcijKiEnnfZE4vyKYm4C835QoKlQ+odT+u7syO/u/PgoyguQxqnNoKdlSSCs4+96s86urmY/yM9T4dvZdB4K4aOVH5cNfRHc8fsqeKpcuxmBbHOmIYIAegjTd5iKB4OQtxPHti1ZQCLeP74OiAxF6UgH+bCBp+h2mfU19CtEXvfcQdxGXPDT/iAPbPZG8c7fubDCKUympyb5nbHzVUcL9IGTlCq1zN7B1pRFj/O6JKOGBRo+q0OEs0nI7l/RFvmDfEtA0FYSC4IGegEs//fUsB165Zdm2SdKk7/cy89Xd4Hy5cedzqmjrtKNw5zjvfjqaNU7FlUL38irfopK/Pyk5Fp/HdV7iMvflIJO1M7GedTWdcNKB/OqPGV9NuJaKYgJbgBrxS4iYtHw9ZZsKWogYCig+eYiU8ty/MSDus9zCE2yRIbLVQ59AFwqTwODgBaV2nJepBDxcXVauCpdHiGbi7Q9M4t1eyGafFUKasv3unzdriRTrFPZ+44ZQb3gYberTMv2f3MwfcryaFgxcgtu43w8Hy5nviA9sOeoLmPYMZtL85QbB+AzKCXJV5DfIGcMvx1aeD/D9QNyOSTakVv2tAwxnP5UeQj8mJKGHTYrIsOMFDfxSnQ2lVzMRPQYmeEes8KjFvYrGyQ82Io+hGnKYOHX1T1ioi+wh+MGacVaSC1VMfG6rdIauPSxbB9WNxqnJxKz7SxHNiV3Gwm4rgUOs+vN2tSPyfINt12OHU=\">\n  <input type=\"hidden\" name=\"cf_captcha_kind\" value=\"h\">\n  <script type=\"text/javascript\" src=\"/cdn-cgi/scripts/hcaptcha.challenge.js\" data-type=\"normal\"  data-ray=\"592c6be52bb00cb1\" async data-sitekey=\"33f96e6a-38cd-421b-bb68-7806e1764460\"></script>\n  <noscript id=\"cf-captcha-bookmark\" class=\"cf-captcha-info\">\n  <h1 data-translate=\"turn_on_js\" style=\"color:#bd2426;\">Please turn JavaScript on and reload the page.</h1>\n  </noscript>\n  <div id=\"trk_captcha_js\" style=\"background-image:url('/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=592c6be52bb00cb1')\"></div>\n</form>\n\n              </div>\n            </div>\n\n            <div class=\"cf-column\">\n              <div 
class=\"cf-screenshot-container\">\n              \n                <span class=\"cf-no-screenshot\"></span>\n              \n              </div>\n            </div>\n          </div><!-- /.columns -->\n        </div>\n      </div><!-- /.captcha-container -->\n\n      <div class=\"cf-section cf-wrapper\">\n        <div class=\"cf-columns two\">\n          <div class=\"cf-column\">\n            <h2 data-translate=\"why_captcha_headline\">Why do I have to complete a CAPTCHA?</h2>\n            \n            <p data-translate=\"why_captcha_detail\">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n          </div>\n\n          <div class=\"cf-column\">\n            <h2 data-translate=\"resolve_captcha_headline\">What can I do to prevent this in the future?</h2>\n            \n\n            <p data-translate=\"resolve_captcha_antivirus\">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n\n            <p data-translate=\"resolve_captcha_network\">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n            \n              \n            \n          </div>\n        </div>\n      </div><!-- /.section -->\n      \n\n      <div class=\"cf-error-footer cf-wrapper\">\n  <p>\n    <span class=\"cf-footer-item\">Cloudflare Ray ID: <strong>592c6be52bb00cb1</strong></span>\n    <span class=\"cf-footer-separator\">&bull;</span>\n    <span class=\"cf-footer-item\"><span>Your IP</span>: 128.151.150.1</span>\n    <span class=\"cf-footer-separator\">&bull;</span>\n    <span class=\"cf-footer-item\"><span>Performance &amp; security by</span> <a href=\"https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer\" id=\"brand_link\" target=\"_blank\">Cloudflare</a></span>\n    \n  </p>\n</div><!-- /.error-footer 
-->\n\n\n    </div><!-- /#cf-error-details -->\n  </div><!-- /#cf-wrapper -->\n\n  <script type=\"text/javascript\">\n  window._cf_translation = {};\n  \n  \n</script>\n\n\n</body>\n</html>\n",
    :trace-redirects []}
               support.clj:  201  slingshot.support/stack-trace
                client.clj:  253  clj-http.client/exceptions-response
                client.clj:  244  clj-http.client/exceptions-response
                client.clj:  262  clj-http.client/wrap-exceptions/fn
                client.clj:  734  clj-http.client/wrap-accept/fn
                client.clj:  756  clj-http.client/wrap-accept-encoding/fn
                client.clj:  717  clj-http.client/wrap-content-type/fn
                client.clj:  958  clj-http.client/wrap-form-params/fn
                client.clj:  992  clj-http.client/wrap-nested-params/fn
                client.clj: 1016  clj-http.client/wrap-flatten-nested-params/fn
                client.clj:  892  clj-http.client/wrap-method/fn
               cookies.clj:  131  clj-http.cookies/wrap-cookies/fn
                 links.clj:   63  clj-http.links/wrap-links/fn
                client.clj: 1045  clj-http.client/wrap-unknown-host/fn
                client.clj: 1173  clj-http.client/request*
                client.clj: 1166  clj-http.client/request*
                client.clj: 1179  clj-http.client/get
                client.clj: 1175  clj-http.client/get
               RestFn.java:  410  clojure.lang.RestFn/invoke
                      REPL:   62  user/eval44577
                      REPL:   62  user/eval44577
             Compiler.java: 7177  clojure.lang.Compiler/eval
             Compiler.java: 7132  clojure.lang.Compiler/eval
                  core.clj: 3214  clojure.core/eval
                  core.clj: 3210  clojure.core/eval
    interruptible_eval.clj:   91  nrepl.middleware.interruptible-eval/evaluate/fn
                  main.clj:  437  clojure.main/repl/read-eval-print/fn
                  main.clj:  437  clojure.main/repl/read-eval-print
                  main.clj:  458  clojure.main/repl/fn
                  main.clj:  458  clojure.main/repl
                  main.clj:  368  clojure.main/repl
               RestFn.java:  137  clojure.lang.RestFn/applyTo
                  core.clj:  665  clojure.core/apply
                  core.clj:  660  clojure.core/apply
                regrow.clj:   18  refactor-nrepl.ns.slam.hound.regrow/wrap-clojure-repl/fn
               RestFn.java: 1523  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   84  nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:   56  nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:  155  nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
                  AFn.java:   22  clojure.lang.AFn/run
               session.clj:  190  nrepl.middleware.session/session-exec/main-loop/fn
               session.clj:  189  nrepl.middleware.session/session-exec/main-loop
                  AFn.java:   22  clojure.lang.AFn/run
               Thread.java:  748  java.lang.Thread/run

The body contains <html class=\"no-js ie6 oldie\" lang=\"en-US\"> and a Cloudflare "Attention Required!" page instead of the site itself. How do I fix this? Shouldn't the request return the actual webpage?

dakrone commented 4 years ago

It looks like the page is detecting that JavaScript is not enabled (because it's not!) and then presenting a CAPTCHA; in the meantime, though, it's returning a 403. You could try changing the :user-agent, but unfortunately it's the page that's behaving poorly, not clj-http.
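
A rough sketch of the user-agent suggestion, in case it helps. It assumes the header is sent through the `:headers` option and that `:throw-exceptions false` is used so the 403 comes back as a plain response map instead of an exception. The user-agent string, `browser-like-opts`, and `fetch` are hypothetical names chosen for illustration, and there is no guarantee that Cloudflare will let the request through even with a browser-like header.

```clojure
(require '[clj-http.client :as http])

;; Hypothetical option map: the User-Agent value is only an example string,
;; not something the site is known to accept.
(def browser-like-opts
  {:headers {"User-Agent" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0 Safari/537.36"}
   ;; Return 4xx/5xx responses as ordinary maps instead of throwing,
   ;; so the status and body can be inspected at the REPL.
   :throw-exceptions false})

(defn fetch
  "GETs url with the browser-like options and returns only the parts of
  the response that are useful for diagnosing a block."
  [url]
  (select-keys (http/get url browser-like-opts)
               [:status :reason-phrase :headers]))

(comment
  ;; Performs a real network request; the result depends on the site.
  (fetch "https://angel.co"))
```

If the status is still 403 and the "Server" header is still "cloudflare", the block is happening on the server side and changing request options in clj-http is unlikely to help.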