cactus / go-camo

A secure image proxy server
MIT License

Handling Cloudflare Challenges with go-camo #61

Closed alexzeitgeist closed 11 months ago

alexzeitgeist commented 1 year ago

Specifications

Version: 2.4.3-2-ga397323 Platform: Debian Buster

Expected Behavior

Many sites use Cloudflare, which can "challenge" outgoing requests from go-camo, causing issues. Ideally, we could use fallback servers with go-camo to retry these challenged requests. Often, Cloudflare challenges occur when the go-camo server IP is temporarily "blacklisted". If go-camo could pass the request to another instance on a different server with a different IP, it's more likely to avoid blacklisting and pass through Cloudflare unchallenged.

Actual Behavior

As an example, here, the image https://www.globus.ch/cf-media/akeneo/2000402238436_FP_PNG_1/1680537905/1200.png was requested; Cloudflare "challenged" the request (Cf-Mitigated: challenge, see https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#detecting-a-challenge-page-response).

Jun 06 08:18:09 proxy-host go-camo-netgo[16037]: time="2023-06-06T08:18:09.303956624-04:00" level="D" msg="signed client url" url="URL https://www.globus.ch/cf-media/akeneo/2000402238436_FP_PNG_1/1680537905/1200.png was requested."
Jun 06 08:18:09 proxy-host go-camo-netgo[16037]: time="2023-06-06T08:18:09.307941800-04:00" level="D" msg="built outgoing request" req="content_length=\"0\" transfer_encoding=\"[]\" host=\"www.globus.ch\" remote_addr=\"\" method=\"GET\" path=\"\" proto=\"HTTP/1.1\" header=\"map[Accept:[image/*] Accept-Language:[en-US,en;q=0.9,de;q=0.8] Cache-Control:[no-cache] User-Agent:[Camo Asset Proxy] Via:[Camo Asset Proxy]]\""
Jun 06 08:18:09 proxy-host go-camo-netgo[16037]: time="2023-06-06T08:18:09.358322679-04:00" level="D" msg="response from upstream" content_length="-1" header="map[Alt-Svc:[h3=\":443\"; ma=86400] Cache-Control:[private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0] Cf-Mitigated:[challenge] Cf-Ray:[7d3098a84c5600a8-CDG]Content-Type:[text/html; charset=UTF-8] Cross-Origin-Embedder-Policy:[require-corp] Cross-Origin-Opener-Policy:[same-origin] Cross-Origin-Resource-Policy:[same-origin] Date:[Tue, 06 Jun 2023 12:18:09 GMT] Expires:[Thu, 01 Jan 1970 00:00:01 GMT] Permissions-Policy:[accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()] Referrer-Policy:[same-origin] Server:[cloudflare] Strict-Transport-Security:[max-age=15552000; preload] X-Frame-Options:[SAMEORIGIN]]" proto="HTTP/1.1" status="403" transfer_encoding="[chunked]"

As a result, go-camo isn't able to proxy the image.
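
For reference, the challenge in the log above can be recognized from two fields of the upstream response: the 403 status and the Cf-Mitigated: challenge header. A minimal Go sketch of that check follows. The helper names (isChallengeValue, isCloudflareChallenge) are illustrative assumptions and are not part of go-camo.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// isChallengeValue reports whether a status code and a Cf-Mitigated header
// value together look like a Cloudflare challenge. Requiring a 403 is an
// assumption based on the log excerpt in this issue.
func isChallengeValue(status int, cfMitigated string) bool {
	return status == http.StatusForbidden &&
		strings.EqualFold(strings.TrimSpace(cfMitigated), "challenge")
}

// isCloudflareChallenge applies the same check to an upstream response.
func isCloudflareChallenge(resp *http.Response) bool {
	return isChallengeValue(resp.StatusCode, resp.Header.Get("Cf-Mitigated"))
}

func main() {
	// Mimics the status and header seen in the "response from upstream" log line.
	resp := &http.Response{
		StatusCode: http.StatusForbidden,
		Header:     http.Header{"Cf-Mitigated": []string{"challenge"}},
	}
	fmt.Println(isCloudflareChallenge(resp)) // prints "true"
}
```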

dropwhile commented 1 year ago

go-camo should process the 403 response from Cloudflare as a temporary error (404) sent back to the caller. I assume that is what is happening now -- if not, then that is likely a bug.

I could see optionally passing back the cloudflare Cf-Mitigated: challenge header, enabled with a cli flag option, if that behavior is desirable.

As far as trying to avoid the "blacklisting" itself, a couple of possible solutions:

First you can run multiple go-camo instances, and load balance across them.

Aside from that, I'm not sure what else go-camo can do here.

Thoughts?

alexzeitgeist commented 1 year ago

Hi,

> go-camo should process the 403 response from Cloudflare as a temporary error (404) sent back to the caller. I assume that is what is happening now -- if not, then that is likely a bug.

Exactly, this is what happens. Proxied images that don't pass Cloudflare appear as 404 not found.

> First you can run multiple go-camo instances, and load balance across them.

Exactly, this is perhaps the best solution, particularly if the instances are located on different networks. How difficult would it be to fail over to another instance on demand when the current instance faces a Cloudflare challenge? You probably answered this question already:

> I could see optionally passing back the cloudflare Cf-Mitigated: challenge header, enabled with a cli flag option, if that behavior is desirable.

I am not sure how haproxy (which you mention) or similar tools work, but this could potentially work: go-camo returns a failure when it sees a Cf-Mitigated: challenge header, which tells the load balancer to try another configured go-camo instance, and so on.
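
To make the idea concrete, here is a rough Go sketch of the retry logic a load balancer (or a client) would need. It assumes each go-camo instance answers a challenged request with a 403 that carries the Cf-Mitigated: challenge header, which is not what go-camo does today according to this thread. All names (fetchWithFallback, fakeInstance, runDemo, the /signed/path URL) are made up for illustration, and the go-camo instances are simulated with httptest servers.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// errAllFailed is returned when every instance was challenged or unreachable.
var errAllFailed = errors.New("all instances were challenged or unreachable")

// fetchWithFallback requests signedPath from each instance in order and moves
// on to the next one when the response is a 403 carrying
// "Cf-Mitigated: challenge". Any other response is returned unchanged.
func fetchWithFallback(client *http.Client, instances []string, signedPath string) (*http.Response, error) {
	for _, base := range instances {
		resp, err := client.Get(base + signedPath)
		if err != nil {
			continue // instance unreachable, try the next one
		}
		if resp.StatusCode == http.StatusForbidden && resp.Header.Get("Cf-Mitigated") == "challenge" {
			resp.Body.Close() // discard the challenged response before retrying
			continue
		}
		return resp, nil
	}
	return nil, errAllFailed
}

// fakeInstance starts a stand-in for a go-camo instance (hypothetical behavior).
func fakeInstance(challenged bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if challenged {
			w.Header().Set("Cf-Mitigated", "challenge")
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, "image-bytes")
	}))
}

// runDemo puts a challenged instance in front of a healthy one and returns
// the body that the fallback logic ends up with.
func runDemo() (string, error) {
	blocked, healthy := fakeInstance(true), fakeInstance(false)
	defer blocked.Close()
	defer healthy.Close()

	resp, err := fetchWithFallback(http.DefaultClient, []string{blocked.URL, healthy.URL}, "/signed/path")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := runDemo()
	fmt.Println(body, err)
}
```

In the demo the first instance is always challenged, so the body comes from the second one. A real setup would more likely express the same rule in the load balancer configuration than in custom code.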

Cheers, Alex

dropwhile commented 1 year ago

I think something like this (patch below) on the go-camo side might work. This would pass the cf-mitigated header back to the caller, as well as return a 403 instead of a 404. A downstream proxy (haproxy, nginx, etc) could be configured to retry against another instance if it saw this combination. Not sure what else go-camo can do here without adding significant additional complexity.

I'm also hesitant to include this in the main releases of go-camo at this time, opting for simplicity for now. If more people end up running into this, it might be worth adding but under a non-default cli-flag.

In the meantime, maybe this would help, in conjunction with some type of downstream proxy configuration (like "retry-on 403"). If a different status code is desired, just replace http.StatusForbidden with whatever other status code you would want there (418, etc).

diff --git a/pkg/camo/proxy.go b/pkg/camo/proxy.go
index 552c67b..50f4960 100644
--- a/pkg/camo/proxy.go
+++ b/pkg/camo/proxy.go
@@ -306,6 +306,18 @@ func (p *Proxy) ServeHTTP(w http.ResponseWriter, req *http.Request) {
        p.copyHeaders(&h, &resp.Header, &ValidRespHeaders)
        w.WriteHeader(304)
        return
+   case 403:
+       if resp.Header.Get("cf-mitigated") == "challenge" {
+           // decide what to do if response is a cloudflare challenge
+           // this example returns a 403 and passes through the cf-mitigated header
+           w.Header().Set("cf-mitigated", "challenge")
+           http.Error(w, "Forbidden", http.StatusForbidden)
+           return
+       }
+
+       // otherwise handle normally and return 404
+       http.Error(w, "Not Found", http.StatusNotFound)
+       return
    case 404:
        http.Error(w, "Not Found", http.StatusNotFound)
        return
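
The branch added by the patch can also be tried outside go-camo. The following is only a sketch under assumptions: writeUpstream403 and statusFor are hypothetical names, and challengeStatus is an illustrative parameter standing in for the "replace http.StatusForbidden" suggestion above, not an existing go-camo option. It uses httptest.NewRecorder to show which status and header a caller would see.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeUpstream403 mirrors the patched "case 403" branch as a standalone
// function. challengeStatus is a hypothetical knob, not a go-camo flag.
func writeUpstream403(w http.ResponseWriter, upstream http.Header, challengeStatus int) {
	if upstream.Get("Cf-Mitigated") == "challenge" {
		// Cloudflare challenge: pass the header through with the chosen status.
		w.Header().Set("Cf-Mitigated", "challenge")
		http.Error(w, http.StatusText(challengeStatus), challengeStatus)
		return
	}
	// Any other upstream 403 is reported as a 404.
	http.Error(w, "Not Found", http.StatusNotFound)
}

// statusFor records the response for a given upstream Cf-Mitigated value and
// returns the status code and the passed-through header value.
func statusFor(cfMitigated string, challengeStatus int) (int, string) {
	upstream := http.Header{}
	if cfMitigated != "" {
		upstream.Set("Cf-Mitigated", cfMitigated)
	}
	rec := httptest.NewRecorder()
	writeUpstream403(rec, upstream, challengeStatus)
	res := rec.Result()
	return res.StatusCode, res.Header.Get("Cf-Mitigated")
}

func main() {
	fmt.Println(statusFor("challenge", http.StatusForbidden)) // challenged, reported as 403
	fmt.Println(statusFor("challenge", http.StatusTeapot))    // challenged, reported as 418
	fmt.Println(statusFor("", http.StatusForbidden))          // plain 403, reported as 404
}
```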

alexzeitgeist commented 1 year ago

Thanks, this is great! I will see how I can use your patch and report back.

dropwhile commented 11 months ago

closing for now. reopen if appropriate.