cactus / go-camo

A secure image proxy server
MIT License

Ignore `Content-Type` of redirects #63

Closed. di closed this issue 10 months ago

di commented 11 months ago

Specifications

Please list the go-camo version, as well as the Operating System (and version) that go-camo is running on. The go-camo version can be found by running go-camo -V.

Version: v2.4.3 (we are actually running a fork, though: https://github.com/pypi/camo)
Platform: Linux

Expected Behavior

go-camo does not consider the Content-Type of redirects as it follows the redirects, only the Content-Type of the final response. Since it's a bit ambiguous what a valid Content-Type for a redirect is, go-camo should not error out based on the Content-Type of a redirect response.

Actual Behavior

The application returns a 404 Not Found as soon as it encounters a redirect response with a non-image Content-Type.

Steps to reproduce

I don't have an example online to try this against anymore (because the image hosting service which produced this behavior has since been updated to return a different Content-Type) but this could be reproduced with a simple HTTP service that responds with a 30x redirect and sets the Content-Type header to something like application/json.

dropwhile commented 11 months ago

I can't seem to reproduce this.

Test setup

Simple server

Simple server setting content-type header on location/redirect response:

from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        # Answer every GET with a redirect that carries a non-image Content-Type.
        self.send_response(301)
        self.send_header('content-type', 'application/json')
        # The redirect has no body; an explicit length keeps HTTP/1.1 keep-alive valid.
        self.send_header('content-length', '0')
        self.send_header('location', 'https://pypi.org/static/images/logo-small.2a411bc6.svg')
        self.end_headers()

# Listen on localhost:8000 until interrupted.
httpd = HTTPServer(('localhost', 8000), Handler)
httpd.serve_forever()

curl request output when hitting test server

~% curl -v http://127.0.0.1:8000/image.svg
*   Trying 127.0.0.1:8000...
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> GET /image.svg HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Server: BaseHTTP/0.6 Python/3.11.5
< Date: Thu, 31 Aug 2023 00:13:28 GMT
< content-type: application/json
< content-length: 0
< location: https://pypi.org/static/images/logo-small.2a411bc6.svg
< 
* Connection #0 to host 127.0.0.1 left intact

go-camo url generated with url-tool

go-camo% ./build/bin/url-tool -k test encode -b base64 -p http://127.0.0.1:8080 http://127.0.0.1:8000/image.svg
http://127.0.0.1:8080/ZjI9U0gzk_7pETckVGbk3ttQZQA/aHR0cDovLzEyNy4wLjAuMTo4MDAwL2ltYWdlLnN2Zw
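For anyone checking which upstream URL a given go-camo link points at, the second path segment can be decoded by hand. The following is a minimal sketch, assuming the two-segment layout visible in the URLs in this thread (signature, then encoded URL) and assuming the encoded URL is either hex or URL-safe base64 without padding. `decode_camo_path` is a name made up for this example; it does not verify the signature and is not part of go-camo or url-tool.

```python
import base64


def decode_camo_path(path: str) -> str:
    """Return the target URL encoded in a go-camo style request path.

    Assumes the layout /<signature>/<encoded-url> seen in this thread.
    The signature segment is ignored, not verified.
    """
    _signature, encoded = path.strip("/").split("/", 1)

    # Try hex first: bytes.fromhex raises ValueError on any non-hex character.
    try:
        return bytes.fromhex(encoded).decode("utf-8")
    except ValueError:
        pass

    # Otherwise treat it as URL-safe base64 and restore the stripped padding.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


print(decode_camo_path(
    "/ZjI9U0gzk_7pETckVGbk3ttQZQA/aHR0cDovLzEyNy4wLjAuMTo4MDAwL2ltYWdlLnN2Zw"
))
# prints: http://127.0.0.1:8000/image.svg
```

Trying hex before base64 is a heuristic: a base64 string made up only of hex characters would be misread, which is acceptable for a debugging aid but not for anything stricter.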

go-camo (with IP filtering disabled/removed to allow testing locally)

go-camo% ./build/bin/go-camo -v -k test --listen 127.0.0.1:8080

curl request output when hitting go-camo

~% curl -sv -o /dev/null http://127.0.0.1:8080/ZjI9U0gzk_7pETckVGbk3ttQZQA/aHR0cDovLzEyNy4wLjAuMTo4MDAwL2ltYWdlLnN2Zw
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /ZjI9U0gzk_7pETckVGbk3ttQZQA/aHR0cDovLzEyNy4wLjAuMTo4MDAwL2ltYWdlLnN2Zw HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: max-age=315360000, public, immutable
< Content-Length: 54786
< Content-Security-Policy: default-src 'none'; img-src data:; style-src 'unsafe-inline'
< Content-Type: image/svg+xml
< Date: Thu, 31 Aug 2023 00:12:25 GMT
< Etag: "64b6a30e-d602"
< Last-Modified: Tue, 18 Jul 2023 14:34:54 GMT
< Server: go-camo
< X-Content-Type-Options: nosniff
< X-Xss-Protection: 1; mode=block
< 
{ [1378 bytes data]
* Connection #0 to host 127.0.0.1 left intact

go-camo logs

go-camo% ./build/bin/go-camo -v -k test --listen 127.0.0.1:8080
time="2023-08-30T17:12:21.746636000-07:00" level="D" msg="debug logging enabled"
time="2023-08-30T17:12:21.746764000-07:00" level="I" msg="Starting HTTP server on: tcp:127.0.0.1:8080"
time="2023-08-30T17:12:26.004294000-07:00" level="D" msg="client request" content_length="0" header="map[Accept:[*/*] User-Agent:[curl/8.1.2]]" host="127.0.0.1:8080" method="GET" path="/ZjI9U0gzk_7pETckVGbk3ttQZQA/aHR0cDovLzEyNy4wLjAuMTo4MDAwL2ltYWdlLnN2Zw" proto="HTTP/1.1" remote_addr="127.0.0.1:57558" transfer_encoding="[]"
time="2023-08-30T17:12:26.004365000-07:00" level="D" msg="signed client url" url="http://127.0.0.1:8000/image.svg"
time="2023-08-30T17:12:26.004391000-07:00" level="D" msg="built outgoing request" content_length="0" header="map[Accept:[image/*] User-Agent:[go-camo] Via:[go-camo]]" host="127.0.0.1:8000" method="GET" path="/image.svg" proto="HTTP/1.1" remote_addr="" transfer_encoding="[]"
time="2023-08-30T17:12:26.038259000-07:00" level="D" msg="response from upstream" content_length="54786" header="map[Accept-Ranges:[bytes] Access-Control-Allow-Origin:[*] Cache-Control:[max-age=315360000, public, immutable] Connection:[keep-alive] Content-Length:[54786] Content-Type:[image/svg+xml] Date:[Thu, 31 Aug 2023 00:12:26 GMT] Etag:[\"64b6a30e-d602\"] Last-Modified:[Tue, 18 Jul 2023 14:34:54 GMT] Strict-Transport-Security:[max-age=31536000; includeSubDomains; preload] Vary:[Accept-Encoding] X-Cache:[HIT, HIT] X-Cache-Hits:[293, 1] X-Content-Type-Options:[nosniff] X-Frame-Options:[deny] X-Permitted-Cross-Domain-Policies:[none] X-Served-By:[cache-iad-kiad7000117-IAD, cache-pdx12329-PDX] X-Timer:[S1693440746.082145,VS0,VE2] X-Xss-Protection:[1; mode=block]]" proto="HTTP/1.1" status="200" transfer_encoding="[]"
time="2023-08-30T17:12:26.041701000-07:00" level="D" msg="response to client" headers="map[Accept-Ranges:[bytes] Cache-Control:[max-age=315360000, public, immutable] Content-Length:[54786] Content-Security-Policy:[default-src 'none'; img-src data:; style-src 'unsafe-inline'] Content-Type:[image/svg+xml] Date:[Thu, 31 Aug 2023 00:12:25 GMT] Etag:[\"64b6a30e-d602\"] Last-Modified:[Tue, 18 Jul 2023 14:34:54 GMT] Server:[go-camo] X-Content-Type-Options:[nosniff] X-Xss-Protection:[1; mode=block]]" status="200"

hypothesis

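Without seeing logs or being able to reproduce this with the simple setup above, I can offer a few hypotheticals:

Perhaps the server in question was redirecting more than 3 times. MaxRedirects is a go-camo cli flag tunable, but the default redirection limit is configured to 3.
Perhaps there was something else wrong with the response, such as being an http/1.1 response without a content-length, and the Go http library may have been refusing to process it for some reason. (logs would hopefully be informative here)
If go-camo is configured to use an outgoing proxy (eg. smokescreen, squid), perhaps that proxy was rejecting the redirect response for some reason.
Some other heretofore unknown bug in go-camo doing something unexpected.

Was my attempt at reproduction about what you had envisioned as repro steps?
Do you have any further information/logs/etc on the issue?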

di commented 11 months ago

> Perhaps the server in question was redirecting more than 3 times. MaxRedirects is a go-camo cli flag tunable, but the default redirection limit is configured to 3.

I don't think this is it. The original URL in question was https://api.securityscorecards.dev/projects/github.com/di/id/badge, which only issues a single redirect to a URL that responds with a 200.

The response from the proxy was:

$ curl -v https://pypi-camo.global.ssl.fastly.net/ac31ea219643944969bd06dca6dc02a6b4d6dc06/68747470733a2f2f6170692e736563757269747973636f726563617264732e6465762f70726f6a656374732f6769746875622e636f6d2f64692f69642f6261646765
...
< HTTP/1.1 404 Not Found
< Connection: keep-alive
< Content-Length: 10
< Content-Type: text/plain; charset=utf-8
< Content-Security-Policy: default-src 'none'; img-src data:; style-src 'unsafe-inline'
< X-Content-Type-Options: nosniff
< X-Xss-Protection: 1; mode=block
< Accept-Ranges: bytes
< Date: Wed, 30 Aug 2023 14:13:47 GMT
< Via: 1.1 varnish
< Age: 0
< X-Served-By: cache-fty21335-FTY
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1693404827.906100,VS0,VE137
< Strict-Transport-Security: max-age=300
<
Not Found

(note that this now works as expected)

The response from the original URL was:

$ curl -v  https://api.securityscorecards.dev/projects/github.com/di/id/badge
...
< HTTP/2 302
< content-type: application/json
< location: https://img.shields.io/ossf-scorecard/github.com/di/id?label=openssf scorecard&style=flat
< vary: Origin
< x-cloud-trace-context: 7afcef1c21068ac089b2880adcbdeb5a
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
< x-envoy-decorator-operation: ingress GetBadge
< date: Wed, 30 Aug 2023 14:08:36 GMT
< server: Google Frontend
< content-length: 0

(note that the Content-Type here has since changed)

> Perhaps there was something else wrong with the response, such as being an http/1.1 response without a content-length, and the Go http library may have been refusing to process it for some reason. (logs would hopefully be informative here)

Looks like both responses here had content-length, so maybe we can rule this out.

> If go-camo is configured to use an outgoing proxy (eg. smokescreen, squid), perhaps that proxy was rejecting the redirect response for some reason.

Nope, not configured to use an outgoing proxy.

> Some other heretofore unknown bug in go-camo doing something unexpected.

Since changing the Content-Type of the redirect has resolved the issue, I definitely think it's related to this. My read of https://github.com/cactus/go-camo/blob/4d65728288768aeaf34577a9bbe18072aa910af0/pkg/camo/proxy.go#L483 is that the Content-Type would be evaluated against acceptTypes every time a redirect is followed, but maybe I'm misreading that.

My guess is that maybe something changed here between our fork and what you're testing against, although I don't see anything obvious that would be affecting this.

> Was my attempt at reproduction about what you had envisioned as repro steps?

Yes, I think it's accurate, aside from the original response being a 302 and not a 301 (although it doesn't seem to matter).
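For completeness, the earlier test server can be adjusted to answer with a 302 so that it matches the original response more closely. This is only a sketch: `RedirectHandler` and `probe_redirect` are names invented here, the redirect target is the one from the earlier test server, and the server binds to a free port and checks itself once rather than running indefinitely.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Redirect target reused from the earlier test server in this thread.
TARGET = "https://pypi.org/static/images/logo-small.2a411bc6.svg"


class RedirectHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # 302 instead of 301, still with a non-image Content-Type.
        self.send_response(302)
        self.send_header("content-type", "application/json")
        self.send_header("content-length", "0")
        self.send_header("location", TARGET)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def probe_redirect():
    """Serve on a free local port, request once without following the
    redirect, and return (status, content-type, location)."""
    httpd = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", httpd.server_port, timeout=5)
    try:
        conn.request("GET", "/image.svg")
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("content-type"), resp.getheader("location")
    finally:
        # Close the client first so the keep-alive handler returns,
        # then stop the server loop.
        conn.close()
        httpd.shutdown()
        httpd.server_close()


print(probe_redirect())
```

To point go-camo at such a server, the handler would instead be served with `serve_forever()` on a fixed port, as in the original test setup.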

Do you have any further information/logs/etc on the issue?

Unfortunately, this instance receives a lot of traffic and I'm unable to extract logs specifically for this edge case. Hopefully the above will suffice.

dropwhile commented 11 months ago

Nothing really jumps out at me in the diff between your fork and here either.

As far as code flow goes:

This is the function that validates redirects: https://github.com/cactus/go-camo/blob/4d65728288768aeaf34577a9bbe18072aa910af0/pkg/camo/proxy.go#L592-L608

The above function really just checks the redirect depth and does some URL checks (avoiding things like redirects as SSRF vectors), calling this function for the URL checks: https://github.com/cactus/go-camo/blob/master/pkg/camo/proxy.go#L391-L439

The net Dialer is involved a bit as well when following redirects (connecting to new hostnames), ensuring that hostnames/DNS don't resolve into SSRF vectors either: https://github.com/cactus/go-camo/blob/4d65728288768aeaf34577a9bbe18072aa910af0/pkg/camo/proxy.go#L499-L526

None of that has anything to do with Content-Type checking though. The Content-Type checking happens here (https://github.com/cactus/go-camo/blob/master/pkg/camo/proxy.go#L259-L295) as part of 20x level responses. 30x level responses should be auto-followed unless one of the aforementioned checks (redirect depth, URL/SSRF, hostname/SSRF) fails. If one does fail, it ends up here: https://github.com/cactus/go-camo/blob/master/pkg/camo/proxy.go#L300-L304
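The flow described above can be modelled in a few lines. This is a rough Python sketch of the described behaviour, not go-camo's code: `Response`, `url_allowed`, and `proxy` are hypothetical names, the URL check is a trivial stand-in for the real URL/SSRF checks, and the status codes, messages, and exact redirect-limit semantics are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def url_allowed(url: str) -> bool:
    # Hypothetical stand-in for the URL/SSRF checks; the real ones do more.
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def proxy(url: str, fetch: Callable[[str], Response],
          max_redirects: int = 3) -> Tuple[int, str]:
    redirects = 0
    while True:
        resp = fetch(url)
        if 300 <= resp.status < 400:
            # Redirect: only depth and target URL are examined here.
            # The redirect's own Content-Type is never looked at.
            redirects += 1
            target = resp.headers.get("location", "")
            if redirects > max_redirects or not url_allowed(target):
                return 404, "redirect rejected"
            url = target
            continue
        if 200 <= resp.status < 300:
            # Content-Type is checked only on the final 2xx response.
            ctype = resp.headers.get("content-type", "")
            if ctype.startswith("image/"):
                return 200, ctype
            return 404, "content-type not allowed"
        return 404, "upstream error"


# Made-up hosts: a JSON-typed 302 that leads to an SVG.
pages = {
    "https://badge.example/badge": Response(
        302, {"content-type": "application/json",
              "location": "https://img.example/badge.svg"}),
    "https://img.example/badge.svg": Response(
        200, {"content-type": "image/svg+xml"}),
}
print(proxy("https://badge.example/badge", pages.__getitem__))
# prints: (200, 'image/svg+xml')
```

Under this model a redirect with an application/json Content-Type is followed normally, which is consistent with the reproduction attempt above succeeding.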

Just to see if there was some strange issue between Go itself and redirects or headers in HTTP/2 responses, I set up another test server with an HTTP/2 endpoint, returning the same location target URL as you noted above:

< HTTP/2 302 
< server: nginx
< date: Thu, 31 Aug 2023 03:18:14 GMT
< content-type: application/json
< content-length: 0
< location: https://img.shields.io/ossf-scorecard/github.com/di/id?label=openssf scorecard&style=flat
< strict-transport-security: max-age=31536000
< x-content-type-options: nosniff
< content-security-policy: default-src 'self';style-src 'self' 'unsafe-inline';img-src 'self' data:;object-src 'none';frame-ancestors 'self';upgrade-insecure-requests;base-uri 'self'; form-action 'none';
< x-xss-protection: 1; mode=block
< referrer-policy: same-origin
< cache-control: public
< x-frame-options: SAMEORIGIN
< permissions-policy: interest-cohort=()

No issues. go-camo processed it just fine in my single-request attempt at reproduction.

What version of Go are you using? go-camo -V should output what was used to build it. Something like this:

./build/bin/go-camo -V 
go-camo 2.4.4 (go1.21.0,gc-arm64)

di commented 10 months ago

Thanks for the detailed analysis. I'm going to go ahead and close this and chalk it up to something different on our fork. Since the original service was able to update the content type, this hasn't reoccurred, and I don't have as much of a need to dig into why it was happening.

Thanks again!

dropwhile commented 10 months ago

@di Sounds good, appreciate the follow-up. 👍