caddyserver / forwardproxy

Forward proxy plugin for the Caddy web server
Apache License 2.0
603 stars 228 forks source link

commit 27401eb break upstream proxy support with gzip #49

Open bash99 opened 6 years ago

bash99 commented 6 years ago

1. Is bug reproducible with latest forwardproxy build?

Yes and I've try build myself, a035ebe works, and 27401eb failed.

2. What are you trying to do?

forward with upstream caddy proxy

3. What is your entire Caddyfile?

upstream setting

xyz.xyz.info {
    forwardproxy {
        basicauth aaa aaa
        probe_resistance myaaa.local
        serve_pac        /foraaa.pac
        response_timeout 30
        dial_timeout     30
    }
    root /var/www/html
    gzip
}

local proxy setting

:54086 {
    forwardproxy {
        response_timeout 30
        dial_timeout     30
        upstream  https://aaa:aaa@xyz.xyz.info:443
    }
    root /root/bin/caddy/html
}

4. How is your client configured?

curl --connect-timeout 5 -m 35 --proxy 127.0.0.1:54086 http://mydl.xyz.info/dl/aaa.txt

5. How did you run Caddy? (give the full command and describe the execution environment). If multiple servers are used (for example with upstream), describe those as well.

just start caddy from command line.

6. Please paste any relevant HTTP request(s) here.

curl --connect-timeout 5 -m 35 --proxy 127.0.0.1:54086 http://mydl.xyz.info/dl/aaa.txt

7. What did you expect to see?

download the aaa.txt and print it on console

8. What did you see instead (give full error messages and/or log)?

502 Bad Gateway
23/Aug/2018:15:57:54 +0800 [ERROR 502 /dl/aaa.txt] failed to read http response: http2: server sent GOAWAY and closed the connection; LastStreamID=5, ErrCode=NO_ERROR, debug=""
23/Aug/2018:16:08:34 +0800 [ERROR 502 /dl/aaa.txt] failed to read http response: read tcp 192.168.99.191:13532->199.xxx.xxx.249:443: i/o timeout
23/Aug/2018:16:10:50 +0800 [ERROR 502 /dl/aaa.txt] failed to read http response: read tcp 192.168.99.191:14490->199.xxx.xxx.249:443: i/o timeout

use strace on both side, I've found upstream caddy actually write the response

[pid 13885] write(9, "GET /dl/aaa.txt HTTP/1.1\r\nHost: 23.227.184.155\r\nUser-Agent: curl/7.52.1\r\nAccept: */*\r\nForwarded: for=\"127.0.0.1:33480\"\r\nVia: 1.1 caddy\r\n\r\n", 138) = 138
[pid 13885] write(8, "\27\3\3\0\0356\212\"\244\244\216 ~\246\254\25\242_L\2148\353s!k\272a\336\276/\33\10'x", 34) = 34
[pid 13885] write(8, "\27\3\3\0\35Vh\10\224\2\2529d\250\22[&\210\3k8\330\221\336\212\320\211\7\1V\363_?g", 34) = 34
[pid 13885] read(9, "HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 8\r\nContent-Type: text/plain; charset=utf-8\r\nEtag: \"pdwlpn8\"\r\nLast-Modified: Thu, 23 Aug 2018 07:48:59 GMT\r\nServer: Caddy\r\nDate: Thu, 23 Aug 2018 08:23:06 GMT\r\n\r\nkillfbx\n", 32768) = 224
[pid 13885] write(8, "\27\3\3\0#\320\7\202\263\205\371A?C\222\371\36U\334\351#Xn\252\7J\324\205T\r\236%p \347m.5\253\260", 40 <unfinished ...>
[pid 13884] read(9, 0xc420684000, 32768) = -1 EAGAIN (Resource temporarily unavailable)
[pid 13885] <... write resumed> )       = 40
[pid 13884] read(8, "", 1024)           = 0
[pid 13884] write(8, "\25\3\3\0\22\272\345\177\262\24\324\33\257\34~\320\341e\247\330\346%2", 23) = 23
[pid 13880] shutdown(9, SHUT_WR)        = 0
[pid 13885] read(9, "", 32768)          = 0

and it seems local caddy has also received it

[pid  4305] write(6, "\27\3\3\0\243\315\252\307\367a\1\f\371\276Y\35vD\356D\240\275\347[\267\344Mj\336\240\242\265Ry%,\201z\305\17\353\37\201\255=\267,1=\334\310\20\7\3400\3]\354B\"1\223s\10\324=\\ \202\25\224>
\264\3712\331X\261\376\321\233\22\220-Q\7{}0\r\5\t\37\232\231a\310\310V\314\313G-)Zu-\343\275\34Wr6F\304?\177l\265\31?\364\fkk\"Z\2teS}\v%\274#\nuR\370G\310\245\261[\351\216p\2\365\10\227\244\212~M\240\307\347\216\6\273\370'@`H", 168) = 168
[pid  4331] read(6, "\27\3\3\0\0356\212\"\244\244\216 ~\246\254\25\242_L\2148\353s!k\272a\336\276/\33\10'x\27\3\3\0\35Vh\10\224\2\2529d\250\22[&\210\3k8\330\221\336\212\320\211\7\1V\363_?g", 1024) = 68
[pid  4331] read(6, 0xc4205b2000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid  4331] read(6, "\27\3\3\0#\320\7\202\263\205\371A?C\222\371\36U\334\351#Xn\252\7J\324\205T\r\236%p \347m.5\253\260", 4096) = 40
[pid  4331] read(6, 0xc4205b2000, 4096) = -1 EAGAIN (Resource temporarily unavailable)

but caddy don't think the request is finished.

9. How can someone who is starting from scratch reproduce the bug as minimally as possible?

use above config.

sergeyfrolov commented 6 years ago

@bash99 27401ebe3f2142da6436cae1f16c1f373e9a3f3e is not the latest commit, 05b2092e07f9d10b3803d8fb9775d2f87dc58590 is. Did you try it on both upstream and local? It involves a fix, that might be related to the issue you describe.

sergeyfrolov commented 6 years ago

I cannot reproduce your issue with the latest build: it works for me. I used the same exact setup as you did with locahost upstreaming proxy on :54086

bash99 commented 6 years ago

@sergeyfrolov I'm very sorry for miss a configuration line which proved to be critical.

it's broken with gzip config in upstream

xyz.xyz.info {
    forwardproxy {
        basicauth aaa aaa
        probe_resistance myaaa.local
        serve_pac        /foraaa.pac
        response_timeout 30
        dial_timeout     30
    }
    root /var/www/html
    gzip
}
sergeyfrolov commented 6 years ago

I dug deeper into this issue, but had no success discovering the cause. I suspect it gets compressed twice or Content-Length is not updated properly somewhere.

@bash99 I will note, however, that the only reason for you to explicitly enable gzip is to compress your files that you host on xyz.xyz.info. Forward proxied traffic will be compressed anyway, assuming both client and target support compression.

sergeyfrolov commented 6 years ago

It appears to me that gzip writes header (which updates Content-Length and other things), only if header wasn't written already. Since the header is always written by the forwardproxy, gzip never will. However, apparently, data still gets compressed (potentially for a second time). @mholt does that sound like a bug in gzip?

mholt commented 6 years ago

I'm not entirely sure, because even if it was a bug, I'm not sure how gzip could change the Content-Length header value after the headers are written. Notice how we just omit that header, here, because of that issue: https://github.com/mholt/caddy/blob/d47b041923daf055bb97c94f29a7f5bf14a4cb2a/caddyhttp/gzip/gzip.go#L114-L119

sergeyfrolov commented 6 years ago

But does it (re-)compress, when the data was compressed/header was written already? Seems to me like it does (re-)compress, but perhaps shouldn't in either of those cases?

mholt commented 6 years ago

If it's already compressed, then you're right, it shouldn't re-compress, that might be a bug. It'd be the first I've/we've heard of it though :) (Seems a bit unusual.)

bash99 commented 6 years ago

@sergeyfrolov Yes, I can live without the gzip config line.

One more thing I can provide is, if I use firefox/chrome to access upstream proxy directly (with gzip config line), I've no problem at all.