Closed sugdyzhekov closed 5 years ago
@sugdyzhekov Interesting, could you attach a tcpdump in addition to the nc output?
Could be this line: https://github.com/fabiolb/fabio/pull/423/files#diff-7c73f6a0cf2ff2671c80a43d0aad92a3R46
this is the sni proxy. sorry
Just discovered this issue too (I assume it's the same). We noticed because our SMTP server seemed to be unresponsive.
If the server starts the conversation after a client connects (as opposed to the client sending the first message, which works fine), fabio doesn't send data to the client.
Both SMTP and MySQL servers start a connection by sending a banner first.
Here is my reproducer:
// smtp_mock.go
package main
import (
"fmt"
"log"
"net"
)
func main() {
log.Println("Listening on :8025")
l, err := net.Listen("tcp", ":8025")
if err != nil {
log.Fatal(err)
}
for {
conn, err := l.Accept()
if err != nil {
log.Fatal(err)
}
log.Println("serving", conn.RemoteAddr())
fmt.Fprintf(conn, "220 localhost ESMTP Go\r\n")
conn.Close()
}
}
$ go run smtp_mock.go
2018/12/06 14:48:42 Listening on :8025
Next shell:
$ curl -XPUT localhost:8500/v1/agent/service/register -d@-<<<'
{
"Address": "127.0.0.1",
"Check": {
"Interval": "5s",
"TCP": "localhost:8025"
},
"ID": "09bf6151-243b-487e-a062-ea9741839818",
"Name": "mock-smtpd",
"Port": 8025,
"Tags": [
"dev-fabio-:8026"
]
}'
$ fabio \
-registry.consul.tagprefix=dev-fabio- \
-proxy.addr=':8026;proto=tcp' \
-log.level=TRACE
Next shell:
# Connect directly
$ telnet localhost 8025
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 localhost ESMTP Go go1.11.2
Connection closed by foreign host.
# Connect via fabio
$ telnet localhost 8026
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
<hangs indefinitely>
asdf<enter>
220 localhost ESMTP Go go1.11.2
Connection closed by foreign host.
There's nothing interesting in fabio's logs, not even at TRACE level. Git bisect points fingers at a28d24440a78c441e189e90d8671332122fea90f:
a28d24440a78c441e189e90d8671332122fea90f is the first bad commit
commit a28d24440a78c441e189e90d8671332122fea90f
Author: Aaron Hurt
Date: Mon Feb 12 21:24:00 2018 -0600
add basic ip centric access control on routes
:040000 040000 8c2e3c4fd614bac972c3e5cf7f3c5d4403e4b462 3cdbf0b8f4e80bf5a4a0a6567227c936cb0cc77a M docs
:040000 040000 e046f4421d2a958f63712cc59a843509ae486d86 5681ab0cef7610c12c29a35a6fb262d856f7a952 M proxy
:040000 040000 34db75362cd6f9ab3a9125f8802e32f892bdc678 491cb45fa2072e457fec37e32cdb49cb1ae383b5 M route
Here's a tcpdump. Note the two second gap after the third packet. That's where telnet was connected to fabio, but I didn't do anything. Then I hit enter, and then fabio established a connection to the backend.
15:15:26.826545 IP 127.0.0.1.53618 > 127.0.0.1.8026: Flags [S], seq 2552979650, win 43690, options [mss 65495,sackOK,TS val 3127993891 ecr 0,nop,wscale 7], length 0
15:15:26.826565 IP 127.0.0.1.8026 > 127.0.0.1.53618: Flags [S.], seq 193147615, ack 2552979651, win 43690, options [mss 65495,sackOK,TS val 3127993891 ecr 3127993891,nop,wscale 7], length 0
15:15:26.826586 IP 127.0.0.1.53618 > 127.0.0.1.8026: Flags [.], ack 1, win 342, options [nop,nop,TS val 3127993891 ecr 3127993891], length 0
15:15:28.529959 IP 127.0.0.1.53618 > 127.0.0.1.8026: Flags [P.], seq 1:3, ack 1, win 342, options [nop,nop,TS val 3127995594 ecr 3127993891], length 2
15:15:28.529988 IP 127.0.0.1.8026 > 127.0.0.1.53618: Flags [.], ack 3, win 342, options [nop,nop,TS val 3127995594 ecr 3127995594], length 0
15:15:28.530519 IP 127.0.0.1.54084 > 127.0.0.1.8025: Flags [S], seq 2746406685, win 43690, options [mss 65495,sackOK,TS val 3127995595 ecr 0,nop,wscale 7], length 0
15:15:28.530559 IP 127.0.0.1.8025 > 127.0.0.1.54084: Flags [S.], seq 2444991286, ack 2746406686, win 43690, options [mss 65495,sackOK,TS val 3127995595 ecr 3127995595,nop,wscale 7], length 0
15:15:28.530587 IP 127.0.0.1.54084 > 127.0.0.1.8025: Flags [.], ack 1, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
15:15:28.530775 IP 127.0.0.1.8025 > 127.0.0.1.54084: Flags [P.], seq 1:25, ack 1, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 24
15:15:28.530805 IP 127.0.0.1.54084 > 127.0.0.1.8025: Flags [.], ack 25, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
15:15:28.530844 IP 127.0.0.1.8025 > 127.0.0.1.54084: Flags [F.], seq 25, ack 1, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
15:15:28.530846 IP 127.0.0.1.54084 > 127.0.0.1.8025: Flags [P.], seq 1:3, ack 25, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 2
15:15:28.530898 IP 127.0.0.1.8025 > 127.0.0.1.54084: Flags [R], seq 2444991311, win 0, length 0
15:15:28.531008 IP 127.0.0.1.8026 > 127.0.0.1.53618: Flags [P.], seq 1:25, ack 3, win 342, options [nop,nop,TS val 3127995595 ecr 3127995594], length 24
15:15:28.531030 IP 127.0.0.1.53618 > 127.0.0.1.8026: Flags [.], ack 25, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
15:15:28.531097 IP 127.0.0.1.8026 > 127.0.0.1.53618: Flags [F.], seq 25, ack 3, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
15:15:28.531151 IP 127.0.0.1.53618 > 127.0.0.1.8026: Flags [F.], seq 3, ack 26, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
15:15:28.531178 IP 127.0.0.1.8026 > 127.0.0.1.53618: Flags [.], ack 4, win 342, options [nop,nop,TS val 3127995595 ecr 3127995595], length 0
Interesting and thank you for detailed troubleshooting. That makes a lot of sense. We use TCP proxy but haven't noticed any breakage but in our case it is all client initiated.
... and it seems I may be to blame :)
@pschultz could you try reproducing this with master? I just made some changes to the ip access control that might be the solution. In short that section wasn't properly getting the remote connection address previously.
digging into this now ... using your steps I can reproduce it on master
It is indeed the AccessDenied check but I'm not yet sure why. Will continue to investigate but commenting out the relevant lines in proxy/tcp/tcp_proxy.go
does make both test scenarios work.
I found the issue but I'm going to need to do a bit of research ... steps so far:
Using the same test scenario as above with these code changes in route/access_rules.go
:
// AccessDeniedTCP checks rules on the target for TCP proxy routes.
func (t *Target) AccessDeniedTCP(c net.Conn) bool {
log.Printf("[DEBUG] in AccessDeniedTCP 0")
var addr *net.TCPAddr
var ok bool
// validate remote address assertion
if addr, ok = c.RemoteAddr().(*net.TCPAddr); !ok {
log.Printf("[ERROR] failed to assert remote connection address for %s", t.Service)
return false
}
log.Printf("[DEBUG] in AccessDeniedTCP 1 addr=%+v", addr)
// check remote connection address
if t.denyByIP(addr.IP) {
return true
}
log.Printf("[DEBUG] in AccessDeniedTCP 2")
// default allow
return false
}
Watching the fabio logs when telnetting to port 8026 from a remote host on the LAN ...
2018/12/06 14:12:49 [DEBUG] in AccessDeniedTCP 0
2018/12/06 14:12:53 [DEBUG] in AccessDeniedTCP 1 addr=100.127.255.1:22900
2018/12/06 14:12:53 [DEBUG] in AccessDeniedTCP 2
The 0
logging line is printed right after telnet collects. The 1
logging line is printed as soon as I press <enter>
on the remote machine. It appears that golang is blocking and not returning from the RemoteAddr()
function until input is received from the client.
This looks potentially related: https://github.com/golang/go/issues/12943
This as well: https://github.com/armon/go-proxyproto/issues/1
The issue seems to be in the proxy protocol implementation ...
// RemoteAddr returns the address of the client if the proxy
// protocol is being used, otherwise just returns the address of
// the socket peer. If there is an error parsing the header, the
// address of the client is not returned, and the socket is closed.
// Once implication of this is that the call could block if the
// client is slow. Using a Deadline is recommended if this is called
// before Read()
func (p *Conn) RemoteAddr() net.Addr {
p.once.Do(func() {
if err := p.checkPrefix(); err != nil && err != io.EOF {
log.Printf("[ERR] Failed to read proxy prefix: %v", err)
}
})
if p.srcAddr != nil {
return p.srcAddr
}
return p.conn.RemoteAddr()
}
We would either have to disable proxy protocol or get the remote IP without checking for proxy protocol for server initiated requests.
Commenting out the proxy protocol wrapper allows the test to proceed.
With ln = &proxyproto.Listener{Listener: ln}
commented out in proxy/listen.go
the tests work with the ip access checks.
Logging output from a telnet check against the mock server from a remote host on the LAN:
2018/12/06 15:45:13 [DEBUG] in AccessDeniedTCP 0
2018/12/06 15:45:13 [DEBUG] in AccessDeniedTCP 1 addr=100.127.255.1:22982
2018/12/06 15:45:13 [DEBUG] in AccessDeniedTCP 2
How do we want to proceed?
My initial thoughts:
proxy.addr = 1.2.3.4:443;proxyproto=tue, \
1.2.3.5:3306;proto=tcp
proxyproto
on a tcp
listener with access
or deny
rules defined will break with server initiated servicesI think making people explicitly enable proxy protocol is also better from a security perspective.
So this is broken for TCP proxies which do not use the PROXY protocol and where the client is waiting for an initial handshake from the server which never comes b/c fabio does not forward the connection (i.e. MySQL)? And this started to become an issue when we added access control since we're looking at the RemoteAddr
which is a blocking call in the ProxyListener
? HTTP is not affected since this was fixed in the stdlib, AFAICT.
Making the PROXY protocol configurable is the better solution, IMO. (I should have done this when I added that.) People know when they are using it, one would assume. The question is what to do with the default. We can have on
and off
but both settings have drawbacks. Switching to off by default may break existing setups. Leaving it on by default means to keep it broken for users.
Who is affected if we switch the default to off
? Users on AWS with access control, for example. If we disable PROXY and they upgrade then the protocol breaks since the server is getting the PROXY header. Or it may silently work since the server can handle the PROXY protocol.
The 1.5.8 behavior was that TCP just worked and I'd prefer if we can get back to that state.
Maybe we also add an auto
mode as the new default, where we establish the connection, buffer the first 100 bytes from the client and check for the PROXY header in the TCP proxy directly to handle access control (if there is any).
However, keeping it simple also has advantages. If we decided to switch to off
by default we should mitigate the impact:
I'll see if I can put together a PoC for automatic detection.
That being said, if there is a toggle for PROXY Protocol this enables another use-case: checking that a connection is from a particular proxy (as opposed to a particular client downstream). Not sure how common that is, or if it's ever been asked for, though.
It would be good if the backend could decide if it wants to be enable PROXY or not.
I dunno if thats doable (single listener on Fabio sends PROXY to one backend but not PROXY to another backend)
armon/go-proxyproto#4 adds a ProxyHeaderTimeout
option for the listener. If there is no proxy header within that timeout, it falls back to plain TCP proxying.
After some superficial testing, updating goproxy to at least 49fdb5c and exposing that option seems to be a good compromise. We can have it default to something sensible (say, 5ms), and we can interpret zero or a negative value as "disable PROXY protocol support".
In addition we shouldn't call RemoteAddr()
unless there are access rules defined.
This will make fabio work for MySQL in the default configuration, costing a slight increase in connect latency, for exactly those users that use access rules and no PROXY protocol.
There is also a chance of subtle breakage by choosing too small a timeout.
If you want to play around, I just pushed to https://github.com/classmarkets/fabio/tree/fix-tcp (classmarkets/fabio@7246b924d892dc146ec0c3c8790045632225e71a)
I made a few commits to a new branch referenced above that address the core points mentioned thus far. The current proxy-protocol toggle defaults to false but that may need to change per @magiconair comment's above.
Regardless of the default setting I believe additional documentation and release notes will be warranted.
@pschultz I've invited you to be a collaborator so that we can work on the same branches.
Please note that this change disabled PROXY protocol by default and you have to enable it if you need it. If you feel this is a grave mistake, the please let us know. (I'll mull over this a bit more before creating a new release).
@pschultz I've invited you to be a collaborator so that we can work on the same branches.
@magiconair, thank you very much. Would you also add my SSH key, please?
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFNuN8ltMS1Eg034DHD0fekzKSkdlzlWeSMbJy/LJxOYEUQMgNdtQ7GzDZUsArTObenJRKMnpyP6I2mcwesw0xM= pschultz
@pschultz I've enabled your account to be a collaborator with write permissions. That should be sufficient. If you want to use a deploy key instead then it has to be unique for this repo since you cannot use the same key for multiple repos. Trying to add this key returns "key already in use".
When will this change be released? Can you make 1.5.11?
I'll try to release 1.5.11 tonight CET time.
Impact It's not possible to connect to MySQL since v1.5.8
STR
Expected result mysql client can establish connection to MySQL service
Actual result mysql client hangs during connection establishment.
Additional info v1.5.7
v1.5.8
Configuration
proxy_addr
mysql service register at Consul with following tag:
urlprefix-:3306 proto=tcp