fpco / streaming-commons

Common lower-level functions needed by various streaming data libraries
MIT License
36 stars 41 forks source link

getSocketTCP ought to try connecting to all possible addresses #31

Closed jcpetruzza closed 8 years ago

jcpetruzza commented 8 years ago

I was hunting for a bug today that would occur consistently on one box and never on the other. In the end, I believe Data.Streaming.Network.getSocketFamilyGen is to blame.

This function gets a host, a port and an address-family and calls Network.Socket.getAddrInfo to resolve the host address. The problem lies in that getAddrInfo may return more than one address and, in that case, getSocketFamilyGen will try to connect to the first one and if that one fails, won't try with the rest.

Now, why would this be a problem? Our application was using (indirectly, via a dependency) runTCPClient with the default settings (i.e. clientSettingsTCP), which, in particular, set the address family to AF_UNSPEC. Moreover, it was trying to connect to a port on localhost and in Debian /etc/hosts/ includes the following definitions:

127.0.0.1   localhost
::1     localhost ip6-localhost ip6-loopback

Because we use AF_UNSPEC, getAddrInfo would return addresses for IPv4 and IPv6, and because Debian defines localhost also for IPv6, the call would return a two-element list for localhost. Finally, because the first address in the returned list happened to be ::1, the connection would unexpectedly fail. Note that the other box was running Ubuntu, which reserves localhost for 127.0.0.1 on /etc/hosts, so it would succeed there.

This and other kind of nasty surprises are solved by going through the list of addresses returned by getaddrinfo until a connection succeeds. As far as I understand, this is the recommended way of using this function anyway (I also can't think of a common scenario where trying the first one and then immediately failing could be the desired behaviour).

snoyberg commented 8 years ago

I'm not familiar with the details of implementation here. I'm happy to take a PR, assuming it passes tests on various OSes.

jcpetruzza commented 8 years ago

It shouldn't be hard to fix, so I'll prepare PR soon.

Incidentally, note also that if in the test suite you edit Data.Streaming.NetworkSpec and replace "127.0.0.1" by "localhost", the test will fail in Debian, OSX and other platforms where localhost is both 127.0.0.1 and ::1.

jcpetruzza commented 8 years ago

It turns out this cannot fixed in getSocketFamilyGen as I originally expected. The problem is that this function just creates a socket, which will typically succeed; the error will manifest later in getSocketFamilyTCP (and, thus, also getSocketTCP) when it tries to actually connect to that socket.