Closed jcpetruzza closed 8 years ago
I'm not familiar with the details of implementation here. I'm happy to take a PR, assuming it passes tests on various OSes.
It shouldn't be hard to fix, so I'll prepare PR soon.
Incidentally, note also that if in the test suite you edit Data.Streaming.NetworkSpec
and replace "127.0.0.1" by "localhost", the test will fail in Debian, OSX and other platforms where localhost
is both 127.0.0.1
and ::1
.
It turns out this cannot fixed in getSocketFamilyGen
as I originally expected. The problem is that this function just creates a socket, which will typically succeed; the error will manifest later in getSocketFamilyTCP
(and, thus, also getSocketTCP
) when it tries to actually connect to that socket.
I was hunting for a bug today that would occur consistently on one box and never on the other. In the end, I believe
Data.Streaming.Network.getSocketFamilyGen
is to blame.This function gets a host, a port and an address-family and calls
Network.Socket.getAddrInfo
to resolve the host address. The problem lies in thatgetAddrInfo
may return more than one address and, in that case,getSocketFamilyGen
will try to connect to the first one and if that one fails, won't try with the rest.Now, why would this be a problem? Our application was using (indirectly, via a dependency)
runTCPClient
with the default settings (i.e.clientSettingsTCP
), which, in particular, set the address family toAF_UNSPEC
. Moreover, it was trying to connect to a port onlocalhost
and in Debian/etc/hosts/
includes the following definitions:Because we use
AF_UNSPEC
,getAddrInfo
would return addresses for IPv4 and IPv6, and because Debian defineslocalhost
also for IPv6, the call would return a two-element list forlocalhost
. Finally, because the first address in the returned list happened to be::1
, the connection would unexpectedly fail. Note that the other box was running Ubuntu, which reserveslocalhost
for127.0.0.1
on/etc/hosts
, so it would succeed there.This and other kind of nasty surprises are solved by going through the list of addresses returned by
getaddrinfo
until a connection succeeds. As far as I understand, this is the recommended way of using this function anyway (I also can't think of a common scenario where trying the first one and then immediately failing could be the desired behaviour).