Overview
In dual-stack IPv4+IPv6 environments, it is not always possible to know with certainty whether or not an IPv6 address is actually routable from the current environment. Especially in containerized environments, oftentimes “dual stack” IPv4+IPv6 is actually an IPv4 stack with IPv6 loopback, or a similar configuration where Internet IPv6 addresses do not route properly. Therefore, so-called “Happy Eyeballs” algorithms, as proposed by RFC 6555, are employed in any situation where dual-stack IPv4+IPv6 networking is encountered to ensure a swift fallback when IPv6 networking is non-functional.
In Go’s net stack, this is implemented at the net.Dialer level: when resolving addresses, the net.Dialer will choose IPv6 addresses in preference as “primary” addresses, and IPv4 addresses as “fallback” addresses. The net.Dialer performs a race, first attempting to connect to the IPv6 address, and shortly after, trying to connect to the IPv4 address, concurrently. These connection attempts are raced, with the first connection that succeeds getting used and the other being closed and discarded.
In httplb, this doesn’t work: httplb creates individual connections to hosts, resolving names before the net.Dialer is called. This means the net.Dialer cannot perform its RFC 6555 race.
Proposal
This proposal asserts that the core problem with httplb today is that the Resolver returns unstructured addresses. This is appropriate for DNS, because DNS does not carry any information about hosts: it only has records that point to addresses. However, a clean IPv6-to-IPv4 fallback cannot be implemented because the Resolver does not return enough information. In an ideal world, the Resolver would return both the IPv4 and IPv6 address(es) for a host as a single unit.
The current default behavior of httplb at HEAD is to prefer IPv4 addresses when present, only using IPv6 addresses when there are no IPv4 addresses present. This is probably widely compatible with existing deployments, and it is likely better than using both resolved IPv4 and IPv6 addresses: some of those addresses very likely point to the same hosts, so in dual-stack environments using both would lead to improperly balanced load.
Resolver Interface
In order to rectify this, this proposal suggests that the Resolver return a slice of Host structures, which shall have:
Primary address. In a dual-stack system, this would probably be an IPv6 address to the host.
Fallback address. Optional. In a dual-stack system, this would probably be an IPv4 address to the host.
Host-level attributes. Addresses may still have attributes if it still seems useful.
For more advanced resolvers, it may be possible to fill in these values in a logical fashion that supports a proper RFC 6555 fallback. DNS, however, does not provide enough information, so any solution to this problem will carry at least some downsides.
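As a rough illustration of the shape described above, here is a minimal sketch of a Host structure and a Resolver that returns hosts instead of bare addresses. Every name in it (Host, HasFallback, Resolver, staticResolver, exampleHosts, and the map-based Attributes field) is an assumption made for the example and not httplb's actual API.

```go
package main

import (
	"context"
	"fmt"
	"net/netip"
)

// Host groups the addresses believed to reach a single backend.
// All names here are illustrative, not httplb's actual API.
type Host struct {
	Primary    netip.Addr        // tried first; probably IPv6 on a dual-stack system
	Fallback   netip.Addr        // optional; the zero Addr means "no fallback"
	Attributes map[string]string // host-level attributes (hypothetical representation)
}

// HasFallback reports whether a fallback address was provided.
func (h Host) HasFallback() bool { return h.Fallback.IsValid() }

// Resolver returns hosts rather than unstructured addresses.
type Resolver interface {
	Resolve(ctx context.Context, name string) ([]Host, error)
}

// staticResolver is a trivial Resolver that already knows its hosts.
type staticResolver []Host

func (s staticResolver) Resolve(context.Context, string) ([]Host, error) {
	return s, nil
}

// exampleHosts resolves one fixed dual-stack host through the interface.
// The addresses come from the documentation ranges.
func exampleHosts() ([]Host, error) {
	var r Resolver = staticResolver{{
		Primary:  netip.MustParseAddr("2001:db8::1"),
		Fallback: netip.MustParseAddr("192.0.2.1"),
	}}
	return r.Resolve(context.Background(), "backend.example")
}

func main() {
	hosts, err := exampleHosts()
	if err != nil {
		panic(err)
	}
	for _, h := range hosts {
		fmt.Println(h.Primary, h.Fallback, h.HasFallback())
	}
}
```

Using the zero netip.Addr for "no fallback" keeps the structure comparable and avoids a pointer, but a real design might prefer an explicit optional type.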
DNS Resolver implementation
There is no ideal way to implement this with DNS, so a heuristic must be used. I suggest the following:
If there are only IPv4 or only IPv6 addresses in the results, use each address as a unique host.
If there are both IPv6 and IPv4 addresses in the results, use each IPv6 address as a unique host and set the fallback address to an arbitrary IPv4 address (probably just pairing them in the order they are returned by the resolver). If there are fewer IPv4 addresses than IPv6 addresses, cycle through the IPv4 addresses in order. If the resolver happens to return addresses in a logical order, this may result in host addresses being paired correctly. If there is a single anycast IPv4 address but multiple IPv6 addresses, the anycast IPv4 address will be treated as the fallback for every IPv6 host.
Store the results between resolver runs. Before running the above algorithm, check each host from the last set of results; if both the primary and fallback address are still present, remove them from the list and add the old result back to the new result. (Since the fallback may appear multiple times, it should still be considered when checking the remaining previous results.) This ensures that resolution-to-resolution, mappings will be stable. If the resolver returns a different set of hosts each time, the default DNS resolver will already cause problems, so long-term stability in these mappings is not a concern.
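The first two steps of the heuristic above can be sketched as follows. This is only an illustration under assumptions: Host, pairHosts, and mustAddrs are hypothetical names, the result-caching step is left out, and surplus IPv4 addresses are simply dropped when there are more IPv4 than IPv6 results, which the text above implies but does not state outright.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Host is a hypothetical pairing of a primary address with an optional
// fallback; the zero Addr means "no fallback".
type Host struct {
	Primary  netip.Addr
	Fallback netip.Addr
}

// pairHosts applies the pairing heuristic to a single set of DNS results.
func pairHosts(addrs []netip.Addr) []Host {
	var v6, v4 []netip.Addr
	for _, a := range addrs {
		a = a.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
		if a.Is4() {
			v4 = append(v4, a)
		} else {
			v6 = append(v6, a)
		}
	}

	// Only one address family present: every address is its own host.
	if len(v6) == 0 || len(v4) == 0 {
		single := v6
		if len(v6) == 0 {
			single = v4
		}
		hosts := make([]Host, 0, len(single))
		for _, a := range single {
			hosts = append(hosts, Host{Primary: a})
		}
		return hosts
	}

	// Both families present: each IPv6 address becomes a host, paired with
	// an IPv4 fallback in resolver order, cycling when IPv4 runs out.
	hosts := make([]Host, 0, len(v6))
	for i, a := range v6 {
		hosts = append(hosts, Host{Primary: a, Fallback: v4[i%len(v4)]})
	}
	return hosts
}

// mustAddrs parses address literals and panics on invalid input (demo helper).
func mustAddrs(ss ...string) []netip.Addr {
	out := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParseAddr(s))
	}
	return out
}

func main() {
	results := mustAddrs("2001:db8::1", "2001:db8::2", "2001:db8::3", "192.0.2.1", "192.0.2.2")
	for _, h := range pairHosts(results) {
		fmt.Printf("primary=%s fallback=%s\n", h.Primary, h.Fallback)
	}
}
```

With three IPv6 and two IPv4 results, the third IPv6 address wraps around to the first IPv4 address. With a single anycast IPv4 address, every host would share that one fallback.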
The net effect is that the fallback IPv4 address for a given IPv6 address is effectively arbitrary. If any of the hosts are unhealthy, balancing may become uneven once IPv4 fallback addresses are in use, because multiple entries in the pool can end up balancing to the same host. Unfortunately, there's really no way to prevent this with DNS alone, but I believe this is a better overall outcome: today, not explicitly specifying IPv4 or IPv6 results in other suboptimal behavior, potentially making adoption of httplb difficult in systems that need to tolerate a large variety of production environments.
Transport implementation
Right now, the transport implementation handles targeting by overriding the host in the URL, which requires some overhead:
The URL almost always needs to be cloned, which means the http.Request needs to be at least shallow copied.
The TLS configuration needs to be patched.
The Host header needs to be fixed.
In order to allow for this fallback behavior, we need to move this override to a lower level. This gives us the opportunity to lower the overhead of the transport implementation in the vast majority of cases, since there are far fewer cases where the request or URL will need to be cloned, and the TLS configuration will never need to be patched.
Implement RFC 6555 fallback manually
It is possible for users to provide a custom Dial function. We could wrap this again into a custom dial function that performs the RFC 6555 Dial race using the underlying implementation. The downside here is that we need to implement this race ourselves, though it is not insurmountable.
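To give a sense of what implementing the race by hand involves, here is a minimal sketch of a dial race over an arbitrary dial function. It is an assumption-laden illustration rather than a full RFC 6555 implementation: raceDial, DialFunc, dialResult, namedConn, and demo are hypothetical names, only one primary and one fallback address are handled, and the demo uses a fake dialer so that it runs without touching the network.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// DialFunc has the same signature as (*net.Dialer).DialContext.
type DialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

type dialResult struct {
	conn net.Conn
	err  error
}

// raceDial dials primary immediately and starts dialing fallback once delay
// has elapsed or the primary attempt has failed. The first success wins.
func raceDial(ctx context.Context, dial DialFunc, network, primary, fallback string, delay time.Duration) (net.Conn, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // aborts any attempt still in flight when we return

	results := make(chan dialResult, 2) // buffered: attempts never block on send
	attempt := func(addr string) {
		c, err := dial(ctx, network, addr)
		results <- dialResult{c, err}
	}
	go attempt(primary)
	pending, fallbackStarted := 1, fallback == "" // no fallback: never start one
	startFallback := func() {
		if !fallbackStarted {
			fallbackStarted = true
			pending++
			go attempt(fallback)
		}
	}

	timer := time.NewTimer(delay)
	defer timer.Stop()
	var errs []error
	for pending > 0 {
		select {
		case <-timer.C:
			startFallback()
		case r := <-results:
			pending--
			if r.err != nil {
				errs = append(errs, r.err)
				startFallback() // no point waiting out the rest of the delay
				continue
			}
			if pending > 0 {
				go func() { // close the loser if it connects after all
					if l := <-results; l.conn != nil {
						l.conn.Close()
					}
				}()
			}
			return r.conn, nil
		}
	}
	return nil, errors.Join(errs...)
}

// namedConn remembers which address it was dialed for (demo only).
type namedConn struct {
	net.Conn
	addr string
}

// demo races a blackholed IPv6 address against a working IPv4 address using
// a fake dialer, and returns the address that won.
func demo() (string, error) {
	fake := func(ctx context.Context, network, addr string) (net.Conn, error) {
		if addr == "[2001:db8::1]:443" {
			<-ctx.Done() // simulate an IPv6 route that silently drops packets
			return nil, ctx.Err()
		}
		c, _ := net.Pipe()
		return namedConn{c, addr}, nil
	}
	conn, err := raceDial(context.Background(), fake, "tcp", "[2001:db8::1]:443", "192.0.2.1:443", 10*time.Millisecond)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.(namedConn).addr, nil
}

func main() {
	addr, err := demo()
	fmt.Println(addr, err)
}
```

In real use, the dial argument would be the user's custom Dial function or the DialContext method of a default net.Dialer. The nuances left out here, such as multiple addresses per family and per-attempt deadlines, are where most of the remaining work would be.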
Implement a custom *net.Resolver
It is challenging but possible to override the behavior of *net.Resolver. This can be done by setting the PreferGo field to true and setting the Dial function to return an in-memory net.Pipe() that speaks DNS, ideally using the x/net/dns/dnsmessage package. (I recently did this in my test implementation.)
While this looks ugly, it seems like it is actually intended by the Go developers, and despite the text on PreferGo being somewhat unclear, it will in fact work on all platforms:
    if runtime.GOOS == "plan9" {
        // TODO(bradfitz): for now we only permit use of the PreferGo
        // implementation when there's a non-nil Resolver with a
        // non-nil Dialer. This is a sign that the code is trying
        // to use their DNS-speaking net.Conn (such as an in-memory
        // DNS cache) and they don't want to actually hit the network.
        // Once we add support for looking the default DNS servers
        // from plan9, though, then we can relax this.
        if r == nil || r.Dial == nil {
            return false
        }
    }
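Furthermore, while this is seemingly intended to work for the foreseeable future, there also seems to be intent to implement this more properly in the future.
By doing this, we gain the ability to tell net.Dial about the list of hosts, and it should be able to perform graceful IPv6 fallback, probably avoiding the fallback delay as necessary.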
The only problem with this approach is that it relies on being able to override the Resolver field of the net.Dialer, which precludes the ability to specify a Dial function. We would need to refactor this API so that the *net.Resolver gets passed back to the user so they can use it in their Dial function implementation. We may need to overhaul the way the override works; it should probably be a function that returns a Dial function, given a *net.Resolver.
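One possible shape for such an override is sketched below. The names DialFunc, DialFactory, and DefaultDialFactory are hypothetical, as is the 10-second timeout. The sketch shows only how the *net.Resolver could be threaded through, not how httplb would construct it.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// DialFunc has the same signature as (*net.Dialer).DialContext.
type DialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// DialFactory is a hypothetical replacement for a plain custom Dial option:
// the library hands the user its *net.Resolver and gets a dial function back.
type DialFactory func(r *net.Resolver) DialFunc

// DefaultDialFactory builds a net.Dialer that resolves names through r, so
// the dialer's own IPv6/IPv4 fallback logic sees whatever r returns.
// A nil resolver makes the dialer use net.DefaultResolver.
func DefaultDialFactory(r *net.Resolver) DialFunc {
	d := &net.Dialer{
		Timeout:  10 * time.Second, // arbitrary value chosen for the sketch
		Resolver: r,
	}
	return d.DialContext
}

func main() {
	// A user who needs custom dialing supplies their own factory and decides
	// how to use the resolver they are handed.
	var custom DialFactory = func(r *net.Resolver) DialFunc {
		d := &net.Dialer{Resolver: r, KeepAlive: 30 * time.Second}
		return d.DialContext
	}
	for _, f := range []DialFactory{DefaultDialFactory, custom} {
		dial := f(&net.Resolver{PreferGo: true})
		fmt.Printf("%T\n", dial)
	}
}
```

The design choice here is that the library, not the user, owns the resolver, while the user keeps full control over everything else about the dialer.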
The advantage of this approach is that if applied properly, it should give us fallback behavior that is very close to what Go is able to offer out-of-the-box.
Summary
This proposal asserts that it is a design defect to treat mixed IPv4+IPv6 records as their own hosts: in most cases, mixed results would result from the same hosts with both IPv4 and IPv6 endpoints, and thus contain duplicate hosts. Resolvers should return not just addresses, but hosts, which contain possibly multiple addresses.
DNS itself does not contain enough information to know which addresses belong to the same hosts, so this proposal suggests a simple algorithm that arbitrarily pairs IPv6 addresses with IPv4 addresses in the order they appear in the result set. In subsequent resolutions, the mappings will be kept consistent as long as both the IPv6 and IPv4 addresses are still present in the new results.
Since a host could have multiple addresses, the current approach to mapping to a specific target will not work. This gives us an opportunity to remove some hacks by instead implementing the override on the Dial level. There are two potential approaches:
Implement an RFC 6555-style Dial race using the Dial function or default dialer directly. This would not require changes to the API, but it would require implementing the nuances of the dial race by hand.
Refactor the custom Dial function API so that we can pass a custom *net.Resolver to be used for dialing. Allow the real hostname of the target to pass down to the Dial function. Implement custom *net.Resolver behavior so that we can return only the records appropriate for the individual host entry. This would allow the default RFC 6555 dial race implemented in the Go net package to do most of the work.