GoogleCloudPlatform / guest-configs


Investigate behavior of self-resolution and distributed services #29

Open hopkiw opened 2 years ago

hopkiw commented 2 years ago

#26 impacted some distributed software systems in an unexpected way. This issue is to investigate, so we understand the impact and can avoid it in the future.

hopkiw commented 2 years ago

Background details

For historical reasons, the dhclient exit hook we provide in GCE images writes the IP of the interface being configured into /etc/hosts. So if you have a host called 'debian' in GCP project 'hopkiw-testing' with an internal IP of 1.2.3.4, the entry would be:

```
1.2.3.4 debian.c.hopkiw-testing.internal debian
```

Whereas it is more common to add an entry associated with the loopback address, either 127.0.1.1 or 127.0.0.2, as in:

```
127.0.0.2 debian.c.hopkiw-testing.internal debian
```

Why add an entry at all? Primarily it enables you to locally determine your fully-qualified domain name. The FQDN is not a Linux concept but a DNS concept; however, adding such an entry to /etc/hosts lets you self-resolve your own FQDN without reaching out to external DNS. This matters especially for GCE, as the custom hostname feature does not update the GCE DNS entries: a VM named debian with custom hostname custom.example.com will still have an entry debian.c.hopkiw-testing.internal in the GCE DNS.
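For illustration, here is a sketch of that self-resolution path, assuming the 1.2.3.4 entry above is in /etc/hosts and 'files' is listed first in nsswitch.conf:

```python
import socket

# With "1.2.3.4 debian.c.hopkiw-testing.internal debian" in /etc/hosts,
# both calls below are answered from 'files', without any external DNS query.
print(socket.getfqdn())
# -> 'debian.c.hopkiw-testing.internal'

print(socket.gethostbyname_ex(socket.gethostname()))
# -> ('debian.c.hopkiw-testing.internal', ['debian'], ['1.2.3.4'])
```

This is the same lookup `hostname -f` and most libraries perform when a process asks for its own FQDN.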

Also worth noting that a VM with multiple network interfaces would have the IP of the last interface configured in the /etc/hosts entry, as the dhclient exit hook is called sequentially after configuring each interface. This is neither wrong nor right: Linux does not have the concept of a primary network interface, it simply has a set of interfaces, each with a set of addresses. It's a user choice which interface or address to use.

revident commented 2 years ago

The crux of the problem is choosing to set a line in /etc/hosts.

In a typical Linux system, the order of precedence for name lookup is defined in /etc/nsswitch.conf, which in practice is rarely tweaked by anyone but the distribution maintainers. Sane, normal configs put 'files' first, meaning /etc/hosts is the highest truth for hostname-to-IP resolution. It trumps all.

```
grep '^hosts:' /etc/nsswitch.conf
hosts:      files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] myhostname dns
```

Under the old behaviour of your scriptlet, you were choosing to set an IP that matched that of the primary network interface, which meant that more often than not you were resolving to something 'correct', in so much as it 'worked' more often than not. But you were still forcing a resolution for that name in the loudest place.

When you changed to the hard-coded 127.0.0.2, all lookups of the system hostname now resolve to a localhost address. As the breakage reports show, that is 'incorrect' in that it doesn't work in many more cases.
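To make that concrete, here is what any resolver-based lookup returns on such a machine (a sketch, reusing the hypothetical 'debian' host and assuming the hard-coded line is in place):

```python
import socket

# Assuming /etc/hosts now contains:
#   127.0.0.2 debian.c.hopkiw-testing.internal debian
# every normal lookup of the machine's own name gets the loopback address,
# while peers resolving the same name through GCE DNS still get 1.2.3.4.
addrs = {ai[4][0] for ai in socket.getaddrinfo("debian", None)}
print(addrs)
# -> {'127.0.0.2'}
```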

Now for a bit of history. Linux distributions like Fedora and RHEL tried, a decade ago, to be friendly and do what you're doing, appending the hostname to the 127.0.0.1 default lines. This caused the same kind of unexpected issues and 'why is my machine lying to me about this hostname-to-IP resolution' situations.

Unless you have a strong justification for it, I'd advise against setting any line at all.

revident commented 2 years ago

To provide an example from our environment at Achievers: we use Percona Tungsten to do MySQL replication and cluster management of our MySQL databases. The services are configured to listen only on the network IP of the machine; they do not listen on localhost addresses. The configs are written against hostnames, as those are predictable and avoid a bootstrapping paradox, since we tear down and rebuild nodes at will. When the cluster manager comes online for a given host, it checks that it can talk to its peers and to its own services. The hard-coded 127.0.0.2 means that this check now fails, because there is no service listening at that address.
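A minimal sketch of the kind of self-check that breaks (not Tungsten's actual code; the port is hypothetical): the manager resolves its own hostname and tries to connect to its service, but the service is only bound to the interface address, so nothing is listening at 127.0.0.2.

```python
import socket

PORT = 3306  # hypothetical service port standing in for the services being checked

# The service listens only on the interface IP (e.g. 10.x.x.x), never on loopback.
# With the 127.0.0.2 entry in /etc/hosts, our own name no longer resolves to that
# interface IP, so the connection attempt below is refused.
self_addr = socket.gethostbyname(socket.getfqdn())

try:
    with socket.create_connection((self_addr, PORT), timeout=2):
        print("self-check ok:", self_addr)
except OSError as exc:
    print("self-check failed:", self_addr, exc)
```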

So when our hosts automatically rebooted from a kernel upgrade, they failed to rejoin their cluster, hanging the patching job and requiring manual intervention and re-writing of the /etc/hosts file.

donjaime commented 2 years ago

#26 meant that resolution of your VM hostname, when done on the same machine, would return the hardcoded loopback IP (127.0.0.2) and not your Google internal VM IP.

To list a few services we observed break with #26:

Zookeeper: Multi-node Zookeeper clusters are defined by enumerating all the nodes in the cluster in a Zookeeper config, including what port to talk to. This includes "yourself", and the canonical way to do this is to specify hostnames. You usually use the same configuration file on all nodes in the cluster; you don't typically special-case "yourself" by specifying 0.0.0.0:<port>. As such, the standard way to set up Zookeeper clusters meant that with the change in #26, each Zookeeper would bind its port to the loopback IP and was therefore unreachable by other hosts in the cluster.
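For illustration, a typical static cluster definition looks roughly like this (hostnames and project are hypothetical); on each node one of these entries is "yourself", and the ports for that entry get bound to whatever the name resolves to locally:

```
# zoo.cfg, deployed identically to every node in the ensemble
clientPort=2181
server.1=zk1.c.my-project.internal:2888:3888
server.2=zk2.c.my-project.internal:2888:3888
server.3=zk3.c.my-project.internal:2888:3888
```

With the 127.0.0.2 entry in place, the node's own name resolves to loopback, so its quorum and election ports end up unreachable from the other two servers.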

Solr Cloud: Each Solr node in a Solr Cloud cluster connects to Zookeeper on boot and publishes its own IP address and port to Zookeeper so other hosts can discover it (service discovery). By default, with #26 applied, each Solr host would end up publishing the loopback IP and not its actual Google internal VM IP, confusing any other Solr machine in the cluster trying to discover the cluster topology.
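The underlying pattern is the common "advertise whatever your own hostname resolves to" idiom, sketched below (not Solr's actual code; the names and port are illustrative):

```python
import socket

def advertised_address(port: int) -> str:
    """Address a node would register in Zookeeper for other nodes to discover."""
    host = socket.getfqdn()              # e.g. solr-1.c.my-project.internal
    ip = socket.gethostbyname(host)      # interface IP before #26, 127.0.0.2 after
    return f"{ip}:{port}"

# Every other node then tries to reach this published address; a loopback
# address is only ever reachable from the machine that published it.
print(advertised_address(8983))
```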


And based on other commenters on #26, I believe HDFS also suffers the same fate as Solr Cloud, where the NameNode is used for DataNode lookups and block mappings: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes

I strongly suspect systems like Elasticsearch would break by default too.

chrisboulton commented 2 years ago

I can confirm Elasticsearch is/was impacted by this change - that's specifically where we noticed the issue in our environment. It's a similar deal to Solr Cloud here, from what we can tell. Essentially, nodes refused to join an existing cluster, failing with an error message like the following:

```
[2021-09-16T13:24:54,079][INFO ][o.e.d.z.ZenDiscovery     ]
[es-data-node.c.gcp-project.internal] failed to send join request to master [{es-master.c.gcp-project.internal}
{WS53HhWuRUGs5EajVOfz0A}{2eU2nMSoTVirjAl1qZhAPg}{es-master.c.gcp-project.internal}{10.x.x.x:9300}{es_role=master, zone=us-central1-b}],
reason [RemoteTransportException[[es-master.c.gcp-project.internal][10.x.x.x:9300][internal:discovery/zen/join]];
nested: ConnectTransportException[[es-data-node.c.gcp-project.internal][127.0.0.2:9300] handshake failed.
unexpected remote node {es-master.c.gcp-project.internal}{WS53HhWuRUGs5EajVOfz0A}{2eU2nMSoTVirjAl1qZhAPg}{es-master.c.gcp-project.internal}{10.x.x.x:9300}{es_role=master, zone=us-central1-b}]; ]
```