canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

bionic: static maas missing search domain in systemd-resolve configuration #3186

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1771885

Launchpad details
affected_projects = ['juju', 'juju/2.3', 'maas']
assignee = None
assignee_name = None
date_closed = 2018-06-06T00:26:51.966062+00:00
date_created = 2018-05-17T20:08:00.161839+00:00
date_fix_committed = None
date_fix_released = None
id = 1771885
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1771885
milestone = None
owner = thedac
owner_name = David Ames
private = False
status = wont_fix
submitter = admcleod
submitter_name = Andrew McLeod
tags = ['bionic', 'network']
duplicates = []

Launchpad user Andrew McLeod(admcleod) wrote on 2018-05-17T20:08:00.161839+00:00

juju: 2.4-beta2
MAAS: 2.3.0

Testing deployment of LXD containers on bionic (specifically for an OpenStack deployment) led to this problem:

https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1765405

Summary:

Previously, the DNS config in the LXD containers was the same as that of the host machines.

Now the DNS config is in systemd: the DNS server is set correctly, but the search domain is missing, so short hostnames won't resolve.

Working resolv.conf on xenial lxd container:

    nameserver 10.245.168.6
    search maas

Non-working "systemd-resolve --status":

    ...
    Link 21 (eth0)
          Current Scopes: DNS
           LLMNR setting: yes
    MulticastDNS setting: no
          DNSSEC setting: no
        DNSSEC supported: no
             DNS Servers: 10.245.168.6

Working (now able to resolve hostnames after modifying netplan and adding the search domain):

    Link 21 (eth0)
          Current Scopes: DNS
           LLMNR setting: yes
    MulticastDNS setting: no
          DNSSEC setting: no
        DNSSEC supported: no
             DNS Servers: 10.245.168.6
              DNS Domain: maas

    ubuntu@juju-6406ff-2-lxd-2:/etc$ host node-name
    node-name.maas has address 10.245.168.0

ubuntu-server-builder commented 1 year ago

Launchpad user Frode Nordahl(fnordahl) wrote on 2018-05-17T20:23:43.175358+00:00

An interesting twist on this is that juju seems to do the right thing when the host system is xenial and the container is bionic (see below).

It may be that this is a generic issue at some level on Ubuntu after the move to systemd-resolved. Other interesting bugs I have found on the subject:

https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1684854
https://github.com/systemd/systemd/issues/6572

Excerpt of a test showing this working for a juju-deployed bionic container on a xenial host system (all hosts are in the .maas domain, and pinging by just the hostname part works; repeating this test with bionic as the host system will fail):

    $ juju status
    Model    Controller  Cloud/Region  Version  SLA
    default  maas        maas          2.4-rc1  unsupported

    App  Version  Status  Scale  Charm  Store  Rev  OS  Notes

    Unit  Workload  Agent  Machine  Public address  Ports  Message

    Machine  State    DNS             Inst id              Series  AZ       Message
    0        started  172.16.122.251  qkm377               xenial  default  Deployed
    0/lxd/0  started  172.16.122.253  juju-4d3dd7-0-lxd-0  xenial  default  Container started
    0/lxd/1  started  172.16.122.252  juju-4d3dd7-0-lxd-1  bionic  default  Container started

Controller Timestamp 15 May 2018 15:23:46+02:00

    $ juju ssh 0 'lsb_release -a &&ping -c 1 awake-yak'
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.4 LTS
    Release:        16.04
    Codename:       xenial
    PING awake-yak.maas (172.16.122.250) 56(84) bytes of data.
    64 bytes from awake-yak.maas (172.16.122.250): icmp_seq=1 ttl=64 time=0.319 ms

    --- awake-yak.maas ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.319/0.319/0.319/0.000 ms
    Connection to 172.16.122.251 closed.

    $ juju ssh 0/lxd/0 'lsb_release -a &&ping -c 1 awake-yak'
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.4 LTS
    Release:        16.04
    Codename:       xenial
    PING awake-yak.maas (172.16.122.250) 56(84) bytes of data.
    64 bytes from awake-yak.maas (172.16.122.250): icmp_seq=1 ttl=64 time=0.205 ms

    --- awake-yak.maas ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.205/0.205/0.205/0.000 ms
    Connection to 172.16.122.253 closed.

    $ juju ssh 0/lxd/1 'lsb_release -a &&ping -c 1 awake-yak'
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 18.04 LTS
    Release:        18.04
    Codename:       bionic
    PING awake-yak.maas (172.16.122.250) 56(84) bytes of data.
    64 bytes from awake-yak.maas (172.16.122.250): icmp_seq=1 ttl=64 time=0.116 ms

    --- awake-yak.maas ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.116/0.116/0.116/0.000 ms
    Connection to 172.16.122.252 closed.

ubuntu-server-builder commented 1 year ago

Launchpad user John A Meinel(jameinel) wrote on 2018-05-18T04:17:07.591318+00:00

I know Bionic changed how we find the nameserver. Is the issue that we aren't looking in the new place for the search domain, or is it that the information isn't there?

ubuntu-server-builder commented 1 year ago

Launchpad user Eric Claude Jones(ecjones) wrote on 2018-05-19T20:37:10.290246+00:00

"When MAAS deploys a node it will configure its resolver accordingly; in the case where the above settings are made and you are deploying Linux:

The servers' resolv.conf file will have the IP address for the MAAS Region controller
The BIND installation on the MAAS Region controller will have a forwarders entry set up for the addresses you provide.

The effect is that queries on the node will be sent to MAAS, which will resolve directly for queries in domains which it manages (by default, *.maas), and forward requests for everything else to the forwarder." - (https://askubuntu.com/questions/820925/how-do-i-set-a-dns-server-in-maas-that-will-be-passed-on-to-the-nodes)

When a container is created, our MAAS provider code does not populate the container's DNS search domain by asking MAAS directly for that information. AFAIK, the provisioner code has a fallback heuristic that says "If I didn't get my DNS information from the provider, then find it in the host's configuration."

On Xenial hosts, scraping the host for this information worked fine, since MAAS populates resolv.conf with the needed information. On Bionic hosts, something else is happening.

In short, maybe we should be getting this information directly from MAAS instead of getting it from what MAAS told the host machine.

In our case, the netplan.yaml gets populated with information from a MAAS device (among other things). It might be sufficient to populate the search domain with the domain of the MAAS device's FQDN, which by default should be "maas". Another solution could be to ask MAAS more directly (i.e. GET /api/2.0/domains/).
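The first option, deriving the search domain from the device's FQDN, amounts to taking everything after the first label. A minimal Go sketch, assuming the FQDN is already known; searchDomainFromFQDN is a hypothetical helper, not part of juju or gomaasapi.

```go
package main

import (
	"fmt"
	"strings"
)

// searchDomainFromFQDN returns everything after the first label of fqdn,
// e.g. "node-name.maas" -> "maas". A trailing root dot is ignored, and a
// bare hostname with no dots yields "".
func searchDomainFromFQDN(fqdn string) string {
	fqdn = strings.TrimSuffix(fqdn, ".")
	i := strings.Index(fqdn, ".")
	if i < 0 {
		return ""
	}
	return fqdn[i+1:]
}

func main() {
	fmt.Println(searchDomainFromFQDN("node-name.maas"))
	fmt.Println(searchDomainFromFQDN("node-name") == "")
}
```

This only works when the device's FQDN actually lives in the domain clients should search; asking MAAS for its domains avoids that assumption.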

ubuntu-server-builder commented 1 year ago

Launchpad user Eric Claude Jones(ecjones) wrote on 2018-05-20T16:19:37.631563+00:00

https://github.com/juju/gomaasapi/pull/74/files

ubuntu-server-builder commented 1 year ago

Launchpad user Eric Claude Jones(ecjones) wrote on 2018-05-20T20:40:29.826207+00:00

https://github.com/juju/juju/pull/8731

ubuntu-server-builder commented 1 year ago

Launchpad user Richard Harding(rharding) wrote on 2018-05-22T15:30:01.192637+00:00

https://github.com/juju/juju/pull/8741

ubuntu-server-builder commented 1 year ago

Launchpad user Alex Kavanagh(ajkavanagh) wrote on 2018-05-23T15:15:09.687410+00:00

Sadly, I've still got this bug; I just tested with 2.4-beta3+develop-e33ec12 (which may not include this yet?).

The generated resolv.conf in the unit/container still has no search domain:

    # This file is managed by man:systemd-resolved(8). Do not edit.
    #
    # This is a dynamic resolv.conf file for connecting local clients to the
    # internal DNS stub resolver of systemd-resolved. This file lists all
    # configured search domains.
    #
    # Run "systemd-resolve --status" to see details about the uplink DNS servers
    # currently in use.
    #
    # Third party programs must not access this file directly, but only through the
    # symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
    # replace this symlink by a static file or a different symlink.
    #
    # See man:systemd-resolved.service(8) for details about the supported modes of
    # operation for /etc/resolv.conf.

    nameserver 127.0.0.53

ubuntu-server-builder commented 1 year ago

Launchpad user John A Meinel(jameinel) wrote on 2018-05-23T19:06:41.071223+00:00

Can you include the contents of /etc/netplan/*.yaml?

A possibility is that we're still running into some sort of missing information, and we did try to set the value but we're overriding it:

    if !haveNameservers || !haveSearchDomains {
            logger.Warningf("incomplete DNS config found, discovering host's DNS config")
            dnsConfig, err := findDNSServerConfig()
            if err != nil {
                    return nil, errors.Trace(err)
            }

            // Since the result is sorted, the first entry is the primary NIC. Also,
            // results always contains at least one element.
            results[0].DNSServers = dnsConfig.Nameservers
            results[0].DNSSearchDomains = dnsConfig.SearchDomains
            logger.Debugf(
                    "setting DNS servers %+v and domains %+v on container interface %q",
                    results[0].DNSServers, results[0].DNSSearchDomains, results[0].InterfaceName,
            )
    }

If you try:

    juju debug-log --replay --include-module juju.provisioner
    juju debug-log -m controller --replay --include-module juju.provisioner

Do you see the line about 'incomplete DNS config found' ?

Can you also confirm what is in the host machine's /etc/resolv.conf? I would expect nameserver 127.0.0.53, but I'm not sure whether systemd-resolved includes the search path there, or only in its hidden internal resolver state.

ubuntu-server-builder commented 1 year ago

Launchpad user John A Meinel(jameinel) wrote on 2018-05-24T13:33:40.670085+00:00

@Alex, can you check whether "host foo" still doesn't resolve? It may be that we don't need the search path in /etc/resolv.conf because of the new systemd-resolved changes. Our testing showed that 'host nuc-1' from inside the container did indeed resolve.

ubuntu-server-builder commented 1 year ago

Launchpad user Alex Kavanagh(ajkavanagh) wrote on 2018-05-24T14:50:45.851607+00:00

Sure, I'll run up the system and leave it running; it'll be on ruxton dells; I'll report back here when it is up.

ubuntu-server-builder commented 1 year ago

Launchpad user Alex Kavanagh(ajkavanagh) wrote on 2018-05-24T16:34:10.711845+00:00

So here is the information that hopefully will be useful.

  1. /etc/netplan/99-juju.yaml on the container:

      network:
        version: 2
        ethernets:
          eth0:
            match:
              macaddress: 00:16:3e:d0:b4:4e
            addresses:
            - 10.245.168.48/21
            gateway4: 10.245.168.1
            nameservers:
              addresses: [10.245.168.6]

  2. host machine's /etc/resolv.conf:

      # This file is managed by man:systemd-resolved(8). Do not edit.
      # ...
      # See man:systemd-resolved.service(8) for details about the supported modes of
      # operation for /etc/resolv.conf.

      nameserver 127.0.0.53

  3. juju debug-log --replay --include-module juju.provisioner

      machine-1: 15:34:02 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-2: 15:34:30 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-2: 15:34:39 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-2: 15:34:44 WARNING juju.provisioner failed to start machine 2/lxd/2 (failed to ensure LXD image: Failed remote image download: UNIQUE constraint failed: images_aliases.name), retrying in 10s (10 more attempts)
      machine-2: 15:34:57 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-0: 15:35:08 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-0: 15:35:16 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-0: 15:35:22 WARNING juju.provisioner failed to start machine 0/lxd/2 (failed to ensure LXD image: Failed remote image download: UNIQUE constraint failed: images_aliases.name), retrying in 10s (10 more attempts)
      machine-0: 15:35:37 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-3: 15:35:52 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-0: 15:36:41 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-1: 15:39:26 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-3: 15:40:57 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-3: 15:41:22 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config
      machine-2: 15:42:53 WARNING juju.provisioner incomplete DNS config found, discovering host's DNS config

So, yes, lots of "incomplete DNS config found" lines (I think one per container).

  4. juju debug-log -m controller --replay --include-module juju.provisioner

      machine-0: 15:12:49 INFO juju.provisioner provisioner-harvest-mode is set to destroyed; unknown instances not stopped []

  5. juju --version: 2.4-beta3-xenial-amd64 (actually from snap: 2.4-beta3+develop-c17354d)

  6. The container and host do resolve; it's just that there is no search domain, so nova-cloud-controller fails. See https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1765405

Adding a search domain (maas) to the netplan config and applying it allows nova-cloud-controller to work.

--

I've left the system up; if there are any more details you would like then please let me know.

ubuntu-server-builder commented 1 year ago

Launchpad user John A Meinel(jameinel) wrote on 2018-05-25T07:38:18+00:00

It sounds like we are failing to find the search domain and add it to the configuration.

John =:->


ubuntu-server-builder commented 1 year ago

Launchpad user Eric Claude Jones(ecjones) wrote on 2018-05-25T09:29:41.033579+00:00

In case it is helpful: the patches above only apply to newly deployed containers. They do not repair or correct existing containers.

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-05-30T22:25:58.800239+00:00

After discussing this with roaksoax, it seems likely this is a cloud-init / netplan problem. I have added cloud-init and maas just to be thorough.

When bionic is deployed by MAAS 2.3.0 with a static network config, the DNS search domain is missing from the netplan configuration and/or systemd-resolve.

I am attaching three sets of data, bionic-maas, bionic-dhcp and xenial-maas, to show the differences.

cloud-init reports in cloud-init.log that it has the information. See 'search' below:

    config= {
        'config': [
            {'id': 'eno1', 'mac_address': 'd4:be:d9:a8:44:ff', 'mtu': 1500, 'name': 'eno1',
             'subnets': [{'address': '10.245.168.26/21',
                          'dns_nameservers': ['10.245.168.6'],
                          'gateway': '10.245.168.1',
                          'type': 'static'}],
             'type': 'physical'},
            {'id': 'eno2', 'mac_address': 'd4:be:d9:a8:45:01', 'mtu': 1500, 'name': 'eno2',
             'subnets': [{'type': 'manual'}], 'type': 'physical'},
            {'id': 'eno3', 'mac_address': 'd4:be:d9:a8:45:03', 'mtu': 1500, 'name': 'eno3',
             'subnets': [{'type': 'manual'}], 'type': 'physical'},
            {'id': 'eno4', 'mac_address': 'd4:be:d9:a8:45:05', 'mtu': 1500, 'name': 'eno4',
             'subnets': [{'type': 'manual'}], 'type': 'physical'},
            {'address': ['10.245.168.6'], 'search': ['maas'], 'type': 'nameserver'}],
        'version': 1}

But the /etc/netplan/50-cloud-init.yaml configuration is missing this information leading to resolution failures.

By contrast, on xenial, /etc/network/interfaces.d/50-cloud-init.conf has the correct information, and the logging shows the same input from MAAS.

See also the bionic DHCP example, which gets the search domain information from DHCP.

Launchpad attachments: bionic-maas.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-05-30T22:26:33.411505+00:00

Xenial-maas info Launchpad attachments: xenial-maas.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-05-30T22:26:53.788292+00:00

bionic DHCP Launchpad attachments: bionic-dhcp.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Andres Rodriguez(andreserl) wrote on 2018-05-30T22:50:38.704257+00:00

Hi David,

It seems that the issue would be a duplicate of this [1], although it may have been fixed as part of the 18.2 release, as per [2].

David, can you please confirm your images are up to date and using the latest version of cloud-init. If they are, please re-open the bug report below.

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-05-31T14:53:44.872800+00:00

@Andres,

In the cloud-init.logs I provided, it is version 18.2. I also refreshed the images just to be sure. Is there a way I can prove the images are up to date? I'm willing to do so.

Unlike the description in [1] we are not getting the search domain in either /etc/resolv.conf or in systemd-resolve --status. Please see the attached logs. Could this be a regression?

[1] https://bugs.launchpad.net/cloud-init/+bug/1750884

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-05-31T14:55:44.858342+00:00

Just to be 100% clear: we do get the nameserver setting (see the attached logs), and resolution of FQDNs works. We do not get the search domain, and therefore hostname-only resolution does not work.

ubuntu-server-builder commented 1 year ago

Launchpad user Andres Rodriguez(andreserl) wrote on 2018-05-31T17:06:18.445623+00:00

@David, could you please attach the output of 'maas machine get-curtin-config '

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-05-31T17:19:24.939219+00:00

@Andres

$ maas ruxton machine get-curtin-config acq33q

    apt:
      preserve_sources_list: false
      primary:

ubuntu-server-builder commented 1 year ago

Launchpad user David Ames(thedac) wrote on 2018-06-05T23:28:11.634763+00:00

Removing the DNS settings from the subnet in MAAS resolves the problem. Bionic then receives the correct search domain.

I could make the argument that having the DNS setting on the subnet should either allow you to set the search domain, or the setting should not exist. But from our point of view the bug is resolved.

Thanks for everyone's work on this.

ubuntu-server-builder commented 1 year ago

Launchpad user David Britton(dpb) wrote on 2018-06-06T00:29:07.502106+00:00

Given the workaround available for MAAS and cloud-init, this is working as expected. Thanks for the debugging, everyone.

ubuntu-server-builder commented 1 year ago

Launchpad user Neiloy Mukerjee(neiloy) wrote on 2018-06-13T20:31:19.449523+00:00

I can confirm both that this bug exists and that the referenced workaround deals with the issue.

Context: nova-cloud-controller deployment was producing hook failed: "cloud-compute-relation-changed" for nova-compute:cloud-compute

On the unit machine, /etc/netplan/99-juju.yaml started as below:

    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: 00:16:3e:9c:a1:5c
          addresses:

Adding a search domain under nameservers, as below:

    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: 00:16:3e:9c:a1:5c
          addresses:

and then running a netplan apply allowed the deployment to continue as expected.

ubuntu-server-builder commented 1 year ago

Launchpad user Andres Rodriguez(andreserl) wrote on 2018-07-12T20:20:09.874537+00:00

I'm re-opening this task for MAAS, as a user has been able to reproduce this issue in a different context. While there is a workaround in comment #22, the problem is that even when MAAS sends the same network configuration for xenial and bionic deployments, the resulting configuration differs due to how cloud-init handles netplan configuration.

More specifically, in Xenial, when MAAS sends DNS config for both "global" and per-interface/subnet configuration, the resulting config is that the machine will have an aggregation of the configuration for DNS.

However, when deploying Bionic and MAAS sends the exact same configuration, cloud-init interprets it differently: only the DNS configuration of the interface is taken into consideration, while the global configuration is ignored.

As a result, this only becomes an issue when the user overrides the DNS on a specific subnet, which causes the search domain from the global config to be ignored.

Therefore, we will make an improvement in MAAS to ensure that the search domain is always included regardless.

ubuntu-server-builder commented 1 year ago

Launchpad user Mike Pontillo(mpontillo) wrote on 2018-07-12T22:04:03.096731+00:00

I've proposed a change in MAAS that will replicate the DNS search path configuration on a per-interface basis in the v1 network preseed YAML passed to cloud-init (if there is a DNS server on the interface; i.e. defined on the subnet in MAAS). Hopefully that will smooth things over for those who encounter this in the future.