cloudfoundry / bosh-dns-release

BOSH DNS release
Apache License 2.0

bosh-dns noble support #99

Open jpalermo opened 2 months ago

jpalermo commented 2 months ago

Noble switching from resolvconf to systemd-resolved poses a problem for bosh-dns.

Currently bosh-dns injects itself at the top of /etc/resolv.conf via the resolvconf tooling. Bosh-dns then reads the other entries in /etc/resolv.conf and stores those as upstream recursors. Since bosh-dns is at the top of /etc/resolv.conf, it will always be queried first; it can then pass the query on to the recursors, and it has quite a few features to control this behavior.
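
A minimal sketch of the recursor discovery described above, for illustration only. It is not taken from the bosh-dns code base: the function name is hypothetical, and the 169.254.0.2 address is the one mentioned later in this thread.

```python
# Illustrative only: collect upstream recursors from resolv.conf text,
# skipping the entry that points back at bosh-dns itself.
BOSH_DNS_ADDRESS = "169.254.0.2"  # assumption: the bosh-dns listen address


def upstream_recursors(resolv_conf_text, own_address=BOSH_DNS_ADDRESS):
    """Return nameserver addresses in file order, without our own address."""
    recursors = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only "nameserver <address>" lines matter; comments, "search" and
        # "options" lines are ignored.
        if len(fields) >= 2 and fields[0] == "nameserver":
            address = fields[1]
            if address != own_address and address not in recursors:
                recursors.append(address)
    return recursors
```

With a resolv.conf that lists bosh-dns first and two other nameservers after it, this returns only the two other addresses, which is the set bosh-dns would forward to.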

With systemd-resolved, the system resolver is doing much the same as bosh-dns was doing. This poses a couple of problems.

How do we inject bosh-dns into the configuration? We have several options here.

What do we do with the current recursor features of bosh-dns? These options are limited by the above choice.

klakin-pivotal commented 2 months ago

You may already have investigated this, and/or have considered it and disregarded it as being bad. Apologies for the noise if so.

Given that the systemd-resolved APIs don't really do what we want, might it be worth trying to set [Resolve] ... DNSStubListener=no (and maybe also LLMNR=no and MulticastDNS=no) in resolved.conf https://www.freedesktop.org/software/systemd/man/latest/resolved.conf.html#Options, and using the resolvconf-compatibility mode of resolvectl to manage resolv.conf https://www.freedesktop.org/software/systemd/man/latest/resolvectl.html#Compatibility%20with%0A%20%20%20%20resolvconf(8)?

jpalermo commented 2 months ago

Does that avoid the problems in the first option above?

...but poses a problem because systemd-resolved has a system bus API for resolving queries that bypasses /etc/resolv.conf, so as more tools switch to that, bosh-dns is left out of the loop

As more things start ignoring resolv.conf and using the systemd-resolved system bus APIs, continuing to configure bosh-dns only through resolv.conf becomes a bigger problem.

klakin-pivotal commented 2 months ago

Does that avoid the problems in the first option above?

Nope. That first option actually totally precludes the first half of the thing I suggested. It's amazing that there's a documented way to shut down everything but the dbus query interface. Sorry for the noise.

jpalermo commented 2 months ago

Write the bosh-dns IP to a new file in /etc/systemd/resolved.conf.d/ where the bosh-agent writes the DNS servers it was given. This is the simplest option, but it has the downside that we have no control over how systemd-resolved uses the other resolvers.

I believe that this does not work. Yes, it's possible to use this to add a new "global" DNS server to systemd-resolved, but my understanding of what systemd-resolved would do with it was incorrect.

I was able to change bosh-dns to disable all recursing and to add a reference to itself in /etc/systemd/resolved.conf.d/ rather than using resolvconf, but once it was added, systemd-resolved did not do what was desired. It would sometimes resolve a query for a bosh-dns address, but it was just as likely to return an NXDOMAIN response.

The documentation for systemd-resolved mentions that it calls all servers in parallel looking for a response, and since NXDOMAIN is a valid response, if that comes back, that is the returned response.

My next attempt was to modify that configuration file by adding a route-only domains section, marking the bosh-dns server as valid only for those particular domains, so that systemd-resolved would send queries matching those domains to this server.

This also does not work. It seems that systemd-resolved focuses more on interfaces than on DNS servers. Since I was adding the Domains section to a global DNS configuration, it was simply ignored. To use domain-specific DNS resolution, you must configure the network interface with the domain, not the global configuration.
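
For readers who have not worked with these drop-ins, here is a rough sketch of what the two attempted files could look like. The exact files used in the experiments are not shown in this thread, so the helper name and the "bosh" example domain are assumptions. DNS= and Domains= are documented [Resolve] options, and a leading "~" marks a route-only domain.

```python
# Illustrative only: render a systemd-resolved drop-in for the global scope.
# Neither variant gave the desired behaviour according to the findings above.
def render_resolved_dropin(dns_server, route_only_domains=()):
    """Return drop-in text with a global DNS server and optional routing domains."""
    lines = ["[Resolve]", "DNS={}".format(dns_server)]
    if route_only_domains:
        # "~domain" asks resolved to use the domain for routing only,
        # not as a search suffix.
        domains = " ".join("~" + domain for domain in route_only_domains)
        lines.append("Domains=" + domains)
    return "\n".join(lines) + "\n"


# First attempt: only a global server.
print(render_resolved_dropin("169.254.0.2"))
# Second attempt: the same server plus a route-only domain (example value).
print(render_resolved_dropin("169.254.0.2", ["bosh"]))
```

The output would be written to a file under /etc/systemd/resolved.conf.d/ and followed by a restart of systemd-resolved.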

The 169.254.0.2/32 address used by bosh-dns is not an interface, but a second IP on the loopback interface. It may be possible to add the Domains= section to the loopback device while associating the 169.254.0.2 DNS server with that interface and it will simply "work", but generally systemd-resolved does not treat the loopback device as a configured interface.

It seems like the dbus API is the most practical way to configure the loopback interface DNS, but I haven't been able to figure that out yet.

jpalermo commented 2 months ago

New findings.

I was wrong before about how it operates, and some of the docs seem to be wrong too.

systemd-resolved only has a single active DNS server at any one time for each interface and for the "global" state. It assumes additional servers on the same interface all behave the same, so it doesn't query them in parallel. The "current servers" for global state and each interface are all queried in parallel by systemd-resolved.
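
A toy model of the behaviour described above may help. It is a deliberate simplification: it treats the first server listed in each scope as the "current" one and ignores failover. The addresses other than 169.254.0.2 are made up for the example.

```python
# Illustrative only: one "current" server per scope (global or interface),
# and every scope's current server receives the query in parallel.
def current_servers(scopes):
    """Map each scope to its current server (simplified: the first one listed)."""
    return {scope: servers[0] for scope, servers in scopes.items() if servers}


def is_queried(scopes, server):
    """True if the server is the current server of at least one scope."""
    return server in current_servers(scopes).values()


# bosh-dns sharing the global scope with another server: it may never be asked.
shared = {"global": ["10.0.0.2", "169.254.0.2"], "eth0": ["169.254.169.254"]}
print(is_queried(shared, "169.254.0.2"))

# bosh-dns alone in a scope: it is always asked.
alone = {"global": ["169.254.0.2"], "eth0": ["10.0.0.2", "169.254.169.254"]}
print(is_queried(alone, "169.254.0.2"))
```

Under this model, bosh-dns is only reliably consulted when it is the sole server in some scope.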

Currently the bosh-agent configures a global server. For bosh-dns to work, we'd need it to be the only global server or the only server on a particular interface. Both resolvectl and the dbus API refuse to configure DNS servers on the loopback interface.

So one possible scenario would be to have bosh-dns configure itself as the single global server and have the bosh-agent, instead of using the global space, place the provided DNS configuration directly on the other interfaces (normally just eth0, I'm guessing).

jpalermo commented 2 months ago

My testing was done with GCP where we surprisingly use DHCP for network configuration. This seems to be so the interface is able to discover the GCP provided DNS servers.

However, systemd-resolved doesn't seem to have any way to configure "additional" DNS servers for an interface. We could always have the agent wait for the networking to come online and then modify the DNS servers for the interface manually, but that seems like an awkward interaction.

Something I haven't yet tested is whether, if the agent configures the interface directly rather than through the config files, that configuration will "stick" once DHCP updates the DNS servers for the interface.

If that doesn't work, ripping out systemd-resolved is looking like our best option.

Another option that would work is putting bosh-dns directly on the stemcell so the agent can always configure it as the single global resolver and the agent can then configure bosh-dns resolvers with the settings.json provided dns servers.

bosh-dns on the stemcell does seem very sane at this point, but also seems like a lot of work to get it there and also provide a way for the config to be updated later with additional configuration.

jpalermo commented 2 months ago

With some help from @cunnie, we managed to get it working.

Currently, bosh-dns creates an additional IP on the loopback device. We changed the behavior so that on Noble it instead creates a new virtual interface of type dummy and binds the 169.254.0.2 address to that interface instead of the loopback interface.
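
As a rough illustration of that change, the sketch below builds the iproute2 commands that create a dummy interface and bind the address to it. The interface name "bosh-dns" and the helper name are assumptions; the actual implementation in the PR may use different names or a netlink library rather than the ip command.

```python
# Illustrative only: commands to create a dummy interface carrying the
# bosh-dns address. They are built as lists, not executed here.
def dummy_interface_commands(name="bosh-dns", address="169.254.0.2/32"):
    """Return the ip commands, in order, as argument lists."""
    return [
        ["ip", "link", "add", name, "type", "dummy"],  # create the interface
        ["ip", "addr", "add", address, "dev", name],   # bind the DNS address
        ["ip", "link", "set", name, "up"],             # bring it up
    ]


for command in dummy_interface_commands():
    print(" ".join(command))
```

Each list can be passed to subprocess.run as root. Building the commands separately from running them keeps the sketch safe to execute on any machine.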

This allows us to then have systemd-resolved pick up the bosh-dns DNS server IP address from this interface and use it for resolving queries without worrying which DNS server is the "current" one.

We also did work to populate the "Domain" configuration. Since systemd-resolved queries DNS servers for all interfaces by default, and since bosh-dns always uses a TTL of 0, bosh-dns records never get cached by systemd-resolved. This means the external DNS servers would get copies of all the bosh-dns queries, which could put them under unexpected additional load. By populating the "Domain" configuration for the virtual interface with both the bosh-dns domains and any alias domains found on the system, we let systemd-resolved send those queries only to bosh-dns and not to any of the external DNS servers.
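
One way to express that per-interface configuration is through resolvectl, sketched below. This is an assumption about the mechanism, not a description of the PR. The interface name and the example domains are hypothetical. "resolvectl dns" and "resolvectl domain" are documented subcommands, and the "~" prefix marks a route-only domain.

```python
# Illustrative only: build resolvectl commands that attach the bosh-dns server
# and its routing domains to one interface.
def resolvectl_commands(interface, dns_server, domains):
    """Return resolvectl argument lists for the DNS server and routing domains."""
    unique = list(dict.fromkeys(domains))  # drop duplicates, keep order
    return [
        ["resolvectl", "dns", interface, dns_server],
        ["resolvectl", "domain", interface] + ["~" + domain for domain in unique],
    ]


# Example values: "bosh" plus one alias domain.
for command in resolvectl_commands("bosh-dns", "169.254.0.2", ["bosh", "alias.example"]):
    print(" ".join(command))
```

With routing domains set on the interface, queries under those domains go to the server on that interface, while other queries continue to use the remaining interfaces' servers.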


ramonskie commented 1 month ago

A PR is under review: https://github.com/cloudfoundry/bosh-dns-release/pull/100

max-soe commented 3 weeks ago

As we discussed yesterday in the WG meeting, we should start with a list of bosh-dns features that we lose with this new implementation, and also find workarounds or solutions to achieve the same with Noble. I'll start a list here; we will sync internally in the next few days to maybe find more. If you have any other features in mind, feel free to add them:

beyhan commented 3 weeks ago

@max-soe I think you forgot to add the link.

I can think of:

ramonskie commented 3 weeks ago

systemd-resolved:

Logging can be enabled with resolvectl log-level debug. As we already run some resolvectl commands in bosh-dns, we should be able to add this command there. A different option would be for bosh-dns to set the resolved config.

ramonskie commented 2 weeks ago

prometheus has the ability to pull up systemd-resolved https://github.com/prometheus-community/systemd_exporter/pull/119