canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

OVN network custom DNS support #10322

Open StyXman opened 2 years ago

StyXman commented 2 years ago

We're migrating from isolated LXD hosts running groups of instances (our product's clusters, but I'll keep calling them groups of instances to differentiate them from an LXD cluster) to a cluster of LXD hosts. We create and destroy these groups as part of our CI infrastructure, so they're short-lived, and dozens to hundreds are created and destroyed every day.

We decided to use OVN for networking because FAN imposes too many restrictions on IP addressing, and other solutions would have required configuring physical switches to dynamically create VLANs, which is another can of worms we didn't want to open.

With LXD bridges we were able to do the following:

  1. Have a single LXD bridge network for provisioning all groups of instances and providing DNS and the default gateway. This network is created once when the host is configured for serving LXD and never touched again.
  2. This meant that the host was able to reach the instances via SSH, and we run Ansible on top of that.
  3. It also meant that the instances could reach the wider internet through this network.
  4. We also create one or more LXD bridge networks per group; one of them runs another dnsmasq instance serving 'private DNS', with several arbitrary DNS names pointing to the instances, configured via the raw.dnsmasq option (a sketch follows this list). This is because our product requires that the instances have multiple network interfaces used for at least two logically different networks, backend and frontend. There is an extra interface connected to the LXD bridge from point 1.
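For illustration, a minimal sketch of such a per-group bridge; the network name, subnet and DNS records (grp1br0, 10.100.0.0/24, the .grp1.internal names) are hypothetical, not the original setup:

```sh
# Hypothetical per-group bridge; names and addresses are illustrative.
lxc network create grp1br0 \
    ipv4.address=10.100.0.1/24 \
    ipv4.nat=false \
    ipv6.address=none

# raw.dnsmasq appends extra directives to the dnsmasq instance that LXD
# runs for this bridge; host-record maps arbitrary names to instance IPs.
lxc network set grp1br0 raw.dnsmasq "$(cat <<'EOF'
host-record=frontend.grp1.internal,10.100.0.10
host-record=backend.grp1.internal,10.100.0.11
EOF
)"
```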

As you can see in https://discuss.linuxcontainers.org/t/serving-dns-over-ovn-networks-and-accessing-the-instances-from-the-hosts/13900/46, I'm trying to figure out how to do points 1-3 without hacking workarounds for issues I might be introducing myself, but the DNS part will require some extra work to get dnsmasq in there.

If I manage to get points 1-3 working natively (meaning no workarounds), the only missing piece is the DNS support. @tomponline suggested replacing this bridge with a small instance running dnsmasq, but I would rather have LXD do this for me. Maybe it's just simpler to run dnsmasq in the same network namespace as the rest of the instances, but I still don't know how OVN connects to the real world.
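For reference, a rough sketch of that suggested workaround, assuming an OVN network named ovn0; the image, instance name and record values are placeholders:

```sh
# Hypothetical stand-in for the workaround: a small container on the
# OVN network running dnsmasq with the group's custom DNS records.
lxc launch ubuntu:22.04 grp1-dns --network ovn0
lxc exec grp1-dns -- apt-get update
lxc exec grp1-dns -- apt-get install -y dnsmasq
lxc exec grp1-dns -- sh -c \
    'echo "host-record=frontend.grp1.internal,10.200.0.10" > /etc/dnsmasq.d/grp1.conf \
     && systemctl restart dnsmasq'
```

The group's instances would then need their resolver pointed at grp1-dns's address, which is exactly the kind of plumbing being asked of LXD here.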

tomponline commented 2 years ago

This is basically adding custom DNS record support, as well as the ability to specify custom DNS servers for DHCP/IPv6 RA.

tomponline commented 2 years ago
  1. Can be achieved by using a managed LXD bridge as an uplink for OVN networks. The OVN networks then provide DHCP, DNS and default gateway services for the instances connected to them.
  2. Can be achieved either by using non-NAT OVN networks with static routes on the LXD host, allowing direct network connectivity to the instances on the OVN network, or potentially by connecting the Ansible container instance to each of the OVN networks so Ansible can reach them directly (see the sketch after this list).
  3. Is already handled, because OVN by default SNATs to the OVN router's address on the uplink network.
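A minimal sketch of points 1 and 2; the network names, subnets and uplink range are assumptions, not from this thread:

```sh
# 1. A managed LXD bridge acting as the uplink for OVN networks;
#    ipv4.ovn.ranges reserves uplink addresses for OVN routers.
lxc network create lxdbr0 \
    ipv4.address=10.0.0.1/24 \
    ipv4.ovn.ranges=10.0.0.100-10.0.0.120

# An OVN network using that bridge as its uplink. The OVN network
# serves DHCP, DNS and the default gateway to attached instances.
lxc network create ovn0 --type=ovn network=lxdbr0 \
    ipv4.address=10.200.0.1/24 \
    ipv4.nat=false

# 2. With NAT disabled, a static route on the LXD host via the OVN
#    router's uplink address gives the host direct reachability.
#    10.0.0.100 stands in for the router's actual uplink address
#    (recorded in the network's volatile.network.ipv4.address key).
ip route add 10.200.0.0/24 via 10.0.0.100
```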

Adding support for custom DNS records would avoid the need to run a custom dnsmasq instance.

tomponline commented 2 years ago

I wonder if we could also add custom DNS record support for bridge networks, without having to resort to raw.dnsmasq.

stgraber commented 2 years ago

I was thinking of that, but it would likely need to be tied to network zones in some way, so that we don't end up with too many different solutions.
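To make the idea concrete, a sketch of what zone-based custom records could look like, modelled on the existing network zones CLI; the record subcommands and all names here are assumptions about a possible design, not a confirmed API:

```sh
# Hypothetical zone tied to an OVN network for forward lookups.
lxc network zone create grp1.internal
lxc network set ovn0 dns.zone.forward=grp1.internal

# Hypothetical custom records living inside that zone.
lxc network zone record create grp1.internal frontend
lxc network zone record entry add grp1.internal frontend A 10.200.0.10
```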

StyXman commented 2 years ago

> 2. Or potentially connecting the Ansible container instance to each of the OVN networks so Ansible can reach them directly.

In our case, Ansible runs on the host, not in a container. Hence the need for the host to access the instances.