SovereignCloudStack / issues

This repository is used for issues that are cross-repository or not bound to a specific repository.
https://github.com/orgs/SovereignCloudStack/projects/6

IaaS DR/Standard for tenant VM DNS #229

Open garloff opened 1 year ago

garloff commented 1 year ago

As an SCS IaaS user, I want to have a defined way to get public DNS resolution.

TODO: Write ADR to standardize this.

Definition of Ready:

Definition of Done:

garloff commented 1 year ago

While it's easy to connect a VM to a router with an external network (see #167) and then configure the subnet to use well-known DNS servers such as those from Google, Cloudflare or Quad9 for name resolution, this has a few disadvantages:

garloff commented 1 year ago

The recommendation would be that users can leave the --dns-nameserver option out of the subnet creation call. In that case, the expectation would be that freshly created VMs which get IPs via DHCP will also get a working nameserver from DHCP. Working means:

markus-hentsch commented 8 months ago

DNS (internal and external) seems to be a complicated topic in OpenStack, with a lot of interdependencies with the Neutron configuration and the existence of Designate. I dug a bit deeper into the topic to get a grasp of the available mechanisms and their configuration. Although this issue seems to be primarily focused on the DNS provided to VMs as a means of external connectivity, I think the overall DNS behavior (including internal) is related to this and also very important considering the variety of possible cloud configurations. We might consider creating a separate issue and ADR regarding internal DNS and published Designate records, but for now I'll document the results of my research below.

DNS integration (internal resolution)

Link: https://docs.openstack.org/neutron/2023.2/admin/config-dns-int.html

The Networking service enables users to control the name assigned to ports by the internal DNS.

Neutron configuration

/etc/neutron/neutron.conf

[DEFAULT]
dns_domain = example.org.

(note: the dot at the end is not an error[^1])

This domain will be appended to the dns_name value users can set for ports.

Plugin

Two choices are offered for extension_drivers in the ML2 plugin:

Example entry for /etc/neutron/plugins/ml2/ml2_conf.ini:

[ml2]
extension_drivers = ...,dns_domain_ports

Usage

Overriding DNS domain on networks

On a per-network basis, the dns_domain can be set via openstack network create --dns-domain ....

This overrides the dns_domain entry in Neutron's configuration and will be used for standalone ports as well as ports automatically created for VMs. This affects the results of both sections documented below.

DNS assignments for ports

A port can have a specific DNS name assigned to it via openstack port create --dns-name my-port-name ....

This will lead to a DNS assignment being created for the port:

| dns_assignment | fqdn='my-port-name.example.org.', hostname='my-port-name', ip_address='192.0.2.67' |
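As a rough illustration of how the `fqdn` in the assignment above comes together, here is a minimal Python sketch. It assumes the name is formed by simply joining `dns_name` and the effective `dns_domain`; the function name `build_fqdn` is hypothetical and not part of Neutron.

```python
def build_fqdn(dns_name: str, dns_domain: str) -> str:
    """Join a port's dns_name with the effective dns_domain (illustrative only)."""
    # Fully qualified domains end with a dot; add it if the caller omitted it.
    if not dns_domain.endswith("."):
        dns_domain += "."
    return f"{dns_name}.{dns_domain}"


print(build_fqdn("my-port-name", "example.org."))
```

The trailing-dot handling mirrors the note on `dns_domain = example.org.` further up: the domain is expected to be fully qualified.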

DNS assignments for VMs

If internal DNS integration is configured, a VM will automatically get corresponding DNS names assigned to its ports based on the name specified with openstack server create ... my_server_name.

This will lead to a DNS assignment being created for the port of the VM based on its name in a DNS-sanitized format. See the following example output of openstack port show on the VM port:

| dns_assignment | fqdn='my-server-name.example.org.', hostname='my-server-name', ip_address='203.0.113.8' |
| dns_domain     | example.org.                                                                            |
| dns_name       | my-server-name                                                                          |
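To make the "DNS-sanitized format" more tangible, the following Python sketch approximates how a server name such as `my_server_name` could end up as `my-server-name`. This is an assumption-laden approximation: the real rules are applied by OpenStack itself and may differ in detail, and the function name `sanitize_hostname` is only used here for illustration.

```python
import re


def sanitize_hostname(server_name: str) -> str:
    """Approximate a DNS-safe hostname from a server name (illustrative only)."""
    # Spaces and underscores become dashes.
    name = re.sub(r"[ _]", "-", server_name)
    # Drop anything that is not a letter, digit, dot or dash.
    name = re.sub(r"[^A-Za-z0-9.-]+", "", name)
    # Normalise to lower case and trim leading/trailing separators.
    return name.lower().strip(".-")


print(sanitize_hostname("my_server_name"))
```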

External DNS integration (via Designate)

Link: https://docs.openstack.org/neutron/2023.2/admin/config-dns-int-ext-serv.html#config-dns-int-ext-serv

Note: this requires the internal DNS functionality of Neutron to be enabled and configured as documented above.

Neutron configuration

/etc/neutron/neutron.conf

[DEFAULT]
external_dns_driver = designate

[designate]
url = ...
...

The [designate] section has a lot of Designate-specific options^3 that must be set according to the Designate configuration.

Usage

Case 1: DNS record for floating IP based on network and VM port

Case 2: DNS record for floating IP based on floating IP settings

Case 3: DNS record for ports

Note: for all cases illustrated below, the corresponding zone matching the dns_domain attributes must be created in Designate beforehand.

3a: fixed IP from subnet

Note: the functionality of the subnet_dns_publish_fixed_ip extension might be superseded by the dns-integration-domain-keywords extension, which further enhances the behavior to have dynamic name templating options for dns_domain attributes^5.

3b: fixed IP within externally reachable network

3c: fixed IP within externally reachable network based on network domain

Note by @markus-hentsch:
I'm not entirely sure what they are referring to with "externally accessible" networks here. Judging from the example^4 it is not a genuine provider network with direct perimeter connection (i.e. router:external=External).


Designate quick guide with user-focused usage of above mechanisms

Link: https://docs.openstack.org/designate/2023.2/user/neutron-integration.html

[^1]: https://serverfault.com/a/803037, https://webmasters.stackexchange.com/a/73946

markus-hentsch commented 8 months ago

Based on the research above, I thought about what could be standardized by SCS in the context of DNS. I've split this into 3 topics where the first ("Forwarded DNS") is referring to the initial subject that @garloff mentioned.

I) Forwarded DNS:

II) Internal DNS records:

III) External DNS records:

Goals:

markus-hentsch commented 7 months ago

Note: during the discussions around https://github.com/SovereignCloudStack/standards/pull/522 we proposed to not enable DNS in external/provider networks due to the threat of DNS reflection attacks. We should mention this in the DNS standard.

markus-hentsch commented 7 months ago

Practical evaluation (WIP)

I now rebuilt my DevStack to support Designate and Neutron's internal DNS to start evaluating options for the standard.

I can confirm the correct working of internal DNS resolution as introduced above in the documentation research.

As soon as

extension_drivers = ...,dns_domain_ports

is added to the ML2 plugin configuration of Neutron, any newly created ports and servers will have internal DNS resolution based on their name and the dns_domain option of neutron.conf, unless overridden by user-specified flags.

As an example, considering two Nova instances "server-1" and "server-2" in the same Neutron network, the first can instantly resolve server-2.openstackgate.local to the second instance's IP address through Neutron's internal DNS resolution (dns_domain = openstackgate.local is default in DevStack).

This makes the dns_domain_ports extension driver very valuable (even without involving Designate yet!), and it should be made mandatory.

markus-hentsch commented 7 months ago

I was having a talk with Jonas Schäfer (C&H) where we tried to come up with basic guidelines for a standard concerning the various DNS plugins and settings in Neutron.

Neutron makes distinctions between settings, plugins and extensions. There are also nuances like extension drivers in plugins (such as ML2). However, as it turns out, not everything is configurable or optional in the same way.

On the net there are a lot of statements like "if API extension XY is enabled ..." which suggest that a CSP can toggle API extensions in Neutron on and off.

However, after a bit of research I learned that this is actually not something the CSP can configure or, to be more precise, is not supposed to^1. Neutron API extensions being enabled or disabled seems to depend on the plugins / backends used and whether they implement a specific extension or not.

It seems to be non-trivial to figure out if a given extension (e.g. subnet_dns_publish_fixed_ip) is supported by a specific Neutron configuration as I have not been able to find some kind of support matrix like there is for Nova or Cinder backends in relation to features.

I need to look into this further but this will most likely limit what we can standardize or rely on to be available.

horazont commented 7 months ago

TL;DR: OpenStack injects DNS responses via OVN, which is a documented feature. Don't mind the hundreds of lines of C code directly exposed on port 53 then.


While trying to understand how local DNS works in OpenStack, we got curious because even with --dns-nameserver 9.9.9.9 set on a subnet, resolution of OpenStack local names still worked. Double-checking resolv.conf revealed that no other nameserver was injected.

Looking at tcpdump, something odd stood out:

$ sudo tcpdump -vvvnni eth0 port 53
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:42:14.153148 IP (tos 0x0, ttl 64, id 35577, offset 0, flags [DF], proto UDP (17), length 81)
    10.1.0.61.56891 > 9.9.9.9.53: [bad udp cksum 0x1c9e -> 0x1a53!] 32596+ AAAA? cirros-server-3.openstackgate.local. (53)
10:42:14.162337 IP (tos 0x0, ttl 57, id 43106, offset 0, flags [none], proto UDP (17), length 156)
    9.9.9.9.53 > 10.1.0.61.56891: [udp sum ok] 32596 NXDomain q: AAAA? cirros-server-3.openstackgate.local. 0/1/0 ns: . [53m24s] SOA a.root-servers.net. nstld.verisign-grs.com. 2024041900 1800 900 604800 86400 (128)
10:42:14.162408 IP (tos 0x0, ttl 64, id 48881, offset 0, flags [DF], proto UDP (17), length 101)
    10.1.0.61.52429 > 9.9.9.9.53: [bad udp cksum 0x1cb2 -> 0x8297!] 45630+ AAAA? cirros-server-3.openstackgate.local.openstackgate.local. (73)
10:42:14.169194 IP (tos 0x0, ttl 57, id 6225, offset 0, flags [none], proto UDP (17), length 176)
    9.9.9.9.53 > 10.1.0.61.52429: [udp sum ok] 45630 NXDomain q: AAAA? cirros-server-3.openstackgate.local.openstackgate.local. 0/1/0 ns: . [59m9s] SOA a.root-servers.net. nstld.verisign-grs.com. 2024041900 1800 900 604800 86400 (148)
10:42:14.169290 IP (tos 0x0, ttl 64, id 31926, offset 0, flags [DF], proto UDP (17), length 81)
    10.1.0.61.40202 > 9.9.9.9.53: [bad udp cksum 0x1c9e -> 0xbc65!] 14707+ A? cirros-server-3.openstackgate.local. (53)
10:42:14.170213 IP (tos 0x0, ttl 64, id 31926, offset 0, flags [DF], proto UDP (17), length 132)
    9.9.9.9.53 > 10.1.0.61.40202: [no cksum] 14707- q: A? cirros-server-3.openstackgate.local. 1/0/0 cirros-server-3.openstackgate.local. [1h] A 10.1.0.56 (104)
10:42:14.172569 IP (tos 0x0, ttl 64, id 17795, offset 0, flags [DF], proto UDP (17), length 68)
    10.1.0.61.39649 > 9.9.9.9.53: [bad udp cksum 0x1c91 -> 0xa0eb!] 51333+ PTR? 56.0.1.10.in-addr.arpa. (40)
10:42:14.173893 IP (tos 0x0, ttl 64, id 17795, offset 0, flags [DF], proto UDP (17), length 139)
    9.9.9.9.53 > 10.1.0.61.39649: [no cksum] 51333- q: PTR? 56.0.1.10.in-addr.arpa. 1/0/0 56.0.1.10.in-addr.arpa. [1h] PTR cirros-server-3.openstackgate.local. (111)

As we can see, when quad9 replies with NXDOMAIN for names related to the cloud, the reply has a proper UDP checksum ([udp sum ok]) [^1], while the replies which contain the correct IP address do not have a UDP checksum ([no cksum]). That is suspicious, because it is not trivial to generate a checksum-less UDP packet on most OSes (and quad9 just runs standard DNS software AFAIK).
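The checksum observation can also be automated. The following Python sketch scans `tcpdump -vvv` payload lines and returns those sent by the nameserver that carry `[no cksum]`. The regular expression is written against the output format shown above and is an assumption about that format, not a general tcpdump parser; the sample lines are abbreviated from the capture.

```python
import re

# Matches the indented payload line of `tcpdump -vvv` output, e.g.
# "9.9.9.9.53 > 10.1.0.61.40202: [no cksum] 14707- q: A? ..."
PAYLOAD = re.compile(
    r"^\s*(?P<src>\S+) > (?P<dst>[^:\s]+): \[(?P<cksum>[^\]]+)\]"
)


def replies_without_checksum(lines, server="9.9.9.9.53"):
    """Return payload lines sent by `server` that carry no UDP checksum."""
    hits = []
    for line in lines:
        match = PAYLOAD.match(line)
        if match and match["src"] == server and match["cksum"] == "no cksum":
            hits.append(line.strip())
    return hits


# Abbreviated lines from the capture above.
sample = [
    "    9.9.9.9.53 > 10.1.0.61.56891: [udp sum ok] 32596 NXDomain q: AAAA? cirros-server-3.openstackgate.local.",
    "    9.9.9.9.53 > 10.1.0.61.40202: [no cksum] 14707- q: A? cirros-server-3.openstackgate.local.",
]
for hit in replies_without_checksum(sample):
    print(hit)
```

Only the second sample line is reported, which corresponds to the reply that carried the cloud-internal A record.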

This checksum anomaly hints at something fishy going on in the SDN of OpenStack itself, so we dug into the generated OpenFlow rules on the hypervisor. We eventually[^2] found this rule (line breaks added for readability):

 cookie=0x426810f8, duration=3208.259s, 
    table=29, n_packets=87, n_bytes=8223, 
    priority=100,udp,metadata=0x5,tp_dst=53 
    actions=controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause),resubmit(,30)

That rule matches on UDP packets to port 53[^3] and then sends them to the controller for further processing.

Initially, I thought that the controller was neutron-ovn-agent, but that was incorrect. After a lot of digging around the Neutron code, I eventually found that Neutron just inserts DNS records into the OVN northbound database, via:

  1. https://opendev.org/openstack/neutron/src/commit/49a25e7c0457e1d9082c5f22309bd2554e5d37bb/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py#L1233
  2. https://opendev.org/openstack/neutron/src/commit/49a25e7c0457e1d9082c5f22309bd2554e5d37bb/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py#L1244
  3. https://opendev.org/openstack/ovsdbapp/src/commit/b7ce3f9a7486f03a45d52ba03361592425ace444/ovsdbapp/schema/ovn_northbound/impl_idl.py#L346-L347
  4. https://opendev.org/openstack/ovsdbapp/src/commit/b7ce3f9a7486f03a45d52ba03361592425ace444/ovsdbapp/schema/ovn_northbound/commands.py#L1830-L1843

The actual DNS parsing and responding is handled by OVN. When the controller(..) action is invoked, the OVN controller kicks in and handles the packet. The code for that can be found here: https://github.com/ovn-org/ovn/blob/42ef6e3760cc5bd662e9370310e4ebeaaf948ac5/controller/pinctrl.c#L3264

This then, if the action in the userdata (the first four bytes) matches 0x6 (= ACTION_OPCODE_DNS_LOOKUP), calls into pinctrl_handle_dns_lookup: https://github.com/ovn-org/ovn/blob/42ef6e3760cc5bd662e9370310e4ebeaaf948ac5/controller/pinctrl.c#L2986-L3260

There, the DB lookup for matching DNS records happens and the forged response packet is constructed, if necessary.
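To illustrate the opcode check, the following Python sketch decodes the `userdata` string from the flow rule shown earlier. It assumes the opcode is stored as a big-endian 32-bit integer in the first four bytes, which matches the observed value but is stated here as an assumption; `userdata_opcode` is a hypothetical helper name.

```python
import struct

ACTION_OPCODE_DNS_LOOKUP = 0x6  # value quoted in the discussion above


def userdata_opcode(userdata: str) -> int:
    """Decode the opcode from controller() userdata given as dotted hex."""
    raw = bytes.fromhex(userdata.replace(".", ""))
    # Assumption: the first four bytes hold the opcode in network byte order.
    (opcode,) = struct.unpack("!I", raw[:4])
    return opcode


userdata = "00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64"
print(userdata_opcode(userdata) == ACTION_OPCODE_DNS_LOOKUP)
```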

[^1]: Don't mind the [bad udp cksum] on the outbound packets. UDP checksum generation happens after the point where tcpdump hooks in.

[^2]: After going through a couple hundred lines of OpenFlow and running the OpenFlow engine on brain wetware; cannot recommend that.

[^3]: Note that to reach this rule, a bunch of preconditions need to be satisfied, such as that the packet must originate from the VM's port. This will thus not affect queries sent to a DNS server running in a VM, for example.

horazont commented 7 months ago

We should probably talk about the fact that this exposes a daemon which

  1. runs as root and/or, simply because of its responsibilities, has a lot of power in OpenStack,
  2. is shared between tenants/projects, and
  3. is written in C

directly to traffic sent by customer VMs.

markus-hentsch commented 7 months ago

Adding to https://github.com/SovereignCloudStack/issues/issues/229#issuecomment-2066368242 after validating a few more things:

In OVN setups:

[^1]: note: extensions often supersede each other and include all functionality of the ones that they are replacing, see https://docs.openstack.org/designate/latest/user/neutron-integration.html#subnet-dns-publish-fixed-ip

In OVS setups:

I tried figuring out how this works in OVS-based setups. However, it does not seem easy to figure out whether the OVS configuration of Neutron is actually supposed to even support the internal DNS resolution.

A page about OpenStack deployment using Juju Charms^2 states:

OVN handles instance access to DNS differently to how ML2+OVS does. Please refer to the Internal DNS resolution paragraph in this document for details.

... and the paragraph in question says:

OVN supports Neutron internal DNS resolution.

That kind of suggests that OVS might not support this but it is not clearly stated anywhere.

I redeployed my DevStack with OVS and so far wasn't able to get internal DNS resolution working. The /etc/resolv.conf entries in VMs now also point to the DHCP agent directly. It seems that in any case with OVS no such interception like with OVN will happen.

Update to OVS (2024-04-22):

| OVS | OVN |
| --- | --- |
| DHCP agent(s) are specified as DNS server(s) in resolv.conf if no --dns-nameserver was specified. | The entry in resolv.conf always refers to genuine DNS server(s) (likely external) as per dns_servers setting of ml2_conf.ini under the [ovn] section. |
| Internal DNS only works if --dns-nameserver was not used. | Internal DNS works regardless of subnet-specific settings. |
| Internal DNS is implemented by the DHCP agent, which resolves it either locally or externally depending on the request. | Internal DNS is implemented by protocol traffic interception via OVN OpenFlow. |
| Automatic Internal DNS is limited to IP-based FQDNs like host-10-1-0-22.openstacklocal. | Automatic Internal DNS uses the VM names like my-server.openstacklocal. |

markus-hentsch commented 7 months ago

Practical implementation of External DNS / Designate

I first added Designate to DevStack as follows.

Add to neutron.conf (replace $HOST_IP according to openstack endpoint list):

[DEFAULT]
external_dns_driver = designate

# ...

[designate]
url = http://$HOST_IP/dns/v2
project_domain_name = Default
project_name = service
user_domain_name = Default
password = nomoresecret
username = neutron
auth_url = http://$HOST_IP/identity
auth_type = password
cafile = /opt/stack/data/ca-bundle.pem
allow_reverse_dns_lookup = True
ipv4_ptr_zone_prefix_size = 24
ipv6_ptr_zone_prefix_size = 116
ptr_zone_email = admin@example.org
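As a rough illustration of what `ipv4_ptr_zone_prefix_size = 24` implies, the Python sketch below derives the reverse (PTR) zone name for an IPv4 address. It assumes the option states how many leading bits of the address make up the reverse zone; the helper `ipv4_ptr_zone` is hypothetical and does not reproduce Neutron's actual code.

```python
import ipaddress


def ipv4_ptr_zone(address: str, prefix_size: int = 24) -> str:
    """Name of the in-addr.arpa zone covering `address` (illustrative only)."""
    if prefix_size % 8 or not 8 <= prefix_size <= 24:
        raise ValueError("prefix size must be 8, 16 or 24 in this sketch")
    # reverse_pointer yields e.g. '146.1.0.10.in-addr.arpa' for 10.0.1.146
    labels = ipaddress.IPv4Address(address).reverse_pointer.split(".")
    # Drop one leading label per host octet not covered by the prefix.
    host_octets = (32 - prefix_size) // 8
    return ".".join(labels[host_octets:]) + "."


print(ipv4_ptr_zone("10.0.1.146"))
```

Under that assumption, a PTR record for a floating IP such as 10.0.1.146 would live in a zone covering the whole /24.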

I was able to verify the behavior of the dns_domain_ports extension according to use cases 1 and 2 ^1 as well as Designate (use case 3):

Case 1: DNS record through Floating IP assignment and network/port DNS settings

Use case:

Example:

openstack zone create example.org.
openstack network set --dns-domain example.org. private
openstack server create --network private ... my-server
openstack floating ip create public
openstack server add floating ip my-server 10.0.1.146
openstack recordset list example.org.

# +------+------------------------+------+---------------------------+--------+--------+
# | id   | name                   | type | records                   | status | action |
# +------+------------------------+------+---------------------------+--------+--------+
# | ...  | my-server.example.org. | A    | 10.0.1.146                | ACTIVE | NONE   |
# +------+------------------------+------+---------------------------+--------+--------+

Case 2: Direct DNS record for Floating IP

Example:

openstack zone create example.org.
openstack floating ip create --dns-domain example.org. --dns-name other-server public
openstack recordset list example.org.

# +------+---------------------------+------+------------------------+--------+--------+
# | id   | name                      | type | records                | status | action |
# +------+---------------------------+------+------------------------+--------+--------+
# | ...  | other-server.example.org. | A    | 10.0.1.143             | ACTIVE | NONE   |
# +------+---------------------------+------+------------------------+--------+--------+

Case 3: DNS records for fixed IPs

Case 3a: static records using the subnet-dns-publish-fixed-ip extension

NOTE: This works independently of whether the network/subnet actually is externally reachable. This is not verified by Neutron in contrast to case 3b!

Example:

openstack zone create example.org.
openstack network create private-2
openstack subnet create --network private-2 --subnet-range 10.3.0.0/26 --dns-publish-fixed-ip private-2-subnet
openstack port create port1 --dns-domain example.org. --dns-name port1 --network private-2
openstack recordset list example.org.

# +------+------------------------+------+---------------------------+--------+--------+
# | id   | name                   | type | records                   | status | action |
# +------+------------------------+------+---------------------------+--------+--------+
# | ...  | port1.example.org.     | A    | 10.3.0.22                 | ACTIVE | NONE   |
# +------+------------------------+------+---------------------------+--------+--------+

This also works for servers instead of ports, if the dns_domain is set on the network directly instead, which will propagate to all ports created within automatically:

Example:

openstack zone create example.org.
openstack network create private-2 --dns-domain example.org.
openstack subnet create --network private-2 --subnet-range 10.3.0.0/26 --dns-publish-fixed-ip private-2-subnet
openstack server create --network private-2 ... example-server
openstack recordset list example.org.

# +------+-----------------------------+------+----------------------+--------+--------+
# | id   | name                        | type | records              | status | action |
# +------+-----------------------------+------+----------------------+--------+--------+
# | ...  | example-server.example.org. | A    | 10.3.0.37            | ACTIVE | NONE   |
# +------+-----------------------------+------+----------------------+--------+--------+

Case 3b: automatic records for externally reachable fixed IPs using dns_domain_ports extension

NOTE: In contrast to case 3a this does not require the subnet-dns-publish-fixed-ip extension or any of its successors. However, Neutron will only apply this case to fixed IPs in networks that are detected as external (see the expandable spoiler below; router:external=External DOES NOT count)!

What is considered "externally reachable"?

The decision whether a network counts as externally reachable (as per the upstream documentation) is the inverse of the following implementation: neutron/plugins/ml2/extensions/dns_integration.py#L382-L405

**Note that networks with `router:external=External` explicitly *do not* count as externally reachable in this context!**

Annotated version:

```python
# note that this method returns the inverse of "DNS record will be published"
def external_dns_not_needed(self, context, network, subnets):
    dns_driver = _get_dns_driver()
    # If no external driver is registered (e.g. Designate),
    # do not attempt to publish records
    if not dns_driver:
        return True
    # If the dns_publish_fixed_ip extension is used anywhere, publish a record
    for subnet in subnets:
        if subnet.get('dns_publish_fixed_ip'):
            return False
    # Networks with the external router attribute are exempt from DNS records
    if network['router:external']:
        return True
    # For networks with more than 1 segment, publish DNS (?)
    segments = segments_db.get_network_segments(context, network['id'])
    if len(segments) > 1:
        return False
    provider_net = segments[0]
    # Network type "local" is exempt from DNS records
    if provider_net['network_type'] == lib_const.TYPE_LOCAL:
        return True
    # Network type "flat" will always publish DNS records
    # (unless above checks fail)
    if provider_net['network_type'] == lib_const.TYPE_FLAT:
        return False
    # Network type "vlan" will publish DNS records, if the segmentation
    # id is outside of the ranges for project networks
    if provider_net['network_type'] == lib_const.TYPE_VLAN:
        return self._is_vlan_tenant_network(provider_net)
    # Network type "gre", "vxlan" and "geneve" will publish DNS records,
    # if the segmentation id is outside of the ranges for project networks
    if provider_net['network_type'] in [
            lib_const.TYPE_GRE, lib_const.TYPE_VXLAN, lib_const.TYPE_GENEVE]:
        return self._is_tunnel_tenant_network(provider_net)
    return True
```

Example:

# (as administrator)
openstack network create --provider-network-type flat --project $TENANT_PROJECT --provider-physical-network testphysnet flat-demo
openstack subnet create --network flat-demo --project $TENANT_PROJECT --subnet-range 10.5.0.0/26 flat-demo-subnet

# (as tenant)
openstack port create tenant-port-flat-external --dns-name tenant-port-external --dns-domain example.org. --network flat-demo
openstack recordset list example.org.

# +------+-----------------------------------+------+----------------------+--------+--------+
# | id   | name                              | type | records              | status | action |
# +------+-----------------------------------+------+----------------------+--------+--------+
# | ...  | tenant-port-external.example.org. | A    | 10.5.0.57            | ACTIVE | NONE   |
# +------+-----------------------------------+------+----------------------+--------+--------+
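For experimenting with the criteria for case 3b outside of Neutron, the decision logic from the annotated implementation above can be condensed into a self-contained Python model. This is a simplified sketch: networks, subnets and segments are plain dicts, the tenant ID ranges are passed in as hypothetical `(low, high)` tuples rather than read from the ML2 configuration, and the function name `record_is_published` is not part of Neutron.

```python
def record_is_published(network, subnets, segments, driver_configured=True,
                        tenant_vlan_ranges=(), tenant_tunnel_ranges=()):
    """Simplified model: True if a DNS record would be published."""
    def in_ranges(seg_id, ranges):
        return any(low <= seg_id <= high for low, high in ranges)

    # No external DNS driver (e.g. Designate): nothing is published.
    if not driver_configured:
        return False
    # An explicit dns_publish_fixed_ip flag on any subnet wins.
    if any(s.get("dns_publish_fixed_ip") for s in subnets):
        return True
    # router:external networks are exempt.
    if network.get("router:external"):
        return False
    # Multi-segment networks are published.
    if len(segments) > 1:
        return True
    seg = segments[0]
    net_type = seg["network_type"]
    if net_type == "local":
        return False
    if net_type == "flat":
        return True
    # VLAN and tunnel networks are published only if their segmentation ID
    # lies outside the ranges reserved for project (tenant) networks.
    if net_type == "vlan":
        return not in_ranges(seg["segmentation_id"], tenant_vlan_ranges)
    if net_type in ("gre", "vxlan", "geneve"):
        return not in_ranges(seg["segmentation_id"], tenant_tunnel_ranges)
    return False


flat = [{"network_type": "flat"}]
print(record_is_published({"router:external": False}, [{}], flat))
```

The flat network from the example above is reported as published, while the same call with `router:external` set would not be.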

Summary

The two major extension driver choices compared:

| dns_domain_ports | subnet_dns_publish_fixed_ip (or any superseding one) |
| --- | --- |
| Automatic DNS records only work on CSP-configured networks (provider networks). | Automatic DNS records also work on tenant-created networks (self-service). |
| Networks have to match specific criteria^2 to allow DNS record creation. | No restriction on network attributes for DNS record creation. |
| Record creation is implicit if all criteria are met. | Explicit dns_publish_fixed_ip attribute for subnets to trigger behavior. |
| External reachability of the network is partially enforced by the criteria. | External reachability of the network is not actually enforced; DNS records might not work at all. |

markus-hentsch commented 7 months ago

Worth noting: [OVN] DNS resolution not forwarded with OVN driver

TL;DR: According to the bug report, in OVN setups VMs that have no connectivity to the outside DNS servers will also not receive local DNS responses for internal DNS or Designate (since no DNS traffic passes OVN that it can intercept and manipulate). This is different from OVS where the DHCP agent was always available as a local DNS target regardless of whether the VM actually had external connectivity.

markus-hentsch commented 7 months ago

DNS extension hierarchy in Neutron

An important lesson learned while debugging why I could not reproduce the subnet-dns-publish-fixed-ip behavior^1 as documented upstream:

There is a hierarchy of DNS extensions for ML2 in Neutron^2 in which they supersede one another in terms of functionality:

To get the subnet-dns-publish-fixed-ip functionality, either subnet-dns-publish-fixed-ip or dns-integration-domain-keywords (which includes the former) must be activated.

Bug in Neutron?

However, the documentation^1 only states:

Check that the subnet-dns-publish-fixed-ip Neutron extension is enabled.

In a section further above, it gives an example of how to check the enabled extensions:

$ openstack extension list --network -f value -c Alias | grep dns-domain-ports
dns-domain-ports

If that is done for the subnet-dns-publish-fixed-ip extension, the API returns the extension name "subnet-dns-publish-fixed-ip" even if only the dns_domain_ports extension driver is active, which does not offer the functionality at all. Any subnet created with the flag ends up with its dns_publish_fixed_ip attribute set to None, and the database table for the flag stays empty.

Only after the entry in /etc/neutron/plugins/ml2/ml2_conf.ini is adjusted to replace the extension driver by one that supports the feature:

[ml2]
extension_drivers = port_security,qos,subnet_dns_publish_fixed_ip

... the extension actually works.
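Since the API extension list can be misleading here, an operator could inspect the ML2 configuration directly instead. The following Python sketch checks whether `extension_drivers` in the `[ml2]` section lists a driver that provides the feature. The helper name `has_driver` is hypothetical, and the default set only contains the driver named above; the names of any superseding drivers would have to be added by the caller.

```python
import configparser


def has_driver(ml2_conf_text, wanted=frozenset({"subnet_dns_publish_fixed_ip"})):
    """Return True if [ml2] extension_drivers lists one of `wanted`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ml2_conf_text)
    raw = parser.get("ml2", "extension_drivers", fallback="")
    # The option is a comma-separated list of driver names.
    drivers = {name.strip() for name in raw.split(",") if name.strip()}
    return bool(drivers & wanted)


conf = "[ml2]\nextension_drivers = port_security,qos,subnet_dns_publish_fixed_ip\n"
print(has_driver(conf))
```

In practice the text would come from /etc/neutron/plugins/ml2/ml2_conf.ini, which requires access to the Neutron host, so this is a check for operators rather than for tenants.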

This seems to be a bug in the Neutron API, since it reports an extension as active when it actually isn't. To make matters worse, the documentation explicitly instructs the user to check the API extensions, which will mislead them, as happened to me.

Note that API extensions as reported by openstack extension list are always written with dashes, whereas the extension driver names always use underscores.

markus-hentsch commented 7 months ago

After lots of trial-and-error I was able to reproduce all relevant use cases with Designate and updated https://github.com/SovereignCloudStack/issues/issues/229#issuecomment-2073211589 with all findings accordingly.

horazont commented 7 months ago

> Worth noting: [OVN] DNS resolution not forwarded with OVN driver
>
> TL;DR: According to the bug report, in OVN setups VMs that have no connectivity to the outside DNS servers will also not receive local DNS responses for internal DNS or Designate (since no DNS traffic passes OVN that it can intercept and manipulate). This is different from OVS where the DHCP agent was always available as a local DNS target regardless of whether the VM actually had external connectivity.

Soo... always add 169.254.169.254 or something to your dns nameservers, which would hopefully hit the default route?

markus-hentsch commented 7 months ago

I looked a bit more into the unexpected behavior mentioned in https://github.com/SovereignCloudStack/issues/issues/229#issuecomment-2076750231 and filed a bug in Neutron: https://bugs.launchpad.net/neutron/+bug/2063669

markus-hentsch commented 7 months ago

> > Worth noting: [OVN] DNS resolution not forwarded with OVN driver
> >
> > TL;DR: According to the bug report, in OVN setups VMs that have no connectivity to the outside DNS servers will also not receive local DNS responses for internal DNS or Designate (since no DNS traffic passes OVN that it can intercept and manipulate). This is different from OVS where the DHCP agent was always available as a local DNS target regardless of whether the VM actually had external connectivity.
>
> Soo... always add 169.254.169.254 or something to your dns nameservers, which would hopefully hit the default route?

I both reproduced the "bug" as well as verified your suggestion (which seems to work!):

Offline network with default nameserver (OVN)

openstack network create offline
openstack subnet create --network offline --subnet-range 192.168.100.0/24 offline-subnet
openstack server create --network offline ... server-1
openstack server create --network offline ... server-2
openstack console url show server-1

ping server-2.openstackgate.local does not work from server-1 via the VNC console (/etc/resolv.conf contains nameserver 8.8.8.8 as per Neutron's default). This confirms https://bugs.launchpad.net/neutron/+bug/1902950

Offline network with 169.254.169.254 nameserver (OVN)

Re-using the above:

openstack server delete server-1
openstack server delete server-2
openstack subnet set --dns-nameserver 169.254.169.254 offline-subnet
openstack server create --network offline ... server-1
openstack server create --network offline ... server-2
openstack console url show server-1

Neutron-internal name resolution and ping server-2.openstackgate.local from server-1 now work!

markus-hentsch commented 6 months ago

I concluded the standard draft in SovereignCloudStack/standards#570 and added a test script.

markus-hentsch commented 1 month ago

On 2024-09-23 the current guidelines of the standard were discussed with CSP representatives (artcodix, plusserver, among others) in the Lean SCS Operator Coffee. Below is a summary of the discussion points that I noted down:

Unfortunately, this does not seem to leave a whole lot of options for the standard to include anything more than a few SHOULDs right now.