coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

NetworkManager: consider defaulting to EUI-64 for IPv6 SLAAC (at least on OpenStack) #907

Open brtkwr opened 3 years ago

brtkwr commented 3 years ago

Describe the bug I am attempting to spawn VMs on OpenStack with IPv6 enabled. The interface appears but the IP address is incorrect. For example, instead of fd5e:d3bb:de2e:0:f816:3eff:fe9d:1342/64, the IP address that is rendered is fd5e:d3bb:de2e:0:6f88:6036:cd38:c7cc/64.

Reproduction steps Steps to reproduce the behavior:

  1. Deploy DevStack with IPv6 enabled (it's enabled by default), which creates a network with dual subnets, one IPv4 and one IPv6
  2. Deploy a VM: openstack server create --image fedora-coreos-34.20210529.3.0-openstack.x86_64 --flavor ds2G --key-name default --network private test-fcos
  3. Attach floating IP to the IPv4 interface to SSH into the instance
  4. Run sudo ip addr add fd5e:d3bb:de2e:0:f816:3eff:fe9d:1342/64 dev ens3 to make IPv6 work on the instance

Expected behavior IPv6 should work out of the box.

Actual behavior The IPv6 address is rendered incorrectly and as a result doesn't work.

System details

Ignition config No config provided.

Additional information I tried the suggestion here: https://github.com/coreos/fedora-coreos-tracker/issues/888#issuecomment-878426854 with a reboot but that didn't help my situation. Happy to tmate for anyone who wants to inspect the issue.

dustymabe commented 3 years ago

Hey @brtknr - Sorry if these questions are obvious..

brtkwr commented 3 years ago

Hi @dustymabe

brtkwr commented 3 years ago

I tried booting up a regular Fedora Cloud 33 image on the same OpenStack deployment and that seems to get an IP address allocated correctly, but that uses cloud-init of course. I am not sure what's missing in the Fedora CoreOS bootstrap logic to accommodate this.

brtkwr commented 3 years ago

Looks like something has gone wrong with SLAAC: https://en.wikipedia.org/wiki/IPv6#Stateless_address_autoconfiguration_(SLAAC)

If you put the MAC address of the interface in question, fa:16:3e:51:33:e5, into https://www.vultr.com/resources/mac-converter/?mac_address=fa%3A16%3A3e%3A51%3A33%3Ae5, the expected Contained EUI-48 (U/L) address is f8:16:3e:ff:fe:51:33:e5, which matches the address on Neutron, but the Fedora CoreOS instance somehow renders this as a6:b5:33:c0:c5:5e:19:52, which is completely off.
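The conversion described above can be sketched in a few lines of Python. This is only a minimal illustration of the modified EUI-64 derivation, assuming a /64 prefix; the function names `eui64_interface_id` and `slaac_address` are hypothetical helpers and not part of any tool mentioned in this thread.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Derive the 8-byte modified EUI-64 interface identifier from a MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Flip the universal/local bit (0x02) of the first octet, then
    # insert ff:fe between the two halves of the MAC address.
    first = octets[0] ^ 0x02
    return bytes([first]) + octets[1:3] + b"\xff\xfe" + octets[3:6]


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    iid = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | iid)


if __name__ == "__main__":
    # MAC address quoted in this comment.
    print(eui64_interface_id("fa:16:3e:51:33:e5").hex(":"))
    # Prefix and expected address from the original report; the MAC is
    # inferred from that address and is an assumption of this example.
    print(slaac_address("fd5e:d3bb:de2e::/64", "fa:16:3e:9d:13:42"))
```

With the MAC from this comment the helper yields f8:16:3e:ff:fe:51:33:e5, and with the prefix from the original report it reproduces the address Neutron expects. The address actually configured on the instance cannot be derived this way, which is consistent with a different generation mode being in use.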

brtkwr commented 3 years ago

I believe my issue is related to this actually: https://github.com/coreos/fedora-coreos-tracker/issues/513

lucab commented 3 years ago

On FCOS, NetworkManager is in charge of network configuration. That means that on your VM it is likely already handling ens3, and thus possibly conflicting with your manual ip addr changes. I'm not surprised you are seeing weird results when mixing auto-configuration via NM and manual commands, and I would really recommend against doing the latter.

For further feedback on this ticket, it would be helpful to get the full logs from the NM service after a fresh boot and without further manual network tweaking. Also, the output of ip -6 addr and a brief description of the underlying network infra would be helpful (it looks like it could be a SLAAC setup for a private ULA subnet?).

brtkwr commented 3 years ago

Hi @lucab, it looks like the default setting for the interface is stable-privacy:

[core@k8s-devstack-v55mhdpu3cse-node-1 ~]$ nmcli connection edit "Wired connection 1"
===| nmcli interactive connection editor |===

Editing existing '802-3-ethernet' connection: 'Wired connection 1'

Type 'help' or '?' for available commands.
Type 'print' to show all the connection properties.
Type 'describe [<setting>.<prop>]' for detailed property description.

You may edit the following settings: connection, 802-3-ethernet (ethernet), 802-1x, dcb, sriov, ethtool, match, ipv4, ipv6, hostname, tc, proxy
nmcli> print ipv6.addr-gen-mode
ipv6.addr-gen-mode: stable-privacy

After changing this to 0, i.e. eui64 (based on this guide: https://ibert.tech/articles/activate-eui-64-on-ubuntu-desktop.html), I am now getting a stable address and the instance is reachable externally. Now my question is why this is stable-privacy by default, as this seems to render the instance unreachable via IPv6.

brtkwr commented 3 years ago

What appears to be happening is that the address generated by the stable-privacy setting renders the instance unreachable. It needs the address generated by the eui64 setting to work.

dustymabe commented 3 years ago

I'm able to reproduce the same behavior on Vexxhost (openstack public cloud provider). Thanks @brtknr for reporting the issue.

You can see that the profile autogenerated by NM has ipv6.addr-gen-mode=stable-privacy:

$ sudo cat "/run/NetworkManager/system-connections/Wired connection 1.nmconnection"
[connection]
id=Wired connection 1
uuid=4d2aec5c-0c3d-3fe4-8927-c3eb9d198d42
type=ethernet
autoconnect-priority=-999
interface-name=ens3
permissions=
timestamp=1627308807

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=auto

[proxy]

[.nmmeta]
nm-generated=true

A few comments/questions for the broader community:

  1. What do we think would be most appropriate for FCOS to use for ipv6.addr-gen-mode (eui64 or stable-privacy)? Is the answer the same for all platforms? This might be worth its own issue tracker discussion ticket.
  2. Currently we can't set the ipv6.addr-gen-mode globally. After a brief discussion with the NM team we're going to re-visit and see if it's worth setting that in the global configuration: https://bugzilla.redhat.com/show_bug.cgi?id=1743161#c11

dustymabe commented 3 years ago

WORKAROUND

Add this bit to your butane configs:

variant: fcos
version: 1.3.0
storage:
  files:
    - path: /etc/NetworkManager/system-connections/default.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=Wired Connection
          type=ethernet
          autoconnect-retries=1
          multi-connect=3
          permissions=
          [ethernet]
          mac-address-blacklist=
          [ipv4]
          dhcp-timeout=90
          dns-search=
          method=auto
          [ipv6]
          addr-gen-mode=eui64
          dhcp-timeout=90
          dns-search=
          method=auto
          [proxy]

The file was generated with:

/usr/libexec/nm-initrd-generator -s -- ip=dhcp,dhcp6

and then the uuid was removed.

brtkwr commented 3 years ago

Thanks @dustymabe for confirming the issue and the workaround, that's quite handy indeed! I will give it a whirl and get back to you.

cgwalters commented 3 years ago

I suspect this is basically client systems should use stable-privacy, servers should use eui64, right?

dustymabe commented 3 years ago

I suspect this is basically client systems should use stable-privacy, servers should use eui64, right?

That makes sense to me

dustymabe commented 3 years ago

Luca pointed this out in the meeting today: OpenStack does at least document this limitation: https://docs.openstack.org/neutron/wallaby/admin/config-ipv6.html#configuring-interfaces-of-the-guest

brtkwr commented 3 years ago

Is this a primarily server side OS or client side? I would put my guess on server side given the immutability etc. How do other clouds handle this issue?

brtkwr commented 3 years ago

This workaround did the trick btw, thank you

lucab commented 3 years ago

We touched on this topic in the last meeting, and we want to look around a bit more before changing anything:

  * ACTION: - dustymabe to figure out how the cloud edition is handling
    the ipv6.addr-gen-mode=stable-privacy problem  (dustymabe, 17:14:17)

dustymabe commented 3 years ago

On Fedora Cloud Base, cloud-init writes out this file:

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0 
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=fa:16:3e:24:3c:3b
IPV6INIT=yes
IPV6_AUTOCONF=yes
MTU=1500
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

The default when creating a connection via D-Bus is stable-privacy, and the default when reading a keyfile/ifcfg-rh file from disk is eui64. The difference in behavior was put in place to ease legacy migrations.

2021-07-26 10:54:43     @thaller        explicit values "eui64" and "stable-privacy" (and confusingly, the default value differs whether the profile gets received via D-Bus, or loaded from keyfile/ifcfg-rh file).
...
2021-07-26 10:58:53     @thaller        this was done so that when you create a profile on D-Bus (that is "now"), then the new default is "stable-privacy". If you have a profile on disk (created 5 years ago), then the default would stay at eui64.

TL;DR Fedora Cloud Base does not have this problem
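To make the on-disk default concrete, here is a minimal Python sketch that reads a keyfile-style profile and reports the mode it would end up with under the rule described above (a missing addr-gen-mode key means eui64). The helper name `keyfile_addr_gen_mode` is hypothetical, the parsing uses Python's generic INI reader rather than NetworkManager's own keyfile parser, and the fallback only reflects the behavior described in this comment, which may differ in other NetworkManager versions.

```python
import configparser


def keyfile_addr_gen_mode(keyfile_text: str) -> str:
    """Return the ipv6.addr-gen-mode a keyfile profile would use.

    Assumption taken from this thread: when the key is absent from a
    profile loaded from disk, the default is eui64.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(keyfile_text)
    if not parser.has_section("ipv6"):
        return "eui64"
    value = parser.get("ipv6", "addr-gen-mode", fallback="").strip()
    return value or "eui64"


if __name__ == "__main__":
    generated_profile = """
[connection]
id=Wired connection 1
type=ethernet

[ipv6]
addr-gen-mode=stable-privacy
method=auto
"""
    legacy_profile = """
[connection]
id=eth0
type=ethernet

[ipv6]
method=auto
"""
    print(keyfile_addr_gen_mode(generated_profile))
    print(keyfile_addr_gen_mode(legacy_profile))
```

The first profile mirrors the auto-generated one shown earlier in this thread and reports stable-privacy; the second has no explicit key and falls back to eui64 under the stated assumption.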

dustymabe commented 3 years ago

Also, just in case this wasn't clear.. If I delete /etc/sysconfig/network-scripts/ifcfg-eth0 and reboot the F34 cloud base instance then we get the same config (i.e. dynamically created NM connections) and same behavior as FCOS.

pjoomen commented 3 years ago

The workaround (using butane configuration) is a viable alternative for OKD4 based nodes (using variant: openshift, instead of variant: fcos), but it does not help when using networkType:OVNKubernetes.

Reference https://bugzilla.redhat.com/show_bug.cgi?id=1743161#c14

dustymabe commented 3 years ago

We discussed this in the community meeting today.

12:56:15     dustymabe | #agreed We will work with the NetworkManager team to get in place a
                       | configuration setting for a default value for ipv6.addr-gen-mode and
                       | apply that to all of FCOS when it's ready.

12:56:15     dustymabe | #agreed In the shorter term we may try to find some other way to set it
                       | without requiring the NetworkManager feature to be implemented.

12:56:15     dustymabe | #agreed We'll also reach out to FESCO and try to convince the rest of
                       | Fedora that `stable-privacy` makes most sense in a workstation/laptop
                       | setting and we should apply eui64 as the default for all server like
                       | variants.

brtkwr commented 3 years ago

Sounds like it was a productive meeting!

pjoomen commented 3 years ago

12:56:15     dustymabe | #agreed We'll also reach out to FESCO and try to convince the rest of
                       | Fedora that `stable-privacy` makes most sense in a workstation/laptop
                       | setting and we should apply eui64 as the default for all server like
                       | variants.

That sounds like the proper solution to me.

LorbusChris commented 2 years ago

https://bugzilla.redhat.com/show_bug.cgi?id=1743161 has been closed as WONTFIX. Is there another way to set a default for this setting?

dustymabe commented 2 years ago

Had a discussion this morning with the NM team. Lubomir in particular had some strong pushback against reverting to eui64 for server like editions in Fedora:

Reverting the defaults back to the obsolete EUI-64 method is a complete no-go.
It has been deprecated for very good reasons; IETF's position is detailed in
[RFC 8064].

Apart from the privacy issues, the problems of the EUI-64 mechanism affecting
servers are:

* The EUI-64 identifiers, being based on the hardware address, change with
  replacement of interface cards. Not great on servers.
* The mechanism produces a single address. On DAD failures, the machine
  ends up with no connectivity.

If the machine needs a predictable address, either the provisioning should
assign a static address or utilize DHCPv6. For the cases where this wouldn't
be possible (and the environment is controlled enough that the various issues
with EUI-64 don't apply, e.g. virtual networks), we are willing to provide
a way to switch defaults to EUI-64 in system configuration, but we're
strongly opposed to making this any sort of default.

I'll be joining the meeting in an hour, happy to discuss this further.

Pointers to relevant RFCs, for reference:

[RFC 7217] A Method for Generating Semantically Opaque Interface
           Identifiers with IPv6 Stateless Address Autoconfiguration (SLAAC)
           <https://www.rfc-editor.org/rfc/rfc7217>

[RFC 8064] Recommendation on Stable IPv6 Interface Identifiers
           <https://www.rfc-editor.org/rfc/rfc8064>
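For readers unfamiliar with RFC 7217, the following Python sketch illustrates the general shape of its algorithm, RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key), with the interface identifier taken from the least significant bits of RID. It is an assumption-laden illustration, not NetworkManager's actual implementation: SHA-256 stands in for F, the inputs are simply concatenated, the reserved-identifier check from the RFC is omitted, and the function name `stable_privacy_address` is hypothetical.

```python
import hashlib
import ipaddress


def stable_privacy_address(
    prefix: str,
    net_iface: str,
    network_id: str,
    dad_counter: int,
    secret_key: bytes,
) -> ipaddress.IPv6Address:
    """Sketch of an RFC 7217 style address for a /64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    # Inputs to the pseudorandom function F. Plain concatenation is a
    # simplification chosen for this example.
    material = b"".join(
        [
            network.network_address.packed[:8],  # Prefix
            net_iface.encode(),                  # Net_Iface, e.g. "ens3"
            network_id.encode(),                 # Network_ID (may be empty)
            dad_counter.to_bytes(4, "big"),      # DAD_Counter
            secret_key,                          # host-local secret
        ]
    )
    rid = hashlib.sha256(material).digest()
    # Take the 64 least significant bits of RID as the interface identifier.
    iid = int.from_bytes(rid[-8:], "big")
    return ipaddress.IPv6Address(int(network.network_address) | iid)


if __name__ == "__main__":
    key = b"example-secret-known-only-to-the-host"
    first = stable_privacy_address("fd5e:d3bb:de2e::/64", "ens3", "", 0, key)
    retry = stable_privacy_address("fd5e:d3bb:de2e::/64", "ens3", "", 1, key)
    print(first)
    print(retry)
```

Two properties relevant to this discussion follow from the construction: the result depends on a secret held only by the host, so a platform such as Neutron cannot predict it from the MAC address, and incrementing the DAD counter gives a fresh candidate address after a DAD failure.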

However, it was brought up by other participants in the meeting that in some cases this does cause an issue where the hypervisor or cloud platform might not know the address being used by the instance. I believe under qemu, if qemu-guest-agent is installed (we don't have that installed in FCOS), it could be picked up that way. The NM team would like to understand cases where the hypervisor/platform mismatch exists so they can further understand the problem here.

What we did agree on was that we could get a global NM configuration knob for configuring this. They asked me to open a new BZ (not re-open BZ#1743161). I did that here: BZ#2082682

dustymabe commented 2 years ago

The upstream change for BZ#2082682 landed in NetworkManager 1.39.8+

We are now unblocked to move forward with setting a global default at least in rawhide.

An example butane config that sets the global default:

variant: fcos
version: 1.4.0
storage:
  files:
    - path: /etc/NetworkManager/conf.d/90-ipv6-addr-gen-mode-override.conf
      mode: 0600
      contents:
        inline: |
          [connection-90-ipv6-addr-gen-mode-override]
          match-device=type:ethernet
          ipv6.addr-gen-mode=0

dustymabe commented 2 years ago

We discussed this at our community meeting today.

It seems that given new information there isn't as much support for changing the global default for all FCOS platforms. For OpenStack specifically we want to do a little more investigation to see if we can dynamically determine if the OpenStack env is set up for IPv6 SLAAC and see if we can dynamically set the global configuration default in that case. There may be a chicken and egg issue there though.

13:23:59*       dustymabe | #action jlebon to reach out to OpenStack experts to see if we can
                          | detect when the platform is expecting machines to do IPV6 network
                          | configuration via SLAAC (to get eui64 based IPv6 addresses)

We'll then make the determination if we want to set it conditionally or unconditionally on OpenStack.

Tangentially another point was brought up that may influence our decision to set a global default for this. I've opened https://github.com/coreos/fedora-coreos-tracker/issues/1266 to continue that discussion.

jlebon commented 2 years ago

I've reached out to Rodolfo Hernandez (thanks so much!) who works on OpenStack Neutron. The TL;DR is:

  1. Ironic is not an issue wrt. network cards being swapped causing addresses to change. There is code that automatically updates Neutron of the changes.
  2. OpenStack supports IPv6 configuration via SLAAC or DHCPv6. This is configurable by the user at the subnet level.
  3. There are no practical ways to figure out what kind of subnet a VM finds itself in before fully bringing up networking.

Given the above, I think we can investigate changing the default only for OpenStack. There was a doubt however about whether setting ipv6.addr-gen-mode=eui64 even if DHCPv6 is used can cause any issues. From my reading of the docs, that's not the case but it'd be good to test it to confirm or reach out to NM folks.

thom311 commented 2 years ago

There was a doubt however about whether setting ipv6.addr-gen-mode=eui64 even if DHCPv6 is used can cause any issues. From my reading of the docs, that's not the case but it'd be good to test it to confirm or reach out to NM folks.

Nothing comes to mind. From NetworkManager's point of view, with ipv6.addr-gen-mode=eui64|stable-privacy it always generates some IPv6 interface identifier. The value that it generates has no further significance and should not have any relation to DHCPv6. Well sure, different link-local and SLAAC addresses get generated, but that is probably not a cause for problems.

If you find a problem, please report :)

dustymabe commented 2 years ago

We discussed this in the community meeting today.

13:41:17  dustymabe | #agreed we will set ipv6.addr-gen-mode=eui64 as the
                    | default on our OpenStack platform since the platform
                    | expects this to be the case. We will attempt to leave
                    | currently deployed systems alone so that we don't
                    | change an existing system's IP address.

MindTooth commented 1 year ago

I hate to ask, but for anyone, including myself, in which version is this fix included?

MaysaMacedo commented 1 year ago

@MindTooth According to this bugzilla, NetworkManager-1.39.10-1.el8 contains the implementation that allows setting ipv6.addr-gen-mode in the global config.

llambiel commented 1 month ago

Exoscale provider is facing the same issue, initial report https://github.com/coreos/fedora-coreos-tracker/issues/513

Following https://github.com/coreos/fedora-coreos-tracker/issues/907#issuecomment-1226134528, is there any hope of seeing this change land any time soon?