coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

New Package Request: nmstate #1175

Open cathay4t opened 2 years ago

cathay4t commented 2 years ago

https://github.com/nmstate/nmstate

  1. What, if any, are the additional dependencies on the package? (i.e. does it pull in Python, Perl, etc)

    Both packages are Rust-based and depend only on glibc.

  2. What is the size of the package and its dependencies?

    nmstate-libs-2.1.0-0.3.alpha2.fc37.x86_64.rpm: 1.6 MiB (1646494 bytes)
    nmstate-2.1.0-0.3.alpha2.fc37.x86_64.rpm: 2.0 MiB (2140203 bytes)

  3. What problem are you trying to solve with this package? Or what functionality does the package provide?

    Nmstate is currently used by OpenShift for host day-2 network configuration via kubernetes-nmstate, running in a container. Including nmstate in CoreOS would allow it to be used for day-0 and day-1 configuration, where containers are not available. The nmstate package ships the CLI tool nmstatectl. The nmstate-libs package ships a C library for network configuration via NetworkManager; OpenShift projects could use the Go binding of nmstate through this C library.

  4. Can the software provided by the package be run from a container? Explain why or why not.

    Yes, but day-0 and day-1 provisioning have no container environment available yet.

  5. Can the tool(s) provided by the package be helpful in debugging container runtime issues?

    No. This is for host network configuration.

  6. Can the tool(s) provided by the package be helpful in debugging networking issues?

    Yes. Running nmstatectl show displays the current network state.

  7. Is it possible to layer the package onto the base OS as a day 2 operation? Explain why or why not.

    No. This request is for enabling day 1 and day 0 network configurations.

  8. In the case of packages providing services and binaries, can the packaging be adjusted to just deliver binaries?

    No service is provided by these two packages.

  9. Can the tool(s) provided by the package be used to do things we’d rather users not be able to do in FCOS? (e.g. can it be abused as a Turing complete interpreter?)

    No.

  10. Does the software provided by the package have a history of CVEs?

    No.

travier commented 2 years ago

We would need F36 packages to be able to include that. Could you link to the project repos & packages? Thanks!

bgilbert commented 2 years ago

What is the expected usage model for these packages? Would the Ignition config write an nmstate configuration and then a service to invoke nmstate to apply it?

cgwalters commented 2 years ago

I think all new package requests are going to be weighed with https://github.com/coreos/enhancements/blob/main/os/coreos-layering.md in mind.

But...this one is about networking which may be needed for early boot.

I think honestly what I'd like to see is a signoff from the NetworkManager folks about a long term commitment to supporting this. Will this code be planned to ship in RHEL (9?) too?

bgilbert commented 2 years ago

I think all new package requests are going to be weighed with https://github.com/coreos/enhancements/blob/main/os/coreos-layering.md in mind.

Yes, but that's strictly additive. There have been no discussions of ceasing support for the existing usage model.

cathay4t commented 2 years ago

@cgwalters This code is shipping in RHEL 8/9 with full support. I am the package maintainer in RHEL.

@thom311 As the NetworkManager main developer, could you sign off on long-term support from NetworkManager for the D-Bus interface nmstate is using?

cathay4t commented 2 years ago

I will create Fedora 36/35/34 update for this before end of this week.

thom311 commented 2 years ago

@thom311 As the NetworkManager main developer, could you sign off on long-term support from NetworkManager for the D-Bus interface nmstate is using?

NetworkManager is not going to break its clients. In particular, it will strongly care about supporting nmstate.

jlebon commented 2 years ago

We started discussing this in this week's community meeting, but decided to punt until we could get some SMEs in attendance.

@cathay4t Would you and/or someone from the NMState team be able to join the Fedora CoreOS community meeting at some point? It is on Wednesdays at 16:30 UTC. @thom311 It would be good if you and/or someone from the NM team joined as well.

Feel free to reach out in #forum-coreos in Libera.Chat to coordinate.

cathay4t commented 2 years ago

Sure. Let me talk with openshift developers to get more detailed potential use cases before joining your meeting.

cathay4t commented 2 years ago

@jlebon I get a 404 for the link https://apps.fedoraproject.org/calendar/CoreOS found on https://docs.fedoraproject.org/en-US/fedora-coreos/faq/. Is there any more information about this Fedora CoreOS community meeting?

bgilbert commented 2 years ago

@cathay4t Looks like it's https://calendar.fedoraproject.org/CoreOS/ now. More info at https://github.com/coreos/fedora-coreos-tracker/#meetings.

cathay4t commented 2 years ago

The Fedora 36 nmstate and nmstate-libs update is ongoing at https://bodhi.fedoraproject.org/updates/FEDORA-2022-538995057e. Fedora 36 is shipping the updated version of nmstate and nmstate-libs.

dustymabe commented 2 years ago

We discussed this in the community meeting today.

There was lots of thorough discussion. Some details that emerged:

During the meeting we did come to an intermediate decision for the time being:

13:33:05   dustymabe   #agreed nmstate is useful for dynamically configuring
                       a running system OR generating NM keyfiles with no running
                       system needed. The nmstate YAML config itself can't be used
                       on system boot without an accompanying service to apply the
                       config. In order to use it for provisioning a user would need
                       to write a systemd unit themselves to apply the config they
                       wrote using Ignition. At this time we would like to continue
                       not including nmstate in the host and experiment with
                       applications that want to configure networking via nmstate
                       bundling it. 

However, we did agree to continue the discussion to further our understanding of the use cases of the hypothetical nmstate consumers and invited ffmancera and thaller to join us at our video community meeting next week.

qinqon commented 2 years ago

Including nmstate in FCOS will help with the future effort of configuring networking at OpenShift installation time for virtual and bare metal. Right now, for OVN, the bridge is configured by a 500-line Linux bash script [1] that is quite difficult to maintain. At that point nmstate cannot be consumed as a container either, so it has to be on the nodes to integrate into that kind of script. Maybe the owners can share more about this use case: @cybertron @jcaamano

[1] https://github.com/openshift/machine-config-operator/blob/48d88a7f75e97c87b2a6f4b00862a091fccbdd72/templates/common/_base/files/configure-ovs-network.yaml

zaneb commented 2 years ago

I don't think CoreOS needs to provide a systemd service that reads and applies the nmstate yaml for nmstatectl to be useful. Users can write their own systemd service to do it and include it in the ignition. What they can't do is pull a container with nmstatectl in it to do that, because there may be network configuration required to access a container registry.

Currently this can be worked around by pre-generating the keyfiles using nmstatectl gc and shipping the keyfiles in the Ignition config. We're doing this in several places in OpenShift and it works, but it does limit where we can do it - currently it has to be in a container with the Python runtime and everything, so we can do it in e.g. the installer bootstrap host, but not in the installer itself. Now that the Rust library and its bindings are available, it's theoretically possible to embed this in a single binary, but the build process is not simple. Running nmstatectl gc remotely also has some limitations, like the fact that it can't actually verify that the (physical) interfaces referenced in the nmstate config actually exist on the host.
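That offline pre-generation step can be sketched as follows; the config file name is illustrative, and nmstatectl must be installed wherever the command runs:

```shell
# Render NetworkManager keyfiles from an nmstate YAML config without
# touching the live system. Because nothing is applied, nmstatectl
# cannot verify that the referenced interfaces exist on the target host.
nmstatectl gc eth0-static.yml
```

The generated keyfiles can then be shipped in the Ignition config, as described above.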

In future it might be nice to make use of this in more integrated ways, during the initramfs - in particular it would be great to be able to apply a network config with ignition and then chain to another ignition file fetched over the network post-configuration. But including the binary at all is surely the first step.

bgilbert commented 2 years ago

I don't think CoreOS needs to provide a systemd service that reads and applies the nmstate yaml for nmstatectl to be useful. Users can write their own systemd service to do it and include it in the ignition.

They can, but that workflow seems unnecessarily awkward. If we expect it to be common, we should just ship the systemd unit.

Running nmstatectl gc remotely also has some limitation, like the fact that it can't actually verify that the (physical) interfaces referenced in the nmstate actually exist on the host.

Ah, that's an interesting detail. Would you consider it significantly better to generate the configs on the machine at runtime?

in particular it would be great to be able to apply a network config with ignition and then chain to another ignition file fetched over the network post-configuration.

Ignition fundamentally doesn't work that way. Configuration changes applied by Ignition only affect the real root when we switch into it, never the initrd environment where Ignition itself runs.

cybertron commented 2 years ago

Running nmstatectl gc remotely also has some limitation, like the fact that it can't actually verify that the (physical) interfaces referenced in the nmstate actually exist on the host.

Ah, that's an interesting detail. Would you consider it significantly better to generate the configs on the machine at runtime?

Yes. The ability to verify the configuration and roll it back if it's invalid is extremely helpful. Blindly writing nmconnection files works, but it makes for a very bad user experience if there is an error in the provided configs.

I'm also currently looking into an issue where the generated nmconnection files don't work, but if I apply the same config directly using nmstatectl then it does the right thing. That may be a bug in nmstate or it may just be a limitation of that configuration method, but either way it would be avoided if we could just run nmstatectl on the host.

in particular it would be great to be able to apply a network config with ignition and then chain to another ignition file fetched over the network post-configuration.

Ignition fundamentally doesn't work that way. Configuration changes applied by Ignition only affect the real root when we switch into it, never the initrd environment where Ignition itself runs.

This is a really big problem for using Ignition in complex (or even not so complex) network environments. Static IPs cannot be used. Bond modes that don't work degraded cannot be used. These are both common use cases in baremetal deployments.

This is a tangent to the NMState discussion, but I think we should give some thought to providing Ignition some way to configure networking before it attempts to retrieve configs served over the network.

dustymabe commented 2 years ago

This is a really big problem for using Ignition in complex (or even not so complex) network environments. Static IPs cannot be used. Bond modes that don't work degraded cannot be used. These are both common use cases in baremetal deployments.

We handle this just fine. See the docs and ask us any clarifying questions in the discussion forum.

This is a tangent to the NMState discussion, but I think we should give some thought to providing Ignition some way to configure networking before it attempts to retrieve configs served over the network.

Agree. This is a tangent. Let's keep this focused on NMState.

jlebon commented 2 years ago

Ignition fundamentally doesn't work that way. Configuration changes applied by Ignition only affect the real root when we switch into it, never the initrd environment where Ignition itself runs.

This is a really big problem for using Ignition in complex (or even not so complex) network environments. Static IPs cannot be used. Bond modes that don't work degraded cannot be used. These are both common use cases in baremetal deployments.

This is a tangent to the NMState discussion, but I think we should give some thought to providing Ignition some way to configure networking before it attempts to retrieve configs served over the network.

There are multiple options to resolve this depending on your flow. Networking + Ignition is a huge topic in which we've invested a lot of work. Please see e.g. https://docs.fedoraproject.org/en-US/fedora-coreos/sysconfig-network-configuration/#_configuration_options for details.

cybertron commented 2 years ago

I started a thread for the Ignition discussion so we don't clutter up this one: https://discussion.fedoraproject.org/t/network-config-needed-to-pull-ignition/39094

qinqon commented 2 years ago

Looking at the Ignition rationale, nmstate's verification step aligns with it: nmstate ensures that the network configuration state requested by the user is either reached or the apply fails, so the boot will fail with a clear message about what could not be set up.

There are scenarios where NetworkManager cannot verify that the configuration specified in a keyfile is reached:

In general there are a lot of kernel restrictions we cannot encode in user space, but nmstate's verification will find them.

Apart from that, running nmstatectl on a live system allows composing the desired network state with the current network configuration using a tool like nmpolicyctl, so instead of having one nmstate config per node we can have just one per cluster, rendered by nmpolicyctl.

P.S.: This was originally a message in the fedoraproject discussion, but it makes more sense here, and editing the message does not alert the spam bot :-)

bgilbert commented 2 years ago

I worry that we've been talking past each other here, and I do think it would be useful to get this unblocked. I'll summarize my understanding of the situation and y'all can correct me.

Use cases

nmstate might be used by the administrator during initial provisioning of the node, or by the administrator to reconfigure the node at runtime, or by applications to update network configuration at runtime.

  1. Application reconfiguration at runtime. Whenever possible, we do not ship software that is only useful for applications (or cluster orchestrators) at runtime, since such software should ship in the application container instead. Here, nmstate could run in a container and talk to the host NetworkManager via D-Bus.
  2. Administrator reconfiguration at runtime. We generally discourage such reconfiguration. The administrator should reprovision the node instead.
  3. Configuration during initial provisioning. This is potentially useful. In principle we could instruct users to render nmstate configs off-node and then copy in the resulting keyfiles, but they'd lose out on config validation that nmstate can perform at runtime. Therefore there's value in shipping nmstate in the OS, with a usage model where the user adds an nmstate config to the Ignition config.
    1. Non-autoconfig case. If the Ignition config isn't accessible until the network config is applied, there's a chicken-and-egg problem. On some platforms, Ignition needs network access to fetch the Ignition config at all. On all platforms, the Ignition config may reference arbitrary remote resources. And by design, any configuration written by Ignition — including network configuration — does not take effect until Ignition has finished running.
      1. Assumptions. Historically in addressing the chicken-and-egg problem, we've assumed that it only occurs on bare metal or simulated bare metal platforms (e.g. VMware), since cloud platforms will support network autoconfiguration such as DHCP. That is not always true (e.g. certain VPC configurations) but has been a workable simplification.
      2. CoreOS solution. CoreOS supports early network configuration by allowing users to embed network settings in the live ISO image used for bare metal installation. coreos-installer copies those settings onto the installed system, which applies them during the first boot before Ignition runs.
      3. Supporting nmstate. @qinqon is contributing https://github.com/coreos/coreos-installer/pull/864, which will allow embedding nmstate configs in the ISO by rendering them to NM keyfiles first. This is consistent with the current model, but loses nmstate's runtime validation features. We could go further and support such validation by allowing nmstate configs to be embedded directly in the live ISO, shipping nmstate in the initrd, and having it apply the configs before Ignition runs.
    2. Autoconfig case. Otherwise, we expect the node to be able to boot with basic autoconfigured networking (e.g. a management interface with DHCP), fetch the Ignition config and any referenced resources, apply the config, and continue booting. Any additional network configuration in the Ignition config (e.g. for bonded interfaces, or static IPs on additional interfaces) is then activated later in boot, after Ignition completes.
      1. Supporting nmstate. It seems natural to ship nmstate on the host, with a systemd service that automatically reads files out of a directory (e.g. /etc/nmstate) and applies them. The Ignition config could then simply write out nmstate files, just as it can with NM keyfiles today. If the nmstate service doesn't persist the network config into NetworkManager, it could run on every boot; if it does persist the config, it could run only on the first boot, or delete the files from /etc/nmstate after they've been applied.
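A minimal sketch of what such a unit might look like; the unit name, the plain nmstatectl apply invocation, and the rename-after-apply loop are assumptions for illustration, not the actual service nmstate later shipped:

```ini
# Illustrative sketch only; not the real nmstate service unit.
[Unit]
Description=Apply nmstate network configuration from /etc/nmstate
After=NetworkManager.service
Wants=NetworkManager.service
ConditionDirectoryNotEmpty=/etc/nmstate

[Service]
Type=oneshot
# Apply each config, then move it aside so it is not re-applied on
# later boots (one of the two persistence options discussed above).
ExecStart=/bin/sh -c 'for f in /etc/nmstate/*.yml; do nmstatectl apply "$f" && mv "$f" "$f.applied"; done'

[Install]
WantedBy=multi-user.target
```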

Problem statement

Overall this seems like a reasonably straightforward change: ship nmstate and let its systemd service apply user configuration at boot, and maybe eventually add extra initrd support for the non-autoconfig case. And that leaves me very confused when I hear that nmstate has no plans to ship such a systemd service. If nmstate is only meant to be used by applications to do reconfiguration at runtime, then we don't need to ship it in the OS. If it's meant to be useful for initial node configuration, then users will need a way to apply nmstate configs.

What am I missing?

cathay4t commented 2 years ago

I (an nmstate developer) like the idea of nmstate shipping a systemd service that applies network state from /etc/nmstate. Let me talk to the openshift-installer, coreos-installer, and OpenShift bare metal installer teams about this.

qinqon commented 2 years ago

@bgilbert About the chicken-and-egg issue: we can do both things - keep the option to inject keyfiles generated by nmstate so initial networking is properly configured to access the Ignition URL, and also add the systemd nmstate service to apply the nmstate config from the Ignition config retrieved at that URL. Does that make sense?

So users will have a small initial nmstate configuration that is not validated and just allows access to the Ignition URL, plus a proper nmstate config to set up the cluster, which will be validated.

As a very naive use case with static non-DHCP interfaces, users would set the IPs with this initial nmstate config so they can retrieve the Ignition config from the URL, and the Ignition config would contain the full nmstate config to set up things like bridges and bonds, which will be verified.

I understand that injecting the initial nmstate config can only be done for bare metal or pseudo bare metal, where we can customize the images, right? I don't know if we have scenarios with cloud + non-DHCP.
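A bootstrap config of that kind could be as small as the following nmstate YAML; the interface name and addresses are made-up examples:

```yaml
# Illustrative minimal nmstate config: just enough static addressing
# to reach the Ignition URL. Interface name and IPs are made up.
interfaces:
- name: eth0
  type: ethernet
  state: up
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: 192.0.2.10
      prefix-length: 24
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 192.0.2.1
    next-hop-interface: eth0
dns-resolver:
  config:
    server:
    - 192.0.2.1
```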

bgilbert commented 2 years ago

@qinqon Yes, I agree.

I understand that injecting initial nmstate can only be done for baremetal or psudo baremetal where we can customize the images right ? I don't now if we have scenarios with cloud + non-dhcp.

Right. Specifically, it can only be done when installing with coreos-installer, and not when launching directly from an OS image.

qinqon commented 2 years ago

@bgilbert related to "problem statement"

Overall this seems like a reasonably straightforward change: ship nmstate and let its systemd service apply user configuration at boot, and maybe eventually add extra initrd support for the non-autoconfig case. And that leaves me very confused when I hear that nmstate has no plans to ship such a systemd service. If nmstate is only meant to be used by applications to do reconfiguration at runtime, then we don't need to ship it in the OS. If it's meant to be useful for initial node configuration, then users will need a way to apply nmstate configs.

Since nmstate is happy to add the systemd mechanism, we are all set here, right? That covers how the user will provide the initial nmstate configuration. Also, I understand that installing the nmstate config file onto the system is already covered by the normal Ignition config, so it does not need any integration with coreos-installer.

Since it looks like we have a green light here, I think these are the next steps, right?

And in parallel review/merge the coreos-installer --nmstate-network PR.

bgilbert commented 2 years ago

Packages are added to Fedora CoreOS by consensus, so I can't say for sure, but I think /etc/nmstate support will help. We'll discuss it further at the next weekly meeting. Anyone from the nmstate team is welcome to attend.

If we do reach agreement to add nmstate, I think those are the correct next steps. You're correct that Ignition will handle writing the configs, so no additional coreos-installer integration should be necessary.

bgilbert commented 2 years ago

The nmstate team won't be available to attend this week, so let's defer this until next week's meeting. I'll leave the meeting label attached.

cathay4t commented 2 years ago

The /etc/nmstate support is on-going at https://github.com/nmstate/nmstate/pull/1936

qinqon commented 2 years ago

@bgilbert We have collected some use cases for nmstate and FCOS to discuss at the meeting.


Validation

Looking at the Ignition rationale, nmstate's verification step aligns with it: nmstate ensures that the network configuration state requested by the user is either reached or the apply fails, so the boot will fail with a clear message about what could not be set up.

Some scenarios that are only verified by nmstate:

In general there are a lot of kernel restrictions that nmstate verification can find.


First startup configuration

A "service" mode for nmstatectl is going to be included, adding a new subcommand to nmstatectl and a systemd unit that will run nmstatectl to apply the config from /etc/nmstate at boot. This will be done only once, so future changes to networking made by users at day 2 are not overwritten.

There is a WIP PR from Gris: https://github.com/nmstate/nmstate/pull/1936


Dynamic configuration

Running nmstatectl on a live system allows us to compose the desired network state with the current network configuration using a tool like nmpolicyctl, so it's possible to have one nmstate yaml/json for the whole cluster instead of one per node; placeholders are expanded from each node's networking state.

Example of creating a bridge on top of the default-gateway interface, using DHCP:

capture:
  default-gw: routes.running.destination=="0.0.0.0/0"
  base-iface: interfaces.name==capture.default-gw.routes.running.0.next-hop-interface
desiredState:
  interfaces:
  - name: br1
    description: DHCP aware Linux bridge to connect a nic that is referenced by a default gateway
    type: linux-bridge
    state: up
    mac-address: "{{ capture.base-iface.interfaces.0.mac-address }}"
    ipv4:
      dhcp: true
      enabled: true
    bridge:
      options:
        stp:
          enabled: false
        port:
        - name: "{{ capture.base-iface.interfaces.0.name }}"

Networking configuration steps for baremetal

  1. Day 0: "coreos-installer live customize --network-nmstate" would set up minimal networking, such as the IP addresses needed for the node to reach the Ignition URL.
  2. Day 1: The Ignition config fetched from the URL will install an nmstate yaml/json in a special place (/etc/nmstate) and the systemd unit will apply it. It will set up the more complex networking, validated as it is applied to the live system, that the running application needs to start up.
  3. Day 2: The running application can take over the Ignition-provided nmstate configuration and add extra checks, like pinging the default gateway, DNS resolution, or application-specific checks such as k8s apiserver connectivity. It will also be able to modify it.

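The day-0 step might look like the following sketch, assuming the proposed --network-nmstate option; the subcommand, flag spelling, and file names are illustrative and may differ in the merged implementation:

```shell
# Embed a minimal nmstate config into the live ISO so the node can
# reach the Ignition URL on first boot. --network-nmstate is the
# option proposed for coreos-installer, not yet a released flag.
coreos-installer iso customize \
    --network-nmstate bootstrap-net.yml \
    -o custom.iso \
    fedora-coreos-live.x86_64.iso
```
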
travier commented 2 years ago

We've discussed this topic in today's FCOS community meeting:

We will add nmstate to Fedora CoreOS.

See the meeting notes for more details about the discussion. We should probably list the remaining steps in this card.

bgilbert commented 2 years ago

I think the next steps are:

qinqon commented 2 years ago

  • [ ] Document using nmstate to configure networking

    • How to validate an nmstate config
    • Embedding an nmstate config via coreos-installer iso/pxe customize --network-nmstate
    • Provisioning an nmstate config via Butane
    • Manually configuring with nmstate in the live system, then using --copy-network
    • Hybrid model: bootstrap nmstate during install via --network-nmstate, followed by larger nmstate in the Ignition config

I understand that after nmstate is added to FCOS there will be two nmstate configuration mechanisms: "offline" with --network-nmstate, and "online" by copying an nmstate yaml/json to /etc/nmstate. So documenting the "online" one too makes sense to me.
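The "online" path could be sketched with a Butane config like the one below; the variant/version pair, file name, and bridge contents are illustrative, and the file is picked up by the boot-time service discussed earlier in this issue:

```yaml
# Illustrative Butane config writing an nmstate file for the
# boot-time nmstate service to apply; interface details are made up.
variant: fcos
version: 1.5.0
storage:
  files:
    - path: /etc/nmstate/br0.yml
      mode: 0600
      contents:
        inline: |
          interfaces:
          - name: br0
            type: linux-bridge
            state: up
            ipv4:
              enabled: true
              dhcp: true
```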

qinqon commented 1 year ago

Is there any news on this?

qinqon commented 1 year ago

Preparing a PR to deliver it in FCOS: https://github.com/coreos/fedora-coreos-config/pull/2269

travier commented 1 year ago

Should we close this one now that https://github.com/coreos/fedora-coreos-config/pull/2269 is merged?

bgilbert commented 1 year ago

No, the remaining steps are in https://github.com/coreos/fedora-coreos-tracker/issues/1175#issuecomment-1196286556.

dustymabe commented 1 year ago

The fix for this went into next stream release 38.20230322.1.0. Please try out the new release and report issues.

dustymabe commented 1 year ago

The fix for this went into testing stream release 37.20230322.2.0. Please try out the new release and report issues.

dustymabe commented 1 year ago

The fix for this went into stable stream release 37.20230322.3.0.

dustymabe commented 1 year ago

This is now in all streams of FCOS. Do we have anyone who is going to work on the remaining steps in https://github.com/coreos/fedora-coreos-tracker/issues/1175#issuecomment-1196286556 ?

cc @qinqon @bgilbert

bgilbert commented 1 year ago

@qinqon, would you be able to write up a documentation PR for using nmstate in Fedora CoreOS?

qinqon commented 1 year ago

@qinqon, would you be able to write up a documentation PR for using nmstate in Fedora CoreOS?

Sure, no problem.

qinqon commented 1 year ago

@bgilbert @dustymabe I have started documenting it here:

There is a missing feature in nmstate needed to do the same thing as those examples:

Also, the FCOS examples look a little strange, since they do static IP configuration but set dhcp-hostname.

dustymabe commented 1 year ago

We discussed this in the community meeting today.

12:48:06 dustymabe | #agreed we will document a single or a few examples with nmstate
                   | (perhaps putting them on a secondary page and linking from the
                   | main page) and leave keyfiles as the primary documented network
                   | configuration for now.