coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

New Package Request: mstflint #1264

Closed: SchSeba closed this issue 1 year ago

SchSeba commented 2 years ago

Please try to answer the following questions about the package you are requesting:

  1. What, if any, are the additional dependencies on the package? (i.e. does it pull in Python, Perl, etc.)

    dnf repoquery --requires --resolve mstflint
    Updating Subscription Management repositories.
    Last metadata expiration check: 1:09:35 ago on Tue 26 Jul 2022 03:41:19 AM EDT.
    bash-0:4.4.20-3.el8.x86_64
    boost-filesystem-0:1.66.0-10.el8.x86_64
    boost-regex-0:1.66.0-10.el8.x86_64
    boost-system-0:1.66.0-10.el8.x86_64
    glibc-0:2.28-189.5.el8_6.i686
    glibc-0:2.28-189.5.el8_6.x86_64
    libcurl-0:7.61.1-22.el8_6.3.x86_64
    libcurl-minimal-0:7.61.1-22.el8_6.3.x86_64
    libgcc-0:8.5.0-10.1.el8_6.x86_64
    libstdc++-0:8.5.0-10.1.el8_6.x86_64
    libxml2-0:2.9.7-13.el8_6.1.x86_64
    openssl-libs-1:1.1.1k-6.el8_5.x86_64
    platform-python-0:3.6.8-45.el8.i686
    platform-python-0:3.6.8-45.el8.x86_64
    python36-0:3.6.8-38.module+el8.5.0+12207+5c5719bc.x86_64
    xz-libs-0:5.2.4-4.el8_6.x86_64
    zlib-0:1.2.11-18.el8_5.x86_64
  2. What is the size of the package and its dependencies?

    mstflint          x86_64  4.18.0-1.el8   rhel-8-appstream-rpms-x86_64  4.3 M

    Installing dependencies:

    boost-filesystem  x86_64  1.66.0-10.el8  rhel-8-appstream-rpms-x86_64   49 k
    boost-regex       x86_64  1.66.0-10.el8  rhel-8-appstream-rpms-x86_64  280 k
    boost-system      x86_64  1.66.0-10.el8  rhel-8-appstream-rpms-x86_64   18 k
    libicu            x86_64  60.3-2.el8_1   rhel-8-baseos-rpms-x86_64     8.8 M
  3. What problem are you trying to solve with this package? Or what functionality does the package provide?

There is a need to configure Mellanox network and SmartNIC cards from the host: for example, configuring the mode of the BlueField-2 DPU, or the number of virtual functions on Mellanox network cards such as the CX4, CX5, and CX6.

  4. Can the software provided by the package be run from a container? Explain why or why not.

Today the SR-IOV network operator does run this package inside a container, but the request is to have the network configuration done before kubelet even starts. The goal is to have virtual functions ready and configured with VLANs and bonds so that, for example, the SDN network can run on a bond interface created from the VFs (of two different PFs). The second use case is to switch the BlueField-2 DPU from SmartNIC mode to operating as a regular Mellanox CX6 network card.

  5. Can the tool(s) provided by the package be helpful in debugging container runtime issues?

No

  6. Can the tool(s) provided by the package be helpful in debugging networking issues?

Yes, but only for networking issues related to Mellanox cards.

  7. Is it possible to layer the package onto the base OS as a day 2 operation? Explain why or why not.

No, we need it on day 1, for example to change the operation mode of the BlueField-2 DPU, or to connect the SDN to a VF instead of a PF.

  8. In the case of packages providing services and binaries, can the packaging be adjusted to just deliver binaries?

The package doesn't contain any services.

  9. Can the tool(s) provided by the package be used to do things we’d rather users not be able to do in FCOS? (e.g. can it be abused as a Turing complete interpreter?)

Not that I am aware of.

  10. Does the software provided by the package have a history of CVEs?

Not that I am aware of.

lucab commented 2 years ago

Thanks for the report. Two quick comments from my side:

Thus, if the main topic here is "performing host network configuration (before kubelet starts)", IMHO it would be better to start discussing your use cases with the NetworkManager (and nmstate) folks to make sure they can be covered there.

SchSeba commented 2 years ago

Hi @lucab thanks for the comment!

Let me try to answer. When I say network configuration, it's not something related to NetworkManager. Yes, NetworkManager will configure the IP address and other settings on the virtual functions, but first you need to create them. To do that on Mellanox cards you need to run something like this:

    mstconfig -d 0000:3b:00.0 -y set SRIOV_EN=True NUM_OF_VFS=20
    Device #1:
    ----------

    Device type:    ConnectX5
    Name:           0V5DG9_0TDNNT_Ax
    Description:    ConnectX-5 EN network interface card; 25GbE Dual-port SFP28; SOCKET DIRECT ; PCIe3.0 2X8
    Device:         0000:3b:00.0

    Configurations:                              Next Boot       New
             SRIOV_EN                            False(0)        True(1)
             NUM_OF_VFS                          0               20

    Apply new Configuration? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.

After the reboot we are able to use the regular sysfs interface to create the 20 virtual functions, so NetworkManager will be able to use them for other network configuration.
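
For illustration, a minimal sketch of that sysfs step, assuming the PF shows up as enp59s0f0 (the interface name here is a placeholder):

    # enp59s0f0 is a hypothetical PF name; substitute the real interface
    echo 20 > /sys/class/net/enp59s0f0/device/sriov_numvfs
    # verify the VFs were created (virtfn0, virtfn1, ... links appear)
    ls /sys/class/net/enp59s0f0/device | grep virtfn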

The reason I need this package is that without it there is no way to create virtual functions for Mellanox cards.

cgwalters commented 2 years ago

but the request is to have the network configuration done before kubelet even starts.

In OCP, we already pull and run containers via podman before running kubelet. For example, https://github.com/openshift/machine-config-operator/blob/da6494c26c643826f44fbc005f26e0dfd10513ae/templates/common/_base/units/nodeip-configuration.service.yaml#L18
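
For illustration, a minimal sketch of that pattern, not the actual nodeip-configuration unit (the image name is a placeholder):

    [Unit]
    Description=Example: run a privileged tool container before kubelet
    Before=kubelet.service
    After=network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    # quay.io/example/nic-tools is a hypothetical image carrying the needed tool
    ExecStart=/usr/bin/podman run --rm --privileged quay.io/example/nic-tools mstconfig -d 0000:3b:00.0 -y set SRIOV_EN=True NUM_OF_VFS=20

    [Install]
    WantedBy=multi-user.target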

I see no reason this wouldn't work here too.

SchSeba commented 2 years ago

Hi @cgwalters :)

That is right, we can use podman to run containers before kubelet, and my team is also doing that to load out-of-tree drivers.

But there are some problems with that solution that make it hard to use a container in this case:

  1. The SR-IOV network operator project is an upstream project, so we are not sure the nodes will contain podman; they may run only docker or containerd, which makes a generic solution hard.
  2. Another issue we can have is in disconnected environments, where the customer will need to mirror this image into some internal registry. The file you point to uses a container from the core OCP payload, no?
  3. The last point is that we would like to expose the option to change the BlueField-2 DPU mode at installation time, and not after the node is already a valid node in the cluster (to help reduce the number of reboots each deployment requires).

I hope this better explains the situation and the reason why we would like to have the binary on the host rather than use it inside a container.

cgwalters commented 2 years ago

The SR-IOV network operator project is an upstream project, so we are not sure the nodes will contain podman; they may run only docker or containerd, which makes a generic solution hard.

OK, but is it easier to take a hard dependency on this tool being on the host?

Another issue we can have is in disconnected environments, where the customer will need to mirror this image into some internal registry. The file you point to uses a container from the core OCP payload, no?

Sure, but you're also proposing shipping this in a container in the core payload (the host system is a container too!). I'm just saying we can use a different container.

The last point is that we would like to expose the option to change the BlueField-2 DPU mode at installation time, and not after the node is already a valid node in the cluster (to help reduce the number of reboots each deployment requires).

As already covered, this container would run at "installation time", i.e. on the first boot before kubelet starts. See https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md#applying-os-updates-before-kubelet

cgwalters commented 2 years ago

To flesh this out slightly, right now (i.e. today) you can:

1. Install mstflint in the SR-IOV operator container.
2. Create a MachineConfig fragment which renders a systemd unit that pulls and runs this container image (with --privileged) via podman, uses Before=kubelet.service, and applies the configuration (see the sketch after this list).
3. Describe to customers how to add this MachineConfig "day 0" via additional manifests passed to openshift-install.
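
For illustration only, a rough sketch of what step 2 could look like, assuming a worker-role MachineConfig; the unit name, image, and device address are placeholders:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-mstconfig-firstboot
    spec:
      config:
        ignition:
          version: 3.2.0
        systemd:
          units:
            - name: mstconfig-firstboot.service
              enabled: true
              contents: |
                [Unit]
                Before=kubelet.service
                After=network-online.target

                [Service]
                Type=oneshot
                RemainAfterExit=yes
                # placeholder image; same podman pattern as the unit sketched earlier
                ExecStart=/usr/bin/podman run --rm --privileged quay.io/example/sriov-tools mstconfig -d 0000:3b:00.0 -y set SRIOV_EN=True NUM_OF_VFS=20

                [Install]
                WantedBy=multi-user.target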

dustymabe commented 2 years ago

We discussed this at our community meeting today.

This issue is still new, so no decisions were made, but the discussion did yield some patterns. Keep in mind that this discussion was held in the context of Fedora CoreOS specifically:

Arguments for adding:

Arguments against adding:

We will discuss more in the coming meeting(s).

lucab commented 2 years ago

@SchSeba do you have further feedback on this? Could you maybe be around for one of our next meetings?

We clearly see that at least a package split is needed, in order to avoid the Python dependency. But even assuming that gets done in a short timeframe, there are still some doubts about where we want to go with this.

SchSeba commented 1 year ago

We will continue to use the mstflint package from inside a container. Thanks for the help!