Closed SchSeba closed 1 year ago
Thanks for the report. Two quick comments from my side:
Thus, if the main topic here is "performing host network configuration (before Kubelet starts)", IMHO it would be better to start discussing your use cases with the NetworkManager (and nmstate) folks to make sure they can be covered there.
Hi @lucab thanks for the comment!
Let me try to answer. When I say network configuration, I don't mean something related to NetworkManager. Yes, NetworkManager will configure the IP address and other settings on the virtual functions, but first you need to create them. To do that on Mellanox cards you need to run something like this:
mstconfig -d 0000:3b:00.0 -y set SRIOV_EN=True NUM_OF_VFS=20
Device #1:
----------
Device type: ConnectX5
Name: 0V5DG9_0TDNNT_Ax
Description: ConnectX-5 EN network interface card; 25GbE Dual-port SFP28; SOCKET DIRECT ; PCIe3.0 2X8
Device: 0000:3b:00.0
Configurations:        Next Boot       New
    SRIOV_EN           False(0)        True(1)
    NUM_OF_VFS         0               20
Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
After the reboot, we can use the regular sysfs interface to create the 20 virtual functions, so NetworkManager can then use them for other network configuration.
The reason I need this package is that without it there is no way to create virtual functions for Mellanox cards.
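The sysfs step described above can be sketched roughly as follows. This is a minimal sketch, assuming the standard kernel `sriov_numvfs` attribute under the PCI device directory; the `set_num_vfs` helper and the `SYSFS_ROOT` override are hypothetical names introduced only for illustration, and the PCI address is the one from the mstconfig example.

```shell
#!/bin/sh
# Sketch: create SR-IOV virtual functions through sysfs after SR-IOV has
# been enabled in firmware and the machine has been rebooted.
# SYSFS_ROOT is a hypothetical override so the helper can be tried
# against a scratch directory instead of the real /sys.

set_num_vfs() {
    pci_addr="$1"
    num_vfs="$2"
    dev_dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/${pci_addr}"

    # Devices without SR-IOV support (or with it disabled in firmware)
    # do not expose this attribute.
    if [ ! -e "${dev_dir}/sriov_numvfs" ]; then
        echo "device ${pci_addr} does not expose sriov_numvfs" >&2
        return 1
    fi

    # The kernel refuses to change a non-zero VF count directly,
    # so reset it to 0 before writing the new value.
    echo 0 > "${dev_dir}/sriov_numvfs" || return 1
    echo "${num_vfs}" > "${dev_dir}/sriov_numvfs"
}
```

On a real host this would be run as root, for example `set_num_vfs 0000:3b:00.0 20`.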
but the request is to have the network configuration done before kubelet even starts.
In OCP, we already pull and run containers via podman before running kubelet. For example,
https://github.com/openshift/machine-config-operator/blob/da6494c26c643826f44fbc005f26e0dfd10513ae/templates/common/_base/units/nodeip-configuration.service.yaml#L18
I see no reason this wouldn't work here too.
Hi @cgwalters :)
That is right, we can use podman to run containers before kubelet, and my team is also doing that to load out-of-tree drivers.
But there are some problems with that solution that make it hard to use a container in this case:
I hope this better explains the situation and why we would like to have the binary on the host rather than use it inside a container.
The SR-IOV network operator project is an upstream (u/s) project, so we cannot be sure the nodes will contain podman; they may run only docker or containerd, so making it generic is hard.
OK, but is it easier to take a hard dependency on this tool being on the host?
Another issue we may have is in disconnected environments, where the customer will need to mirror this image into some internal registry. The file you point to is using a container from the core OCP payload, no?
Sure, but you're also proposing shipping this in a container in the core payload (the host system is a container too!). I'm just saying we can use a different container.
The last point is that we would like to expose the option to change the BlueField-2 DPU mode at installation time, and not after the node is already a valid node in the cluster (to help reduce the number of reboots every deployment requires).
As already covered, this container would run at "installation time" i.e. the firstboot before kubelet starts. See https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md#applying-os-updates-before-kubelet
To flesh this out slightly, right now (i.e. today) you can:
1) Install mstflint in the sr-iov operator container
2) Create a MachineConfig fragment which renders a systemd unit which pulls and runs this container image (with --privileged) via podman and uses Before=kubelet.service, and applies the configuration
3) Describe to customers how to add this MachineConfig "day 0" via additional manifests passed to openshift-install
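For illustration, the systemd unit from step 2 might look roughly like the sketch below. It is not taken from any existing operator. The image name `quay.io/example/sriov-config:latest` is hypothetical, the PCI address and mstconfig arguments are copied from the example earlier in this thread, and ordering after network-online.target is an assumption made so the image pull can succeed.

```
[Unit]
Description=Apply Mellanox firmware configuration before kubelet (illustrative sketch)
# Assumption: the image has to be pulled, so wait for the network.
Wants=network-online.target
After=network-online.target
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical image that ships mstflint; --privileged gives it access to the PCI device.
ExecStart=/usr/bin/podman run --rm --privileged \
    quay.io/example/sriov-config:latest \
    mstconfig -d 0000:3b:00.0 -y set SRIOV_EN=True NUM_OF_VFS=20

[Install]
WantedBy=multi-user.target
```

As the mstconfig output shown earlier notes, a firmware change like this only takes effect after a reboot, so a real unit would also need some way to trigger or wait for that reboot.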
We discussed this at our community meeting today.
This issue is still new so no decisions were made but the discussion did yield some patterns. Keep in mind this discussion was had in the context of Fedora CoreOS specifically:
Arguments for adding:
Arguments against adding:
We will discuss more in the coming meeting(s).
@SchSeba do you have further feedback on this? Could you maybe join one of our next meetings?
We clearly see that there is at least a package split needed, in order to avoid the Python dependency. But even assuming that gets done in a short timeframe, there are still some doubts on where we want to go with this.
We will continue to use the mstflint package from inside a container. Thanks for the help!
Please try to answer the following questions about the package you are requesting:
There is a need to configure Mellanox network and SmartNIC cards from the host: for example, the mode of the BlueField-2 DPU, or the number of virtual functions for Mellanox network cards such as CX4, CX5, and CX6.
Today the SR-IOV network operator does run this package inside a container, but the request is to have the network configuration done before kubelet even starts. The request is to have virtual functions ready and configured with VLANs and bonds so that, for example, the SDN network can run on a bond interface created from the VFs (of two different PFs). The second use case is to switch the mode of the BlueField-2 DPU card from SmartNIC to a regular Mellanox CX6 network card.
No
Yes, networking issues related to Mellanox cards only.
No, we need to have it as day 1 to change the operation mode of the Bluefield 2 dpu for example, or to connect the SDN to a VF instead of a PF
The package doesn't contain any services.
Not that I am aware of.
Not that I am aware of.