Closed champtar closed 1 month ago
Rebased on main, removed go version bump
Rebased on main @aojea do you have time to do a full review ?
Maybe @squeed you have some time to review this ?
Rebased
I would expect CNI plugins to execute on servers and not desktops. The auto-detection of new interfaces (i.e. detecting and managing them) is supposed to be disabled on servers that use NM.
Can that be an alternative solution?
I would expect CNI plugins to execute on servers and not desktops. The auto-detection of new interfaces (i.e. detecting and managing them) is supposed to be disabled on servers that use NM.
Can that be an alternative solution?
I'm already using no-auto-default=*
in NM config, but I use cockpit and want some interfaces (identified by mac pattern) not to be configurable at all (NM_UNMANAGED
), and this race condition makes it impossible
I'm already using
no-auto-default=*
in NM config, but I use cockpit and want some interfaces (identified by mac pattern) not to be configurable at all (NM_UNMANAGED
), and this race condition makes it impossible
I think the basic is setting the no-auto-default
and ignore-carrier
as set with the NetworkManager-config-server
package. But I think NM has more options in this regard.
If you already exhausted this direction with the NM experts, then I guess that is it.
The proposed solution you provided to overcome the problem is innovative, but at the same time feels a bit hacky. It attempts to overcome a problem of NM & udev, which a general CNI should not care about. I guess NM had not faced this problem before because this is not a common action, but if we think about it, the same operation could have been done by a human or automation, regardless of a CNI, and one could expect NM to still behave correctly. Solving this at the core, would solve the CNI scenario as well.
If the workaround is accepted here, then at the minimum I would suggest to have an open bug on the NM project, asking to solve it there and ref in the code. I.e. making the current solution temporary.
I'm already using
no-auto-default=*
in NM config, but I use cockpit and want some interfaces (identified by mac pattern) not to be configurable at all (NM_UNMANAGED
), and this race condition makes it impossibleI think the basic is setting the
no-auto-default
andignore-carrier
as set with theNetworkManager-config-server
package. But I think NM has more options in this regard. If you already exhausted this direction with the NM experts, then I guess that is it.The proposed solution you provided to overcome the problem is innovative, but at the same time feels a bit hacky. It attempts to overcome a problem of NM & udev, which a general CNI should not care about. I guess NM had not faced this problem before because this is not a common action, but if we think about it, the same operation could have been done by a human or automation, regardless of a CNI, and one could expect NM to still behave correctly. Solving this at the core, would solve the CNI scenario as well.
Agreed that NM should do better, but the bug is 3 weeks old and still waiting for triage. Also having the NM fix backported will take a long time.
If the workaround is accepted here, then at the minimum I would suggest to have an open bug on the NM project, asking to solve it there and ref in the code. I.e. making the current solution temporary.
My fix also leads to less udev events / rules execution, so I really don't see it as a workaround And maybe some other software handling network configuration are impacted, each distro uses its own ...
My fix also leads to less udev events / rules execution, so I really don't see it as a workaround And maybe some other software handling network configuration are impacted, each distro uses its own ...
I see it as a workaround because it fixes a problem of a different component. I.e. the CNI needs to take into account the host OS configuration and components. Fixing it at the "edge" is not scaling well and burdens this CNI logic.
The cost of changing this CNI and supporting the added logic is not easier (or faster) than fixing the root cause. NM tasks could be prioritized through D/S if this is possible from your side.
Fixing it at the "edge" is not scaling well
Depending on how you write your udev rules they might be impacted as well, for example if you do negative matches ATTR{address}!="00:11:22:*"
this will match because the device doesn't exists, so ATTR{address}
is empty
Fast rename is also problematic for udev alone, and race condition are always fun to debug
and burdens this CNI logic.
it's the same number of steps
With temp name:
With temp namespace:
Honestly, this makes sense to me. I understand that NM changes can take far too long to roll out, especially when there is no appreciable workaround.
Personally, I'm in favor of this PR. It's not like namespaces are so expensive. IIRC there have been other issues regarding "churn" as the host-network device is reconfigured, so this seems like a reasonable change.
@squeed as it make sense to you, any chance you can review this PR ?
and burdens this CNI logic.
it's the same number of steps
With temp name:
- link down
- rename 1
- change namespace
- rename 2
With temp namespace:
- create a temp namespace
- change namespace (this also down the link)
- rename
- change namespace
Ok, this argument is convincing.
This as been approved by @squeed for a week, ready for review for a good month, how can we move forward ?
@champtar we’ll be cutting a release in a week or so, this will get in.
Using a temporary name / doing a fast rename causes some race conditions with udev and NetworkManager: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1599
Fixes #1072