How to reproduce it (as minimally and precisely as possible):
Deploy network operator on RHEL 8.8 hosts with a valid RHEL subscription.
Anything else we need to know?:
Tried the option to specify a private repo: ofedDriver.repoConfig.name
The network-operator pod log shows this error:
2024-04-10T16:39:52Z ERROR Error while syncing state {"controller": "nicclusterpolicy", "controllerGroup": "mellanox.com", "controllerKind": "NicClusterPolicy", "NicClusterPolicy": {"name":"nic-cluster-policy"}, "namespace": "", "name": "nic-cluster-policy", "reconcileID": "d09bbc74-ce62-4fe4-9ccc-99838b245ed3", "error": "failed to create k8s objects from manifest: failed to get destination directory for custom repo config: distribution not supported", "errorVerbose": "failed to get destination directory for custom repo config: distribution not supported\nfailed to create k8s objects from manifest\ngithub.com/Mellanox/network-operator/pkg/state.(stateOFED).Sync\n\t/workspace/pkg/state/state_ofed.go:270\ngithub.com/Mellanox/network-operator/pkg/state.(stateManager).SyncState\n\t/workspace/pkg/state/manager.go:92\ngithub.com/Mellanox/network-operator/controllers.(NicClusterPolicyReconciler).Reconcile\n\t/workspace/controllers/nicclusterpolicy_controller.go:144\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
github.com/Mellanox/network-operator/pkg/state.(*stateManager).SyncState
	/workspace/pkg/state/manager.go:101
github.com/Mellanox/network-operator/controllers.(*NicClusterPolicyReconciler).Reconcile
	/workspace/controllers/nicclusterpolicy_controller.go:144
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
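One plausible reading of the "distribution not supported" error is that the destination directory for a custom repo config is looked up per distribution, and the lookup has no entry for RHEL. The sketch below only illustrates that failure mode. It is not taken from the operator's source: the table contents, function names, and exception type are all hypothetical.

```python
# Hypothetical table: distribution ID -> directory where custom repo files go.
# The entries are illustrative assumptions, not the operator's real table.
REPO_CONFIG_DEST_DIRS = {
    "ubuntu": "/etc/apt/sources.list.d",
    "rhcos": "/etc/yum.repos.d",
}


class DistributionNotSupportedError(Exception):
    """Raised when no destination directory is known for a distribution."""


def repo_config_dest_dir(distribution: str) -> str:
    """Return the repo config directory for a distribution ID."""
    try:
        return REPO_CONFIG_DEST_DIRS[distribution.lower()]
    except KeyError:
        raise DistributionNotSupportedError("distribution not supported") from None


def sync(distribution: str) -> str:
    """Wrap the lookup error the same way the logged message is layered."""
    try:
        return repo_config_dest_dir(distribution)
    except DistributionNotSupportedError as err:
        raise RuntimeError(
            f"failed to get destination directory for custom repo config: {err}"
        ) from err
```

Under these assumptions, calling `sync("rhel")` raises an error whose text matches the inner part of the logged message, while `sync("ubuntu")` returns a directory.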
Got the OFED driver to install successfully by patching the mofed-rhel8.8-ds daemonset and adding these volumeMounts/volumes entries:
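The exact entries are not shown here. As a rough sketch of how a patch of this kind can be expressed, the helper below builds a JSON Patch (RFC 6902) that appends one hostPath volume and a matching volumeMount to a DaemonSet pod template. The volume name, host path, mount path, and container index are hypothetical placeholders, and the sketch assumes the `volumes` and `volumeMounts` lists already exist on the target.

```python
import json


def build_volume_patch(name, host_path, mount_path, container_index=0):
    """Build JSON Patch ops that append a hostPath volume and its mount.

    All argument values are supplied by the caller; nothing here is
    specific to the OFED driver daemonset.
    """
    return [
        {
            # "/-" appends to the existing volumes list of the pod template.
            "op": "add",
            "path": "/spec/template/spec/volumes/-",
            "value": {"name": name, "hostPath": {"path": host_path}},
        },
        {
            # Mount the same volume, read-only, in the chosen container.
            "op": "add",
            "path": f"/spec/template/spec/containers/{container_index}/volumeMounts/-",
            "value": {"name": name, "mountPath": mount_path, "readOnly": True},
        },
    ]


if __name__ == "__main__":
    # Placeholder values only; substitute the real name and paths.
    patch = build_volume_patch("example-volume", "/path/on/host", "/path/in/container")
    print(json.dumps(patch))
```

If the script is saved as, say, `build_patch.py` (a hypothetical file name), its output could be applied with `kubectl -n nvidia-network-operator patch daemonset mofed-rhel8.8-ds --type=json -p "$(python3 build_patch.py)"`.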
What happened: Deployed network operator on RHEL 8.8 hosts with the option ofedDriver: deploy: true
The OFED driver pods fail due to this error:
What you expected to happen: ofed driver install succeeds on RHEL 8.8. Release notes for network operator v23.10.0 state that RHEL 8.8 is supported: https://docs.nvidia.com/networking/display/kubernetes2310/release+notes
Logs:
NicClusterPolicy CR spec and state:
Output of kubectl -n nvidia-network-operator get -A:
NAME                                     DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
daemonset.apps/cni-plugins-ds            4        4        4      4           4          <none>         45m
daemonset.apps/kube-multus-ds            4        4        4      4           4          <none>         45m
daemonset.apps/mofed-rhel8.8-ds          3        3        3      3           3          feature.node.kubernetes.io/pci-15b3.present=true,feature.node.kubernetes.io/system-os_release.ID=rhel,feature.node.kubernetes.io/system-os_release.VERSION_ID=8.8  45m
daemonset.apps/nic-feature-discovery-ds  4        4        4      4           4          <none>         45m
daemonset.apps/nv-ipam-node              4        4        4      4           4          <none>         45m
daemonset.apps/rdma-shared-dp-ds         3        3        3      3           3          feature.node.kubernetes.io/pci-15b3.present=true,network.nvidia.com/operator.mofed.wait=false  45m

NAME                                READY  UP-TO-DATE  AVAILABLE  AGE
deployment.apps/network-operator    1/1    1           1          4d19h
deployment.apps/nv-ipam-controller  2/2    2           2          45m

NAME                                           DESIRED  CURRENT  READY  AGE
replicaset.apps/network-operator-5cbb6ccd74    0        0        0      4d19h
replicaset.apps/network-operator-6444bc476f    1        1        1      4d15h
replicaset.apps/network-operator-76b9994f84    0        0        0      4d19h
replicaset.apps/nv-ipam-controller-64c89dcfd5  2        2        2      45m
nfd:
  enabled: false
  deployNodeFeatureRules: true
operator:
  tolerations: []
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
sriovNetworkOperator:
  enabled: false
NicClusterPolicy CR values:
deployCR: true
nvPeerDriver:
  deploy: false
rdmaSharedDevicePlugin:
  deploy: true
  resources:
secondaryNetwork:
  deploy: true
  multus:
    deploy: true
  cniPlugins:
    deploy: true
  ipamPlugin:
    deploy: false
nvIpam:
  deploy: true
sriovDevicePlugin:
  deploy: false
ofedDriver:
  deploy: true
  repoConfig:
    name: repo-config
  env:
nicFeatureDiscovery:
  deploy: true