Mellanox / network-operator

Mellanox Network Operator
Apache License 2.0

Network Operator docs: Wrong config for deployment scenario "Network Operator Deployment with NVIDIA-IPAM" #889

Closed: gseidlerhpe closed this issue 3 months ago

gseidlerhpe commented 3 months ago

What happened: Followed the documentation for the deployment scenario "Network Operator Deployment with NVIDIA-IPAM": https://docs.nvidia.com/networking/display/kubernetes2310/network+operator#src-144713486_NetworkOperator-NetworkOperatorDeploymentwithNVIDIA-IPAM

The NVIDIA-IPAM CRD was not deployed, so the NV-IPAM IPPool could not be deployed.

What you expected to happen: The NVIDIA-IPAM CRD and CNI plugin are deployed and configured.

How to reproduce it (as minimally and precisely as possible): Follow doc example.

Anything else we need to know?: The documented values.yaml is wrong. The nvIpam entry must not be nested under secondaryNetwork; it has to be a top-level key, as in these values:

nfd:
  enabled: false
  deployNodeFeatureRules: true

operator:
  tolerations: []
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          preference:
            matchExpressions:
              - key: "node-role.kubernetes.io/worker"
                operator: In
                values: [""]
        - weight: 1
          preference:
            matchExpressions:
              - key: "hpe.com/dataplatform"
                operator: NotIn
                values: ["true"]
        - weight: 1
          preference:
            matchExpressions:
              - key: "node-role.kubernetes.io/control-plane"
                operator: In
                values: [ "" ]

sriovNetworkOperator:
  enabled: false

# NicClusterPolicy CR values:
deployCR: true

nvPeerDriver:
  deploy: false

rdmaSharedDevicePlugin:
  deploy: true
  resources:
    - name: rdma_shared_device_a
      ifNames: [ens2f0, ens5f0]

secondaryNetwork:
  deploy: true
  multus:
    deploy: true
  cniPlugins:
    deploy: true
  ipamPlugin:
    deploy: false

nvIpam:
  deploy: true

sriovDevicePlugin:
  deploy: false

ofedDriver:
  deploy: true
  repoConfig:
    name: repo-config
  env:
  - name: HTTPS_PROXY
    value: http://proxy-de.its.hpecorp.net:443
  - name: HTTP_PROXY
    value: http://proxy-de.its.hpecorp.net:443
  - name: https_proxy
    value: http://proxy-de.its.hpecorp.net:443
  - name: http_proxy
    value: http://proxy-de.its.hpecorp.net:443

nicFeatureDiscovery:
  deploy: true
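
To make the difference explicit, here is a minimal sketch of just the two affected sections. The commented-out part is an assumed reconstruction of the nesting described in this report, not a copy of the original doc text; the active part repeats the layout from the values above.

```yaml
# Nesting described in this report as wrong (assumed reconstruction,
# kept commented out so the fragment stays valid YAML):
#
# secondaryNetwork:
#   deploy: true
#   nvIpam:          # nested here, NVIDIA-IPAM is not deployed
#     deploy: true

# Layout used in the values above: nvIpam is a sibling of
# secondaryNetwork at the top level of values.yaml.
secondaryNetwork:
  deploy: true
  multus:
    deploy: true
  cniPlugins:
    deploy: true
  ipamPlugin:
    deploy: false    # the other IPAM plugin stays off in this scenario

nvIpam:
  deploy: true
```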

Logs:

Environment:

rollandf commented 3 months ago

Thanks for the report. Upstream doc is fixed: https://github.com/Mellanox/network-operator-docs/pull/32

I will make sure the fix is also applied to the released versions.

rollandf commented 3 months ago

Doc updated.