NetApp / trident

Storage orchestrator for containers
Apache License 2.0

trident-csi 23.4.0 will not start up on OpenShift 4.10 FIPS-enabled cluster #851

Closed shankarpentyala07 closed 3 weeks ago

shankarpentyala07 commented 1 year ago

We see trident-csi 23.4.0 failing to start, with this error message in the log:

remote error: tls: protocol version not supported

This issue is not seen in Trident 22.1.1.

trident-controller-856b7f5cdb-2fcfh   6/6     Running            0                 7h7m
trident-node-linux-2h46k              1/2     CrashLoopBackOff   169 (3m31s ago)   7h57m
trident-node-linux-2xglg              1/2     CrashLoopBackOff   165 (75s ago)     7h57m
trident-node-linux-2xk2r              1/2     CrashLoopBackOff   169 (36s ago)     7h57m
trident-node-linux-6kbp2              1/2     CrashLoopBackOff   165 (95s ago)     7h57m
trident-node-linux-8ph9x              1/2     CrashLoopBackOff   152 (2m46s ago)   7h11m
trident-node-linux-b9sz5              1/2     CrashLoopBackOff   165 (55s ago)     7h57m
trident-node-linux-dpchq              1/2     CrashLoopBackOff   154 (15s ago)     7h11m
trident-node-linux-jzvf5              1/2     CrashLoopBackOff   152 (2m35s ago)   7h11m
trident-node-linux-l2xbw              1/2     CrashLoopBackOff   152 (4m40s ago)   7h11m
trident-node-linux-p6m6k              1/2     CrashLoopBackOff   155 (54s ago)     7h10m
trident-node-linux-rmnhb              1/2     CrashLoopBackOff   153 (2m2s ago)    7h11m
trident-node-linux-s96xr              1/2     CrashLoopBackOff   169 (3m42s ago)   7h57m
trident-operator-5c4d7b74b-k9lnj      1/1     Running            0                 7h7m

 oc describe pod trident-node-linux-2xglg 
Name:                 trident-node-linux-2xglg
Namespace:            trident
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ip-10-0-137-53.ec2.internal/10.0.137.53
Start Time:           Mon, 07 Aug 2023 16:42:09 -0700
Labels:               app=node.csi.trident.netapp.io
                      controller-revision-hash=84d6798f58
                      pod-template-generation=1
Annotations:          openshift.io/scc: trident-node-linux
Status:               Running
IP:                   10.0.137.53
IPs:
  IP:           10.0.137.53
Controlled By:  DaemonSet/trident-node-linux
Containers:
  trident-main:
    Container ID:  cri-o://ee8b0d94ca91ab6678e1d4a98281ee1e0ee70ce1049682d0bc5bf07d8c8b17dd
    Image:         docker.io/netapp/trident:23.04.0
    Image ID:      docker.io/netapp/trident@sha256:62fcc206475c5b024490cb9ace504ccf29fc3d25e1f3b017518f4daba8a66c3a
    Port:          <none>
    Host Port:     <none>
    Command:
      /trident_orchestrator
    Args:
      --no_persistence
      --k8s_pod
      --rest=false
      --csi_node_name=$(KUBE_NODE_NAME)
      --csi_endpoint=$(CSI_ENDPOINT)
      --csi_role=node
      --log_format=text
      --log_level=info
      --log_workflows=
      --log_layers=
      --disable_audit_log=true
      --http_request_timeout=90s
      --https_rest
      --https_port=17546
      --enable_force_detach=false
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 08 Aug 2023 00:37:45 -0700
      Finished:     Tue, 08 Aug 2023 00:38:09 -0700
    Ready:          False
    Restart Count:  165
    Liveness:       http-get https://:17546/liveness delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:17546/readiness delay=10s timeout=1s period=10s #success=1 #failure=5
    Startup:        http-get https://:17546/liveness delay=0s timeout=1s period=5s #success=1 #failure=5
    Environment:
      KUBE_NODE_NAME:   (v1:spec.nodeName)
      KUBELET_DIR:     /var/lib/kubelet
      CSI_ENDPOINT:    unix://plugin/csi.sock
      PATH:            /netapp:/bin
    Mounts:
      /certs from certs (ro)
      /dev from dev-dir (rw)
      /host from host-dir (rw)
      /plugin from plugin-dir (rw)
      /sys from sys-dir (rw)
      /var/lib/kubelet/plugins from plugins-mount-dir (rw)
      /var/lib/kubelet/pods from pods-mount-dir (rw)
      /var/lib/trident/tracking from trident-tracking-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lvht6 (ro)
  driver-registrar:
    Container ID:  cri-o://3e9b61ed59d0a34817ffd97c0159414e3b0c5fb3f64d9600c9d1cb7cff1e2908
    Image:         registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
    Image ID:      registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=2
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(REGISTRATION_PATH)
    State:          Running
      Started:      Mon, 07 Aug 2023 16:42:14 -0700
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:            /plugin/csi.sock
      REGISTRATION_PATH:  /var/lib/kubelet/plugins/csi.trident.netapp.io/csi.sock
      KUBE_NODE_NAME:      (v1:spec.nodeName)
    Mounts:
      /plugin from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lvht6 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.trident.netapp.io/
    HostPathType:  DirectoryOrCreate
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  plugins-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  DirectoryOrCreate
  dev-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  Directory
  sys-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
  host-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  trident-tracking-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/trident/tracking
    HostPathType:  DirectoryOrCreate
  certs:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          trident-csi
    SecretOptionalName:  <nil>
    SecretName:          trident-encryption-keys
    SecretOptionalName:  <nil>
  kube-api-access-lvht6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoExecute op=Exists
                             :NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                       From     Message
  ----     ------     ----                      ----     -------
  Warning  Unhealthy  37m (x768 over 7h57m)     kubelet  Startup probe failed: Get "https://10.0.137.53:17546/liveness": remote error: tls: protocol version not supported
  Warning  BackOff    7m50s (x1963 over 7h55m)  kubelet  Back-off restarting failed container
  Normal   Pulled     2m39s (x164 over 7h57m)   kubelet  Container image "docker.io/netapp/trident:23.04.0" already present on machine

oc get cm cluster-config-v1 -n kube-system -o json | jq -r '.data' | grep -i "fips"
  "install-config": "apiVersion: v1\nbaseDomain: cpdonawsonline.com\ncompute:\n- architecture: amd64\n  hyperthreading: Enabled\n  name: worker\n  platform:\n    aws:\n      rootVolume:\n        iops: 2000\n        size: 300\n        type: io1\n      type: m5.4xlarge\n      zones:\n      - us-east-1a\n  replicas: 3\ncontrolPlane:\n  architecture: amd64\n  hyperthreading: Enabled\n  name: master\n  platform:\n    aws:\n      rootVolume:\n        iops: 4000\n        size: 300\n        type: io1\n      type: m5.2xlarge\n      zones:\n      - us-east-1a\n  replicas: 3\nfips: true\nmetadata:\n  creationTimestamp: null\n  name: fsxlm\nnetworking:\n  clusterNetwork:\n  - cidr: 10.128.0.0/14\n    hostPrefix: 23\n  machineNetwork:\n  - cidr: 10.0.0.0/16\n  networkType: OpenShiftSDN\n  serviceNetwork:\n  - 172.30.0.0/16\nplatform:\n  aws:\n    region: us-east-1\n    subnets:\n    - subnet-0c50a404003991937\n    - subnet-09e88f9755d452849\npublish: External\npullSecret: \"\"\nsshKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDKnHgMqTSbs1QgT1mxv93NJ+tyG4E1EKSRIc/CjtOVZSc1W77PVmraOKtuaP+5QTm3ICjny1912kUid1Gcx59v+tHo640ivmhgOb7h3QzCVdpbln+qm6wHirJFzcvaLBlkO+hA0gfyhmX26/GN5zlZTV1OyWyajyxOR+gMq/J9G2NvDL0UvDO2e3FpD109LZQJUITgsfVhu8wlSiI8HXzd0q2eZeICE5N9dN784TMRpt8YFbtYOjwXdZ9HdPH9uCp238mgqkVVoEkZDOXyIGamu9LuB/Vg3qUS5GxErrTxbFWHkHmcEwmbCOV3Txdi6V4hP3Bz+NZnmPTJUm6YPWznYQPzpMbBF3KhOWtsCyps5glM3Dvt7/NC8z/B+kG4gDa+cjyChkBZROtNLW4RbKNGRTKyyGWxpYrbPxCRtG3SyLHpAsdv8IhkWGUEVixIvQ+w2uijo4iOUCuC1UxvsdYmmbyIaBJw/OWEsBnBlYB6tM4nEvv8pJnbFUQ38do9Iu0=\n  shankarpentyala@Shankars-MacBook-Pro.local\n"

 oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.64   True        False         12h     Cluster version is 4.10.64

Environment

Trident version: [23.4.0]
Trident installation flags used: [default]

helm install trident-driver netapp-trident/trident-operator \
            --version "23.4.0" \
            --create-namespace \
            --namespace "trident"

OpenShift v4.10 FIPS enabled

rsjonte commented 6 months ago

This issue should be closed. The root cause is on the Red Hat OpenShift side: the FIPS implementation before OCP 4.14 did not support TLS 1.3 correctly.

The kubelet in OCP < 4.14 only supported TLS 1.2, while Trident 22.04 and later require TLS 1.3.

clintonk commented 3 weeks ago

Closing as suggested.