ceph / ceph-csi

CSI driver for Ceph

Failed to mount PV due to missing namespace in SecretRef #4809

Closed: xcompass closed this issue 1 month ago

xcompass commented 1 month ago

A PV created by the in-tree plugin and migrated to CSI is missing the namespace field in its SecretRef. This causes the kubelet to fail to mount the volume with the following error message:

Events:
  Type     Reason       Age                  From     Message
  ----     ------       ----                 ----     -------
  Warning  FailedMount  7m4s (x271 over 9h)  kubelet  MountVolume.MountDevice failed for volume "pvc-8a5e6943-b576-11e8-98c1-005056012121" : fetching NodeStageSecretRef /ceph-secret-user failed: kubernetes.io/csi: failed to find the secret ceph-secret-user in the namespace  with error: an empty namespace may not be set when a resource name is provided
  Warning  FailedMount  61s (x239 over 9h)   kubelet  Unable to attach or mount volumes: unmounted volumes=[srv], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition

Note the additional space in the error message after "failed to find the secret ceph-secret-user in the namespace": the namespace itself is empty.
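
One way to spot affected PVs up front (a sketch, assuming the broken PVs still carry the in-tree rbd source under .spec.rbd; PVs of other types will also print an empty second column, so read the output accordingly) is to list each PV name next to its SecretRef namespace:

kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.rbd.secretRef.namespace}{"\n"}{end}'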

Here is the PV:

Name:            pvc-8a5e6943-b576-11e8-98c1-005056012121
Labels:          <none>
Annotations:     kubernetes.io/createdby: rbd-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/migrated-to: rbd.csi.ceph.com
                 pv.kubernetes.io/provisioned-by: kubernetes.io/rbd
Finalizers:      [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer]
StorageClass:    fast
Status:          Bound
Claim:           default/ltbot-stg-ltbot
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        2Gi
Node Affinity:   <none>
Message:
Source:
    Type:          RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:  [10.93.1.100:6789]
    RBDImage:      kubernetes-dynamic-pvc-8bd348e3-b576-11e8-8379-00505601176e
    FSType:
    RBDPool:       rbd
    RadosUser:     kube
    Keyring:       /etc/ceph/keyring
    SecretRef:     &SecretReference{Name:ceph-secret-user,Namespace:,}
    ReadOnly:      false
Events:            <none>

Environment details

Steps to reproduce

  1. Create a PVC with the in-tree rbd plugin (kubernetes.io/rbd) and create a pod that mounts the PVC.
  2. Install Ceph-CSI-RBD and migrate the PV to CSI with the CSIMigrationRBD feature gate (the gate flags are sketched after this list).
  3. Disable the in-tree rbd plugin with the InTreePluginRBDUnregister feature gate.
  4. Delete the pod.
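
For reference, a rough sketch of the feature-gate flags behind steps 2 and 3 (my summary, not part of the original report): CSIMigrationRBD and InTreePluginRBDUnregister are the upstream Kubernetes gate names, and they typically need to match on kube-controller-manager and every kubelet.

# step 2: route in-tree rbd volumes through the CSI driver
--feature-gates=CSIMigrationRBD=true
# step 3: additionally unregister the in-tree rbd plugin
--feature-gates=CSIMigrationRBD=true,InTreePluginRBDUnregister=true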

Actual results

The pod is unable to mount the PV, failing with the above error.

Expected behavior

The pod is able to mount the PV.

Logs

Here are the logs for csi-rbdplugin and driver-registrar. No additional log entries were printed when the error occurred.

k logs -n ceph-csi-rbd -f ceph-csi-rbd-nodeplugin-ghm2z -c csi-rbdplugin

I0827 09:03:29.796275 2525636 cephcsi.go:196] Driver version: v3.12.1 and Git version: 02a79943cb492130d3637673a5acc3625f6019e5
I0827 09:03:29.796734 2525636 cephcsi.go:274] Initial PID limit is set to 256123
I0827 09:03:29.796897 2525636 cephcsi.go:280] Reconfigured PID limit to -1 (max)
I0827 09:03:29.797003 2525636 cephcsi.go:228] Starting driver type: rbd with name: rbd.csi.ceph.com
I0827 09:03:29.818462 2525636 mount_linux.go:282] Detected umount with safe 'not mounted' behavior
I0827 09:03:29.818775 2525636 rbd_attach.go:243] nbd module loaded
I0827 09:03:29.818878 2525636 rbd_attach.go:257] kernel version "6.6.43-flatcar" supports cookie feature
I0827 09:03:29.856378 2525636 rbd_attach.go:273] rbd-nbd tool supports cookie feature
I0827 09:03:29.857123 2525636 server.go:114] listening for CSI-Addons requests on address: &net.UnixAddr{Name:"/csi/csi-addons.sock", Net:"unix"}
I0827 09:03:29.857273 2525636 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0827 09:03:29.941512 2525636 utils.go:240] ID: 1 GRPC call: /csi.v1.Identity/GetPluginInfo
I0827 09:03:29.943235 2525636 utils.go:241] ID: 1 GRPC request: {}
I0827 09:03:29.943269 2525636 identityserver-default.go:40] ID: 1 Using default GetPluginInfo
I0827 09:03:29.943338 2525636 utils.go:247] ID: 1 GRPC response: {"name":"rbd.csi.ceph.com","vendor_version":"v3.12.1"}
I0827 09:03:30.472752 2525636 utils.go:240] ID: 2 GRPC call: /csi.v1.Node/NodeGetInfo
I0827 09:03:30.472832 2525636 utils.go:241] ID: 2 GRPC request: {}
I0827 09:03:30.472847 2525636 nodeserver-default.go:45] ID: 2 Using default NodeGetInfo
I0827 09:03:30.472989 2525636 utils.go:247] ID: 2 GRPC response: {"accessible_topology":{},"node_id":"f6.workers.ctlt.ubc.ca"}

k logs -n ceph-csi-rbd -f ceph-csi-rbd-nodeplugin-ghm2z -c driver-registrar

I0827 09:03:29.935621 2525682 main.go:150] "Version" version="v2.11.1"
I0827 09:03:29.935709 2525682 main.go:151] "Running node-driver-registrar" mode=""
I0827 09:03:29.935717 2525682 main.go:172] "Attempting to open a gRPC connection" csiAddress="/csi/csi.sock"
I0827 09:03:29.935735 2525682 connection.go:234] "Connecting" address="unix:///csi/csi.sock"
I0827 09:03:29.936619 2525682 main.go:180] "Calling CSI driver to discover driver name"
I0827 09:03:29.936653 2525682 connection.go:264] "GRPC call" method="/csi.v1.Identity/GetPluginInfo" request="{}"
I0827 09:03:29.943779 2525682 connection.go:270] "GRPC response" response="{\"name\":\"rbd.csi.ceph.com\",\"vendor_version\":\"v3.12.1\"}" err=null
I0827 09:03:29.943813 2525682 main.go:189] "CSI driver name" csiDriverName="rbd.csi.ceph.com"
I0827 09:03:29.943877 2525682 node_register.go:56] "Starting Registration Server" socketPath="/registration/rbd.csi.ceph.com-reg.sock"
I0827 09:03:29.944228 2525682 node_register.go:66] "Registration Server started" socketPath="/registration/rbd.csi.ceph.com-reg.sock"
I0827 09:03:29.944326 2525682 node_register.go:96] "Skipping HTTP server"
I0827 09:03:30.471583 2525682 main.go:96] "Received GetInfo call" request="&InfoRequest{}"
I0827 09:03:30.503140 2525682 main.go:108] "Received NotifyRegistrationStatus call" status="&RegistrationStatus{PluginRegistered:true,Error:,}"
Madhu-1 commented 1 month ago

@xcompass I don't think this is an issue with Ceph-CSI. As you mentioned, the error you are getting is from the kubelet.
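
A quick way to confirm that (my addition, assuming a systemd-managed kubelet on the affected node) is to search the kubelet journal for the message; it shows up there and not in the plugin logs:

journalctl -u kubelet | grep 'empty namespace may not be set'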

xcompass commented 1 month ago

Thanks @Madhu-1. I understand the error was thrown by the kubelet from this line. But isn't the SecretRef field extracted from the PV by Ceph-CSI, which then calls the above function to retrieve the secret? I wonder if the namespace should be set to default when it is missing.

Madhu-1 commented 1 month ago

Thanks @Madhu-1. I understand the error was thrown by the kubelet from this line. But isn't the SecretRef field extracted from the PV by Ceph-CSI, which then calls the above function to retrieve the secret? I wonder if the namespace should be set to default when it is missing.

@xcompass cephcsi doesn't read the secret; it's read by kubernetes/kubelet, and the content of that secret is passed to cephcsi.
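
An illustrative check of this (my addition, not from the thread): since the kubelet resolves the SecretRef itself and only passes the secret contents in the CSI NodeStageVolume request, the secret's name should never appear in the nodeplugin logs.

# expected to print 0: the secret is never fetched by name inside the plugin
kubectl logs -n ceph-csi-rbd ceph-csi-rbd-nodeplugin-ghm2z -c csi-rbdplugin | grep -c ceph-secret-user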

xcompass commented 1 month ago

Thanks. I'll close this issue. I found a workaround: patch the PVs to add the missing namespace field.

For anyone running into this issue, here is my script to patch the PVs:

for PV in $(cat pv.list); do
  # decode the PV object stored in etcd into editable YAML
  etcdctl get /registry/persistentvolumes/${PV} | ./auger decode > pv-${PV}.yaml
  # insert the missing namespace right after the secret name (sed keeps a .backup copy)
  sed -i.backup 's/name: ceph-secret-user/name: ceph-secret-user\n      namespace: default/' pv-${PV}.yaml
  # re-encode the patched YAML and write it back to etcd (etcdctl put reads the value from stdin)
  cat pv-${PV}.yaml | ./auger encode | etcdctl put /registry/persistentvolumes/${PV}
done

Adjust the sed search pattern to match your PVs. auger can be downloaded from https://github.com/jpbetz/auger and compiled from source.
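
An optional sanity check after running the script (my addition, reusing the same pv.list and tools): decode each key again and confirm the namespace line landed right under the secret name.

for PV in $(cat pv.list); do
  etcdctl get /registry/persistentvolumes/${PV} | ./auger decode | grep -A1 'name: ceph-secret-user'
done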