ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Unable to create CephFS subvolume dynamically (`no available topology found`) #4680

Closed henyxia closed 3 months ago

henyxia commented 3 months ago

Describe the bug

After setting up CephFS through the ceph-csi-cephfs Helm chart, I'm unable to create a CephFS volume dynamically. Static allocation works perfectly, but dynamic provisioning fails with the following error: `no available topology found`.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy ceph-csi-cephfs helm chart using the values provided in additional context
  2. Deploy a StatefulSet using the volumeClaimTemplate provided in additional context
  3. See error

Actual results

The PersistentVolumeClaim is stuck in Pending. The provisioner reports a topology error even though domainlabels and enable-read-affinity are not set.

Expected behavior

It is expected to have a PersistentVolumeClaim in Bound status.

Logs

The issue is about PVC creation. The relevant csi-provisioner log is below; the provisioner keeps reporting the following errors.

```
I0616 07:39:33.698890       1 event.go:389] "Event occurred" object="syncthing/storage-syncthing-0" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"syncthing/storage-syncthing-0\""
E0616 07:39:33.698959       1 controller.go:974] error syncing claim "b4ddac14-2411-4665-83b4-d93c182e80aa": failed to provision volume with StorageClass "cephfs": error generating accessibility requirements: no available topology found
I0616 07:39:33.699039       1 event.go:389] "Event occurred" object="syncthing/storage-syncthing-0" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"cephfs\": error generating accessibility requirements: no available topology found"
W0616 07:44:33.699578       1 topology.go:319] No topology keys found on any node
```

Additional context

Helm chart value file

```
---
rbac:
  create: true
serviceAccounts:
  nodeplugin:
    create: true
    name:
  provisioner:
    create: true
    name:
csiConfig:
  - clusterID: "XXX"
    monitors:
      - "XXX"
commonLabels: {}
logLevel: 5
sidecarLogLevel: 1
CSIDriver:
  fsGroupPolicy: "File"
  seLinuxMount: false
nodeplugin:
  name: nodeplugin
  updateStrategy: RollingUpdate
  priorityClassName: system-node-critical
  httpMetrics:
    enabled: true
    containerPort: 8081
    service:
      enabled: true
      servicePort: 8080
      type: ClusterIP
      annotations: {}
      clusterIP: ""
      externalIPs: []
      loadBalancerIP: ""
      loadBalancerSourceRanges: []
  imagePullSecrets: []
  profiling:
    enabled: false
  registrar:
    image:
      repository: registry.k8s.io/sig-storage/csi-node-driver-registrar
      tag: v2.10.1
      pullPolicy: IfNotPresent
    resources: {}
  plugin:
    image:
      repository: quay.io/cephcsi/cephcsi
      tag: v3.11-canary
      pullPolicy: IfNotPresent
    resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  kernelmountoptions: ""
  fusemountoptions: ""
provisioner:
  name: provisioner
  replicaCount: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  timeout: 60s
  priorityClassName: system-cluster-critical
  enableHostNetwork: false
  httpMetrics:
    enabled: true
    containerPort: 8081
    service:
      enabled: true
      servicePort: 8080
      type: ClusterIP
      annotations: {}
      clusterIP: ""
      externalIPs: []
      loadBalancerIP: ""
      loadBalancerSourceRanges: []
  imagePullSecrets: []
  profiling:
    enabled: false
  provisioner:
    image:
      repository: registry.k8s.io/sig-storage/csi-provisioner
      tag: v5.0.1
      pullPolicy: IfNotPresent
    resources: {}
    extraArgs: []
  setmetadata: true
  resizer:
    name: resizer
    enabled: true
    image:
      repository: registry.k8s.io/sig-storage/csi-resizer
      tag: v1.11.1
      pullPolicy: IfNotPresent
    resources: {}
    extraArgs: []
  snapshotter:
    image:
      repository: registry.k8s.io/sig-storage/csi-snapshotter
      tag: v8.0.1
      pullPolicy: IfNotPresent
    resources: {}
    extraArgs: []
    args:
      enableVolumeGroupSnapshots: false
  nodeSelector: {}
  tolerations: []
  affinity: {}
selinuxMount: false
storageClass:
  create: true
  name: cephfs
  annotations: {}
  clusterID: "XXX"
  fsName: cephfs
  pool: ""
  fuseMountOptions: ""
  kernelMountOptions: ""
  mounter: ""
  volumeNamePrefix: ""
  provisionerSecret: csi-cephfs-secret
  provisionerSecretNamespace: ""
  controllerExpandSecret: csi-cephfs-secret
  controllerExpandSecretNamespace: ""
  nodeStageSecret: csi-cephfs-secret
  nodeStageSecretNamespace: ""
  reclaimPolicy: Delete
  allowVolumeExpansion: true
  mountOptions: []
secret:
  create: false
  name: csi-cephfs-secret
  annotations: {}
cephconf: |
  [global]
  auth_cluster_required = cephx
  auth_service_required = cephx
  auth_client_required = cephx
  fuse_big_writes = true
extraDeploy: []
provisionerSocketFile: csi-provisioner.sock
pluginSocketFile: csi.sock
kubeletDir: /var/lib/kubelet
driverName: cephfs.csi.ceph.com
configMapName: ceph-csi-config
externallyManagedConfigmap: false
cephConfConfigMapName: ceph-config
```

volumeClaimTemplate

```
- metadata:
    name: storage
  spec:
    accessModes:
      - ReadWriteOnce
    volumeMode: Filesystem
    resources:
      requests:
        storage: 500Gi
    storageClassName: cephfs
```

iPraveenParihar commented 3 months ago

@henyxia

If by "create CephFS subvolume dynamically" you mean topology-based provisioning of volumes, please note that Ceph-CSI currently does not support topology-based provisioning for the CephFS driver.

There have been recent changes in the csi-provisioner related to the Topology feature, as highlighted in the changelog for v5.0.1. If you're using ceph-csi:v3.11, it is compatible with csi-provisioner v4.
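
One way to act on the compatibility note above is to pin the sidecar back to a v4 image in the Helm values. This is a minimal sketch, assuming the `provisioner.provisioner.image` keys from the values file posted earlier in this issue. The tag `v4.0.1` is also an assumption, since the thread only says "v4" and does not name a specific release.

```yaml
# Values override (sketch): run csi-provisioner v4 alongside ceph-csi v3.11.
provisioner:
  provisioner:
    image:
      repository: registry.k8s.io/sig-storage/csi-provisioner
      tag: v4.0.1   # assumed tag; the thread does not name a specific v4 release
      pullPolicy: IfNotPresent
```

All other values would stay as posted; only the sidecar image tag changes.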

We have updated csi-provisioner to v5 in https://github.com/ceph/ceph-csi/pull/4660, and it will be part of the next release. If you still need to use csi-provisioner v5, you will need to add `--feature-gates=Topology=false` to the csi-provisioner configuration here: https://github.com/ceph/ceph-csi/blob/202f43c82d63d37bd765dea7ecf8de4037eca7b6/charts/ceph-csi-cephfs/templates/provisioner-deployment.yaml#L123-L131
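
For readers applying this workaround, the change amounts to one extra argument on the csi-provisioner sidecar container. The fragment below is a sketch of the rendered container spec, not a copy of the chart template: the container name `csi-provisioner` is an assumption, and the chart's existing arguments are left out rather than guessed.

```yaml
# Fragment of the provisioner Deployment pod spec (sketch).
containers:
  - name: csi-provisioner   # assumed container name
    args:
      # ...the arguments already present in the chart template stay unchanged...
      - "--feature-gates=Topology=false"   # disable topology handling in csi-provisioner v5
```

The values file above also exposes `provisioner.provisioner.extraArgs`. Whether that list is rendered into this same argument list, and in what form, should be checked against the template before relying on it.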

henyxia commented 3 months ago

Hi @iPraveenParihar !

Indeed, I completely missed this breaking change in the changelog. Adding the feature gate flag to disable topology resolved the issue.

Thank you