c0c0n3 / teadal.proto

Messing around with cloud infra for https://www.teadal.eu.

DirectPV can't bind volumes #4

Open c0c0n3 opened 1 year ago

c0c0n3 commented 1 year ago

Can't get DirectPV 3.2.2 to work with K8s 1.25.4 or 1.26.0. (DirectPV doesn't officially support K8s 1.26.0 yet, but for the record I hit the same problem there.) It looks like either DirectPV can't bind volumes, or pods need to specify some kind of node affinity to help the K8s scheduler along.

Steps to reproduce

  1. Do a fresh install of our QEMU dev VM and start the VM.
  2. Install DirectPV: kubectl directpv install --image 'directpv:v3.2.2'
  3. Format the two empty volumes: kubectl directpv drives format --drives /dev/sda2,/dev/sda3 --nodes devm (a quick sanity check for this step is sketched right after this list)
  4. kubectl apply -f the MinIO example DirectPV provides: https://github.com/minio/directpv/blob/master/minio.yaml
  5. K8s can't schedule the minio-0 pod because it can't find any available persistent volumes to bind.
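
As a sanity check between steps 3 and 4, it should be possible to confirm DirectPV actually owns the formatted drives by listing them (assuming the v3.x plugin's ls subcommands; I didn't capture the output):

$ kubectl directpv drives ls --nodes devm
$ kubectl directpv volumes ls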
c0c0n3 commented 1 year ago

Here's the K8s output to help with debugging.

Pod stuck in Pending state

$ kubectl get pod
NAME      READY   STATUS    RESTARTS   AGE
minio-0   0/1     Pending   0          23s
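
The four PVCs are presumably stuck in Pending too; that, and whether any PVs exist at all, is easy to confirm (not captured at the time):

$ kubectl get pvc
$ kubectl get pv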

Pod description

$ kubectl describe pod minio-0
Name:             minio-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=minio
                  controller-revision-hash=minio-bc445784d
                  direct.csi.min.io/app=minio-example
                  direct.csi.min.io/organization=minio
                  direct.csi.min.io/tenant=tenant-1
                  statefulset.kubernetes.io/pod-name=minio-0
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/minio
Containers:
  minio:
    Image:      minio/minio
    Port:       <none>
    Host Port:  <none>
    Args:
      server
      http://minio-{0...3}.minio.default.svc.cluster.local:9000/data{1...4}
    Environment:
      MINIO_ACCESS_KEY:  minio
      MINIO_SECRET_KEY:  minio123
    Mounts:
      /data1 from minio-data-1 (rw)
      /data2 from minio-data-2 (rw)
      /data3 from minio-data-3 (rw)
      /data4 from minio-data-4 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xk5b6 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  minio-data-2:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  minio-data-2-minio-0
    ReadOnly:   false
  minio-data-3:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  minio-data-3-minio-0
    ReadOnly:   false
  minio-data-4:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  minio-data-4-minio-0
    ReadOnly:   false
  minio-data-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  minio-data-1-minio-0
    ReadOnly:   false
  kube-api-access-xk5b6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m52s  default-scheduler  0/1 nodes are available: 1 node(s) didn't find available persistent volumes to bind. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
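
Since the scheduler complains about finding persistent volumes to bind, the storage class the PVCs reference is worth a look too. Something along these lines should show which class the example manifest uses and how it's configured (PVC name taken from the describe output above):

$ kubectl get storageclass
$ kubectl get pvc minio-data-1-minio-0 -o jsonpath='{.spec.storageClassName}'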

DirectPV logs

$ kubectl -n direct-csi-min-io get pod
NAME                                 READY   STATUS    RESTARTS   AGE
direct-csi-min-io-7f7c58fc49-k7rc5   2/2     Running   0          16m
direct-csi-min-io-7f7c58fc49-plml9   2/2     Running   0          16m
direct-csi-min-io-7f7c58fc49-stnzs   2/2     Running   0          16m
direct-csi-min-io-srcp7              4/4     Running   0          16m
$ kubectl -n direct-csi-min-io logs direct-csi-min-io-7f7c58fc49-k7rc5 -c csi-provisioner
W0205 09:08:59.652719       1 feature_gate.go:235] Setting GA feature gate Topology=true. It will be removed in a future release.
I0205 09:08:59.655208       1 feature_gate.go:243] feature gates: &{map[Topology:true]}
I0205 09:08:59.655437       1 csi-provisioner.go:138] Version: v2.2.0-go1.18
I0205 09:08:59.655497       1 csi-provisioner.go:161] Building kube configs for running in cluster...
W0205 09:09:09.666305       1 connection.go:172] Still connecting to unix:///csi/csi.sock
I0205 09:09:10.489029       1 common.go:111] Probing CSI driver for readiness
I0205 09:09:10.497342       1 csi-provisioner.go:212] Detected CSI driver direct-csi-min-io
I0205 09:09:10.500965       1 csi-provisioner.go:281] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0205 09:09:10.511856       1 controller.go:756] Using saving PVs to API server in background
I0205 09:09:10.513917       1 leaderelection.go:243] attempting to acquire leader lease direct-csi-min-io/direct-csi-min-io...
I0205 09:09:10.527376       1 leaderelection.go:253] successfully acquired lease direct-csi-min-io/direct-csi-min-io
I0205 09:09:10.527708       1 leader_election.go:205] became leader, starting
I0205 09:09:10.527888       1 leader_election.go:212] new leader detected, current leader: direct-csi-min-io-7f7c58fc49-k7rc5
I0205 09:09:10.528495       1 reflector.go:219] Starting reflector *v1.CSINode (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.529994       1 reflector.go:255] Listing and watching *v1.CSINode from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.530495       1 reflector.go:219] Starting reflector *v1.StorageClass (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.537730       1 reflector.go:255] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.536963       1 reflector.go:219] Starting reflector *v1.Node (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.537913       1 reflector.go:255] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.537270       1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (15m0s) from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.538619       1 reflector.go:255] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:134
I0205 09:09:10.628641       1 controller.go:835] Starting provisioner controller direct-csi-min-io_direct-csi-min-io-7f7c58fc49-k7rc5_7fc029b2-16de-4e50-83a0-4623a6fa820b!
I0205 09:09:10.628729       1 volume_store.go:97] Starting save volume queue
I0205 09:09:10.628878       1 reflector.go:219] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:869
I0205 09:09:10.628886       1 reflector.go:255] Listing and watching *v1.PersistentVolume from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:869
I0205 09:09:10.629161       1 reflector.go:219] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:872
I0205 09:09:10.629168       1 reflector.go:255] Listing and watching *v1.StorageClass from sigs.k8s.io/sig-storage-lib-external-provisioner/v6/controller/controller.go:872
I0205 09:09:10.729318       1 controller.go:884] Started provisioner controller direct-csi-min-io_direct-csi-min-io-7f7c58fc49-k7rc5_7fc029b2-16de-4e50-83a0-4623a6fa820b!
$ kubectl -n direct-csi-min-io logs direct-csi-min-io-7f7c58fc49-k7rc5 -c direct-csi
I0205 09:09:11.109852       1 init.go:112] obtained client config successfully
I0205 09:09:11.198508       1 run.go:105] identity server started
I0205 09:09:11.198673       1 run.go:162] controller manager started
I0205 09:09:11.198901       1 ready.go:46] Starting to serve readiness endpoint in port: 30443
E0205 09:09:11.198723       1 admission_controller.go:40] Filed to load key pair: open /etc/admission/certs/cert.pem: no such file or directory
I0205 09:09:11.199225       1 admission_controller.go:64] Starting admission webhook server in port: 20443
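
The node-side DaemonSet pod (direct-csi-min-io-srcp7, the 4/4 one above) might say more; dumping all its containers avoids guessing container names:

$ kubectl -n direct-csi-min-io logs direct-csi-min-io-srcp7 --all-containers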
c0c0n3 commented 1 year ago

So it could be a certs issue that stops DirectPV's admission controller from starting:

E0205 09:09:11.198723       1 admission_controller.go:40] Filed to load key pair: open /etc/admission/certs/cert.pem: no such file or directory
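
If so, the key pair should come from whatever is supposed to be mounted at /etc/admission/certs. A quick way to check whether the secret exists and what the pod actually mounts (exact secret name unknown, hence listing them all):

$ kubectl -n direct-csi-min-io get secrets
$ kubectl -n direct-csi-min-io get pod direct-csi-min-io-7f7c58fc49-k7rc5 -o jsonpath='{.spec.volumes}'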
c0c0n3 commented 1 year ago

PR #6 upgraded DirectPV to version 4.0.6, but the problem is still there, though the cause seems to be totally different. I opened an issue about it.

That's for the x86/Ubuntu/MicroK8s setup. With the QEMU aarch64 NixOS VM I didn't even get that far; see #8 about that. I haven't tested what happens on the QEMU x86-64 NixOS VM yet.