galaxyproject / galaxy-helm

Minimal setup required to run Galaxy under Kubernetes
MIT License

INSTALLATION FAILED on debian11 due to unbound immediate PersistentVolumeClaims #433

Open Milokita opened 1 year ago

Milokita commented 1 year ago

When attempting a clean install of Galaxy using the chart from either the GitHub repo or the packaged chart repo, helm reports:

Error: INSTALLATION FAILED: failed post-install: 1 error occurred:
        * timed out waiting for the condition

Some pods have started, but the majority are pending:

NAME                                                              READY   STATUS    RESTARTS      AGE
galaxy-my-galaxy-release-postgres-0                               0/1     Pending   0             25m
my-galaxy-release-celery-5d4fb95f6f-ns6sq                         0/1     Pending   0             10m
my-galaxy-release-celery-beat-6447c87694-h5f85                    0/1     Pending   0             10m
my-galaxy-release-cvmfscsi-controllerplugin-5b9489b479-jbjwm      2/2     Running   0             10m
my-galaxy-release-cvmfscsi-nodeplugin-6l7xt                       0/2     Pending   0             10m
my-galaxy-release-cvmfscsi-nodeplugin-7j7mv                       0/2     Pending   0             10m
my-galaxy-release-cvmfscsi-nodeplugin-ffj8x                       0/2     Pending   0             10m
my-galaxy-release-init-db-sfm09-d862g                             0/1     Pending   0             10m
my-galaxy-release-init-mounts-akdq2-xtxzx                         0/4     Pending   0             10m
my-galaxy-release-job-0-6664c5c674-zfdkl                          0/1     Pending   0             10m
my-galaxy-release-nginx-787bbc9755-46kd2                          0/1     Pending   0             10m
my-galaxy-release-post-install-cvmfs-fix-job-64vc8                1/1     Running   0             10m
my-galaxy-release-postgres-85b8bb574b-lv99b                       1/1     Running   0             10m
my-galaxy-release-rabbitmq-7c7b66f498-mxvwt                       0/1     Running   4 (73s ago)   10m
my-galaxy-release-rabbitmq-messaging-topology-operator-79ctp5sd   1/1     Running   0             10m
my-galaxy-release-rabbitmq-server-server-0                        0/1     Pending   0             10m
my-galaxy-release-tusd-7946c97b9f-vz9z9                           0/1     Pending   0             10m
my-galaxy-release-web-5cd46c649d-rkv7d                            0/1     Pending   0             10m
my-galaxy-release-workflow-5c789ddbd8-8jffq                       0/1     Pending   0             10m

Upon inspecting a pending pod such as my-galaxy-release-celery-5d4fb95f6f-ns6sq, it reports:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m34s (x6 over 13m)  default-scheduler  0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod..
nuwang commented 1 year ago

Please query the namespace to check which persistent claims are not bound. Most likely, you will need to configure a storage class that supports ReadWriteMany, such as nfs, if you're in a multi-node cluster. See here: https://github.com/galaxyproject/galaxy-helm/tree/master#data-persistence

Let us know if you need more info.
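For illustration, a ReadWriteMany-capable StorageClass backed by NFS might look like the following sketch. This assumes the csi-driver-nfs provisioner (nfs.csi.k8s.io) is installed in the cluster; the server address and share path are placeholders:

```yaml
# Sketch of an RWX-capable StorageClass using csi-driver-nfs.
# server/share are placeholders -- point them at your NFS export.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.example.com
  share: /nfs
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
```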

Milokita commented 1 year ago

> Please query the namespace to check which persistent claims are not bound. Most likely, you will need to configure a storage class that supports ReadWriteMany, such as nfs, if you're in a multi-node cluster. See here: https://github.com/galaxyproject/galaxy-helm/tree/master#data-persistence
>
> Let us know if you need more info.

Thank you for your reply. I have an NFS server, so I tried to create a PV following this tutorial: https://itnext.io/kubernetes-storage-part-1-nfs-complete-tutorial-75e6ac2a1f77. The created PV is as follows:

Name:            nfs-volume
Labels:          storage.k8s.io/created-by=ssbostan
                 storage.k8s.io/name=nfs
                 storage.k8s.io/part-of=kubernetes-complete-reference
Annotations:     pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:
Status:          Bound
Claim:           default/persistence-my-galaxy-rabbitmq-server-server-0
Reclaim Policy:  Recycle
Access Modes:    RWO,ROX,RWX
VolumeMode:      Filesystem
Capacity:        200Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    192.168.1.40
    Path:      /nfs
    ReadOnly:  false
Events:        <none>

Next, after uninstalling the old Galaxy release with helm, I tried to install Galaxy again using helm install my-galaxy . from the cloned repo.

This time the pod statuses are a little different:

NAME                                                              READY   STATUS                       RESTARTS       AGE
galaxy-my-galaxy-postgres-0                                       0/1     Pending                      0              12m
galaxy-my-galaxy-release-postgres-0                               0/1     Pending                      0              12m
my-galaxy-celery-79c77b8ccb-cvfcr                                 0/1     Init:0/1                     0              6m6s
my-galaxy-celery-beat-df6996755-n7f2x                             0/1     Init:0/1                     0              6m6s
my-galaxy-cvmfscsi-controllerplugin-647ccf5b79-znxst              2/2     Running                      0              6m6s
my-galaxy-cvmfscsi-nodeplugin-6wh46                               0/2     Pending                      0              6m6s
my-galaxy-cvmfscsi-nodeplugin-j5nmp                               0/2     Pending                      0              6m6s
my-galaxy-cvmfscsi-nodeplugin-wchcq                               0/2     Pending                      0              6m6s
my-galaxy-init-db-ri4nb-jb7dx                                     0/1     Init:0/1                     0              6m6s
my-galaxy-init-mounts-ef1xx-mnh5n                                 0/4     ContainerCreating            0              6m6s
my-galaxy-job-0-c9dc6c54c-6bnx2                                   0/1     Init:0/1                     0              6m6s
my-galaxy-nginx-5f6f88ffcc-29rhv                                  1/1     Running                      5 (102s ago)   6m6s
my-galaxy-post-install-cvmfs-fix-job-qgs4n                        1/1     Running                      0              6m6s
my-galaxy-postgres-6cb57c64fd-55hjs                               1/1     Running                      0              6m6s
my-galaxy-rabbitmq-5879ccf4f4-vxhkp                               1/1     Running                      0              6m6s
my-galaxy-rabbitmq-messaging-topology-operator-865fb65bb8-546mp   1/1     Running                      0              6m6s
my-galaxy-tusd-7ccfc4fc6c-r9x9b                                   0/1     CreateContainerConfigError   0              6m6s
my-galaxy-web-787c9bc7d5-89fpn                                    0/1     Init:0/1                     0              6m6s
my-galaxy-workflow-669dc896c7-wdbjb                               0/1     Init:0/1                     0              6m5s

I checked the status of these abnormal pods; here is an example, my-galaxy-celery-79c77b8ccb-cvfcr:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  9m48s (x5 over 10m)  default-scheduler  0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod..
  Normal   Scheduled         9m38s                default-scheduler  Successfully assigned default/my-galaxy-celery-79c77b8ccb-cvfcr to sunlabfs1
  Warning  FailedMount       9m22s                kubelet            MountVolume.MountDevice failed for volume "pvc-93455351-07d3-41be-91fd-0b82dc4866e9" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name cvmfs.csi.cern.ch not found in the list of registered CSI drivers
  Warning  FailedMount       46s (x4 over 7m35s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[refdata-gxy], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
nuwang commented 1 year ago

Aah I see. You may have run into an issue with the cvmfs csi driver. Try restarting the cvmfs-csi daemonset, then restart the pods that use the pvcs.

Also, can you post the results of kubectl get pvc?

Milokita commented 1 year ago

I re-initialized k8s for a fresh start, using the Calico pod network add-on. The head node is Debian 10, the other 3 workers are Debian 11, and all 4 nodes use containerd (containerd://1.6.21) and k8s 1.27.2.

I installed with helm install my-galaxy . --debug from ~/galaxy-helm/galaxy.

The pod statuses:

NAME                                                              READY   STATUS              RESTARTS      AGE
galaxy-my-galaxy-postgres-0                                       0/1     Pending             0             2m36s
my-galaxy-celery-79c77b8ccb-v6ntf                                 0/1     Init:0/1            0             2m43s
my-galaxy-celery-beat-df6996755-s52q8                             0/1     Init:0/1            0             2m43s
my-galaxy-cvmfscsi-controllerplugin-647ccf5b79-snmzf              2/2     Running             0             2m43s
my-galaxy-cvmfscsi-nodeplugin-kfkk4                               0/2     Pending             0             2m43s
my-galaxy-cvmfscsi-nodeplugin-pcvzd                               0/2     Pending             0             2m43s
my-galaxy-cvmfscsi-nodeplugin-t6b6q                               0/2     Pending             0             2m43s
my-galaxy-init-db-qjk9m-7bl2h                                     0/1     Init:0/1            0             2m43s
my-galaxy-init-mounts-kembb-9wdtj                                 0/4     ContainerCreating   0             2m43s
my-galaxy-job-0-c9dc6c54c-t8wjc                                   0/1     Init:0/1            0             2m42s
my-galaxy-nginx-5f6f88ffcc-4kk54                                  0/1     CrashLoopBackOff    3 (43s ago)   2m43s
my-galaxy-post-install-cvmfs-fix-job-pwtcr                        1/1     Running             0             2m43s
my-galaxy-postgres-6cb57c64fd-m5fpp                               1/1     Running             0             2m43s
my-galaxy-rabbitmq-5879ccf4f4-qmw2d                               1/1     Running             0             2m43s
my-galaxy-rabbitmq-messaging-topology-operator-865fb65bb8-mrmrz   1/1     Running             0             2m43s
my-galaxy-rabbitmq-server-server-0                                0/1     Pending             0             2m42s
my-galaxy-tusd-7ccfc4fc6c-gx997                                   0/1     CrashLoopBackOff    4 (62s ago)   2m43s
my-galaxy-web-787c9bc7d5-jb4rf                                    0/1     Init:0/1            0             2m43s
my-galaxy-workflow-669dc896c7-56898                               0/1     Init:0/1            0             2m43s

kubectl get pv:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS      REASON   AGE
nfs-volume                                 200Gi      RWO,ROX,RWX    Recycle          Bound    default/my-galaxy-galaxy-pvc                                   15m
pvc-4d897e29-dd5c-4200-b214-21a2b2cd078f   10Gi       ROX            Delete           Bound    default/my-galaxy-refdata-gxy-pvc   my-galaxy-cvmfs            5m2s

kubectl get pvc:

NAME                                             STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
my-galaxy-cvmfs-alien-cache-pvc                  Pending                                                                        nfs               5m48s
my-galaxy-galaxy-pvc                             Bound     nfs-volume                                 200Gi      RWO,ROX,RWX                      5m48s
my-galaxy-refdata-gxy-pvc                        Bound     pvc-4d897e29-dd5c-4200-b214-21a2b2cd078f   10Gi       ROX            my-galaxy-cvmfs   5m48s
persistence-my-galaxy-rabbitmq-server-server-0   Pending                                                                                          5m47s
pgdata-galaxy-my-galaxy-postgres-0               Pending                                                                                          5m41s

kubectl describe pvc pgdata-galaxy-my-galaxy-postgres-0:

Name:          pgdata-galaxy-my-galaxy-postgres-0
Namespace:     default
StorageClass:
Status:        Pending
Volume:
Labels:        application=spilo
               cluster-name=galaxy-my-galaxy-postgres
               team=galaxy
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       galaxy-my-galaxy-postgres-0
Events:
  Type    Reason         Age                   From                         Message
  ----    ------         ----                  ----                         -------
  Normal  FailedBinding  13s (x25 over 6m13s)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

kubectl describe daemonset my-galaxy-cvmfscsi-nodeplugin

Name:           my-galaxy-cvmfscsi-nodeplugin
Selector:       app=cvmfscsi,component=nodeplugin,release=my-galaxy
Node-Selector:  <none>
Labels:         app=cvmfscsi
                app.kubernetes.io/managed-by=Helm
                chart=cvmfscsi-2.0.0
                component=nodeplugin
                heritage=Helm
                release=my-galaxy
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: my-galaxy
                meta.helm.sh/release-namespace: default
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=cvmfscsi
           chart=cvmfscsi-2.0.0
           component=nodeplugin
           heritage=Helm
           release=my-galaxy
  Containers:
   registrar:
    Image:      registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
    Port:       <none>
    Host Port:  <none>
    Args:
      -v=5
      --csi-address=$(CSI_ADDRESS)
      --kubelet-registration-path=$(KUBELET_CSI_REGISTRATION_PATH)
    Environment:
      CSI_ADDRESS:                    /csi/csi.sock
      KUBELET_CSI_REGISTRATION_PATH:  /var/lib/kubelet/plugins/cvmfs.csi.cern.ch/csi.sock
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
   nodeplugin:
    Image:      registry.cern.ch/magnum/cvmfs-csi:v2.0.0
    Port:       <none>
    Host Port:  <none>
    Args:
      -v=5
      --nodeid=$(NODE_ID)
      --endpoint=$(CSI_ENDPOINT)
      --drivername=$(CSI_DRIVERNAME)
      --start-automount-daemon=true
      --role=identity,node
      --has-alien-cache
    Environment:
      NODE_ID:          (v1:spec.nodeName)
      CSI_ENDPOINT:    unix:///var/lib/kubelet/plugins/cvmfs.csi.cern.ch/csi.sock
      CSI_DRIVERNAME:  cvmfs.csi.cern.ch
    Mounts:
      /cvmfs-aliencache from cvmfs-aliencache (rw)
      /cvmfs-localcache from cvmfs-localcache (rw)
      /dev from host-dev (rw)
      /etc/cvmfs/config.d from cvmfs-config-config-d (rw)
      /etc/cvmfs/default.local from cvmfs-config-default-local (rw,path="default.local")
      /lib/modules from lib-modules (ro)
      /sys from host-sys (rw)
      /var/lib/kubelet/plugins/cvmfs.csi.cern.ch from plugin-dir (rw)
      /var/lib/kubelet/pods from pods-mount-dir (rw)
  Volumes:
   plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/cvmfs.csi.cern.ch
    HostPathType:  DirectoryOrCreate
   registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  DirectoryOrCreate
   pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  Directory
   host-sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
   host-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
   cvmfs-localcache:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   cvmfs-aliencache:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cvmfs-alien-cache
    ReadOnly:   false
   cvmfs-config-default-local:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cvmfs-csi-default-local
    Optional:  false
   cvmfs-config-config-d:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               cvmfs-csi-config-d
    Optional:           false
  Priority Class Name:  system-node-critical
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  SuccessfulCreate  8m4s  daemonset-controller  Created pod: my-galaxy-cvmfscsi-nodeplugin-pcvzd
  Normal  SuccessfulCreate  8m4s  daemonset-controller  Created pod: my-galaxy-cvmfscsi-nodeplugin-kfkk4
  Normal  SuccessfulCreate  8m4s  daemonset-controller  Created pod: my-galaxy-cvmfscsi-nodeplugin-t6b6q
nuwang commented 1 year ago

I think this message highlights the error: "no persistent volumes available for this claim and no storage class is set"

Try setting a default storage class in your cluster:

cloudman@cloudman-cloudlaunchserver-76ff9cd6d9-9f4pl:/app/cloudman$ kubectl get storageclass
NAME              PROVISIONER                                            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ebs (default)     ebs.csi.aws.com                                        Retain          WaitForFirstConsumer   true                   36m
gxy-cvmfs-cvmfs   cvmfs.csi.cern.ch                                      Delete          Immediate              false                  35m
nfs               cluster.local/nfs-provisioner-nfs-server-provisioner   Retain          Immediate              true                   36m
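For reference, the default class is selected via the well-known storageclass.kubernetes.io/is-default-class annotation; a minimal sketch (class name and provisioner here are just examples):

```yaml
# Mark an existing StorageClass as the cluster default
# by setting this annotation to "true".
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
```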

In general, the following must have appropriate storage classes defined. https://github.com/galaxyproject/galaxy-helm/blob/b3dbafcc42da4fcfc34d58d6f8d699d3317efef4/galaxy/values.yaml#L172

https://github.com/galaxyproject/galaxy-helm/blob/b3dbafcc42da4fcfc34d58d6f8d699d3317efef4/galaxy/values.yaml#L296

https://github.com/galaxyproject/galaxy-helm/blob/b3dbafcc42da4fcfc34d58d6f8d699d3317efef4/galaxy/values.yaml#L667

The cvmfs-csi alien cache and galaxy shared persistence must have a ReadWriteMany storageclass, so you may want to explicitly set that to nfs, while rabbitmq and postgres should be set to ReadWriteOnce type storage.
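As a sketch, a values override covering those three settings could look like this. The class names nfs and nfs-rwo are placeholders, and the postgresql/rabbitmq key layout is assumed to match the values.yaml sections linked above:

```yaml
# Illustrative values override: an RWX class for Galaxy's shared
# volume, an RWO class for the databases. Class names are placeholders.
persistence:
  storageClass: nfs          # must support ReadWriteMany
postgresql:
  persistence:
    storageClass: nfs-rwo    # ReadWriteOnce storage
rabbitmq:
  persistence:
    storageClassName: nfs-rwo
```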

Note also that we have not yet tested the chart on k8s 1.27, although it should probably work.

Milokita commented 1 year ago

Thanks again for the support. I created two storage classes that connect to the same NFS server but to different paths:

kubectl get storageclass
NAME                        PROVISIONER                                                       RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-csi (default)           nfs.csi.k8s.io                                                    Delete          Immediate           false                  9m3s
nfs-csi-rwo                 nfs.csi.k8s.io                                                    Delete          Immediate           false                  7m16s

As suggested, I changed the following three sections to:

line 172

persistence:
  enabled: true
  name: galaxy-pvc
  annotations: {}
  storageClass: nfs-csi
  existingClaim: null
  accessMode: ReadWriteMany
  size: 5Gi
  mountPath: /galaxy/server/database

line 296

nameOverride: postgres
  persistence:
    enabled: true
    storageClass: nfs-csi-rwo
    #size:
    #extra:
    #  selector:
    #    matchLabels:
    #      label-key: label-value

line 667

rabbitmq:
  enabled: true
  deploy: true
  existingCluster:
  existingSecret: '{{ include "galaxy-rabbitmq.fullname" . }}-default-user'
  protocol: amqp
  port: 5672
  nameOverride: rabbitmq
  extraSpec: {}
  terminationGracePeriodSeconds: 90
  persistence:
    storageClassName: nfs-csi-rwo
    storage:

Somehow one PVC is not created correctly; it tries to use a nonexistent storage class:

Name:          my-galaxy-cvmfs-alien-cache-pvc
Namespace:     default
StorageClass:  nfs
Status:        Pending
Volume:
Labels:        app.kubernetes.io/instance=my-galaxy
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=cvmfs
               app.kubernetes.io/version=1.0.1
               helm.sh/chart=cvmfs-2.0.0
Annotations:   meta.helm.sh/release-name: my-galaxy
               meta.helm.sh/release-namespace: default
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type     Reason              Age                  From                         Message
  ----     ------              ----                 ----                         -------
  Warning  ProvisioningFailed  4s (x14 over 3m14s)  persistentvolume-controller  storageclass.storage.k8s.io "nfs" not found

while the others are provisioned correctly:

Name:          pgdata-galaxy-my-galaxy-postgres-0
Namespace:     default
StorageClass:  nfs-csi-rwo
Status:        Bound
Volume:        pvc-10e3f151-a759-4d46-bb1e-c52525e8dc84
Labels:        application=spilo
               cluster-name=galaxy-my-galaxy-postgres
               team=galaxy
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       galaxy-my-galaxy-postgres-0
Events:
  Type    Reason                 Age    From                                                           Message
  ----    ------                 ----   ----                                                           -------
  Normal  ExternalProvisioning   3m29s  persistentvolume-controller                                    waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
  Normal  Provisioning           3m29s  nfs.csi.k8s.io_sunlabfs1_3a452554-2f8c-40ea-a886-d172f05f2233  External provisioner is provisioning volume for claim "default/pgdata-galaxy-my-galaxy-postgres-0"
  Normal  ProvisioningSucceeded  3m28s  nfs.csi.k8s.io_sunlabfs1_3a452554-2f8c-40ea-a886-d172f05f2233  Successfully provisioned volume pvc-10e3f151-a759-4d46-bb1e-c52525e8dc84

Name:          pgdata-galaxy-my-galaxy-release-postgres-0
Namespace:     default
StorageClass:  nfs-csi
Status:        Bound
Volume:        pvc-cd853706-3089-4b0a-9867-8ca715a5e7d3
Labels:        application=spilo
               cluster-name=galaxy-my-galaxy-release-postgres
               team=galaxy
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       galaxy-my-galaxy-release-postgres-0
Events:
  Type    Reason                 Age    From                                                           Message
  ----    ------                 ----   ----                                                           -------
  Normal  ExternalProvisioning   3m39s  persistentvolume-controller                                    waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
  Normal  Provisioning           3m39s  nfs.csi.k8s.io_sunlabfs1_3a452554-2f8c-40ea-a886-d172f05f2233  External provisioner is provisioning volume for claim "default/pgdata-galaxy-my-galaxy-release-postgres-0"
  Normal  ProvisioningSucceeded  3m39s  nfs.csi.k8s.io_sunlabfs1_3a452554-2f8c-40ea-a886-d172f05f2233  Successfully provisioned volume pvc-cd853706-3089-4b0a-9867-8ca715a5e7d3

So I am trying to create a storage class named "nfs".

nuwang commented 1 year ago

That's a dependency coming from here: https://github.com/CloudVE/galaxy-cvmfs-csi-helm/blob/a75a195396710bfc50dab3fb227330dea3a0c3a2/galaxy-cvmfs-csi/values.yaml#L38

So you'll need to set cvmfs.cvmfscsi.cache.alien.pvc=nfs-csi. Or alternatively, you can create a storage class named nfs.

Milokita commented 1 year ago

I created the storage class named "nfs" because when I tried to set it with helm install my-galaxy . --set cvmfs.cvmfscsi.cache.alien.pvc="nfs-csi", I got this error:

Error: INSTALLATION FAILED: template: galaxy/charts/cvmfs/templates/cache-pvcs.yaml:5:22: executing "galaxy/charts/cvmfs/templates/cache-pvcs.yaml" at <.Values.cvmfscsi.cache.alien.pvc.name>: can't evaluate field name in type interface {}

After starting up, some pods report:

Warning  FailedMount       58s (x2 over 62s)  kubelet            MountVolume.MountDevice failed for volume "pvc-d95e8266-6ad3-4857-b16e-508839bee2da" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name cvmfs.csi.cern.ch not found in the list of registered CSI drivers

So I installed the CSI driver following https://github.com/cvmfs-contrib/cvmfs-csi/blob/master/docs/deploying.md#deployment-with-helm-chart and got the desired output.

But when I tried to install again with helm install my-galaxy ., a new error popped up:

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: ConfigMap "cvmfs-csi-default-local" in namespace "default" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "my-galaxy": current value is "cvmfs"

I tried helm install my-galaxy . --set "meta.helm.sh/release-namespace"="galaxy" --set "meta.helm.sh/release-name"="my-galaxy", but the error persists.

nuwang commented 1 year ago

My example was missing the last field: https://github.com/CloudVE/galaxy-cvmfs-csi-helm/blob/a75a195396710bfc50dab3fb227330dea3a0c3a2/galaxy-cvmfs-csi/values.yaml#L38

It should be set as: --set cvmfs.cvmfscsi.cache.alien.pvc.storageClass=nfs-csi
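The same setting expressed in a values file would be:

```yaml
# Equivalent of --set cvmfs.cvmfscsi.cache.alien.pvc.storageClass=nfs-csi
cvmfs:
  cvmfscsi:
    cache:
      alien:
        pvc:
          storageClass: nfs-csi
```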

nuwang commented 1 year ago

If you want to install the csi chart separately, you will need to uninstall the Galaxy chart first, as it already bundles the csi chart and you cannot have two instances of the csi chart running at the same time. When reinstalling the Galaxy chart, remember to set the cvmfs deploy flag to false on the Galaxy chart, or it will attempt to deploy the bundled chart again. Finally, Galaxy depends on https://github.com/CloudVE/galaxy-cvmfs-csi-helm, which is a wrapper around the generic csi chart (https://github.com/cvmfs-contrib/cvmfs-csi) that adds Galaxy-specific repos; therefore, make sure to install the former, not the latter.
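If the CSI chart is installed separately, the bundled one would then be disabled via the cvmfs deploy flag; a sketch of such a values override, with the exact key layout assumed from the chart's values:

```yaml
# Assumed key layout: skip deploying the bundled cvmfs-csi chart
# when a separately installed CSI chart is already running.
cvmfs:
  deploy: false
```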

afgane commented 1 year ago

To add to what Nuwan said, I've found it trickier to install the CVMFS-CSI chart separately than as part of galaxy-helm. I've seen that same "rendered manifests contain a resource that already exists" issue, and untangling those resources seemed more involved than just setting the name in the Galaxy chart and having it install automatically as a dependency.

Milokita commented 1 year ago

Thank you both for your kind help. I tried again after re-initializing k8s. I first created two storage classes:

Name:            nfs
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"nfs"},"mountOptions":["nfsvers=4.1"],"parameters":{"server":"192.168.1.40","share":"/nfs"},"provisioner":"nfs.csi.k8s.io","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           nfs.csi.k8s.io
Parameters:            server=192.168.1.40,share=/nfs
AllowVolumeExpansion:  <unset>
MountOptions:
  nfsvers=4.1
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>

Name:            nfs-csi-rwo
IsDefaultClass:  No
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"nfs-csi-rwo"},"mountOptions":["nfsvers=4.1"],"parameters":{"server":"192.168.1.40","share":"/nfsrwo"},"provisioner":"nfs.csi.k8s.io","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}

Provisioner:           nfs.csi.k8s.io
Parameters:            server=192.168.1.40,share=/nfsrwo
AllowVolumeExpansion:  <unset>
MountOptions:
  nfsvers=4.1
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>

I installed using the command helm install my-galaxy . --debug with line 172 set to storageClass: nfs and lines 296 & 667 set to storageClass: nfs-csi, but some PVCs still fail to be provisioned:

Name:          my-galaxy-cvmfs-alien-cache-pvc
Namespace:     default
StorageClass:  nfs
Status:        Pending
Volume:
Labels:        app.kubernetes.io/instance=my-galaxy
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=cvmfs
               app.kubernetes.io/version=1.0.1
               helm.sh/chart=cvmfs-2.0.0
Annotations:   meta.helm.sh/release-name: my-galaxy
               meta.helm.sh/release-namespace: default
               volume.beta.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  ExternalProvisioning  14s (x4 over 45s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator

Name:          my-galaxy-galaxy-pvc
Namespace:     default
StorageClass:  nfs
Status:        Pending
Volume:
Labels:        app.kubernetes.io/instance=my-galaxy
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=galaxy
               app.kubernetes.io/version=23.0
               helm.sh/chart=galaxy-5.7.2
Annotations:   meta.helm.sh/release-name: my-galaxy
               meta.helm.sh/release-namespace: default
               volume.beta.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       my-galaxy-celery-79c77b8ccb-ppxbj
               my-galaxy-celery-beat-df6996755-pkxbq
               my-galaxy-init-db-3bckh-2zr8l
               my-galaxy-init-mounts-cb2as-xhztx
               my-galaxy-job-0-c9dc6c54c-cdqgc
               my-galaxy-nginx-5f6f88ffcc-b9zsk
               my-galaxy-tusd-7ccfc4fc6c-npsd5
               my-galaxy-web-787c9bc7d5-zbvvj
               my-galaxy-workflow-669dc896c7-wvw8l
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  ExternalProvisioning  14s (x4 over 45s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator

Name:          my-galaxy-refdata-gxy-pvc
Namespace:     default
StorageClass:  my-galaxy-cvmfs
Status:        Bound
Volume:        pvc-0753e19f-78fb-47d2-9976-4fef7d2a05eb
Labels:        app.kubernetes.io/instance=my-galaxy
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=galaxy
               app.kubernetes.io/version=23.0
               helm.sh/chart=galaxy-5.7.2
Annotations:   meta.helm.sh/release-name: my-galaxy
               meta.helm.sh/release-namespace: default
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: cvmfs.csi.cern.ch
               volume.kubernetes.io/storage-provisioner: cvmfs.csi.cern.ch
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  ROX
VolumeMode:    Filesystem
Used By:       my-galaxy-celery-79c77b8ccb-ppxbj
               my-galaxy-celery-beat-df6996755-pkxbq
               my-galaxy-init-db-3bckh-2zr8l
               my-galaxy-init-mounts-cb2as-xhztx
               my-galaxy-job-0-c9dc6c54c-cdqgc
               my-galaxy-web-787c9bc7d5-zbvvj
               my-galaxy-workflow-669dc896c7-wvw8l
Events:
  Type    Reason                 Age                From                                                                                                         Message
  ----    ------                 ----               ----                                                                                                         -------
  Normal  ExternalProvisioning   29s (x3 over 45s)  persistentvolume-controller                                                                                  waiting for a volume to be created, either by external provisioner "cvmfs.csi.cern.ch" or manually created by system administrator
  Normal  Provisioning           23s                cvmfs.csi.cern.ch_my-galaxy-cvmfscsi-controllerplugin-647ccf5b79-6glnz_1e1b52f8-73ff-4898-92a6-50063a7ee78d  External provisioner is provisioning volume for claim "default/my-galaxy-refdata-gxy-pvc"
  Normal  ProvisioningSucceeded  23s                cvmfs.csi.cern.ch_my-galaxy-cvmfscsi-controllerplugin-647ccf5b79-6glnz_1e1b52f8-73ff-4898-92a6-50063a7ee78d  Successfully provisioned volume pvc-0753e19f-78fb-47d2-9976-4fef7d2a05eb

Name:          pgdata-galaxy-my-galaxy-postgres-0
Namespace:     default
StorageClass:  nfs-csi-rwo
Status:        Pending
Volume:
Labels:        application=spilo
               cluster-name=galaxy-my-galaxy-postgres
               team=galaxy
Annotations:   volume.beta.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       galaxy-my-galaxy-postgres-0
Events:
  Type    Reason                Age                   From                         Message
  ----    ------                ----                  ----                         -------
  Normal  ExternalProvisioning  14s (x10 over 2m26s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
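
As an aside, the same overrides can be set on the `helm` command line instead of editing values.yaml by line number; the value paths below are assumptions based on typical chart layouts, so verify them against this chart's actual values.yaml before use:

```shell
# Sketch: override storage classes at install time rather than editing values.yaml.
# The value paths below are assumptions -- check the chart's values.yaml for the
# keys that correspond to lines 172, 296, and 667.
helm install my-galaxy . --debug \
  --set persistence.storageClass=nfs \
  --set postgresql.persistence.storageClass=nfs-csi-rwo
```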

Here is the StorageClass info:

Name:                  my-galaxy-cvmfs
IsDefaultClass:        No
Annotations:           meta.helm.sh/release-name=my-galaxy,meta.helm.sh/release-namespace=default
Provisioner:           cvmfs.csi.cern.ch
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

Name:            nfs
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"nfs"},"mountOptions":["nfsvers=4.1"],"parameters":{"server":"192.168.1.40","share":"/nfs"},"provisioner":"nfs.csi.k8s.io","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           nfs.csi.k8s.io
Parameters:            server=192.168.1.40,share=/nfs
AllowVolumeExpansion:  <unset>
MountOptions:
  nfsvers=4.1
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>

Name:            nfs-csi-rwo
IsDefaultClass:  No
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"nfs-csi-rwo"},"mountOptions":["nfsvers=4.1"],"parameters":{"server":"192.168.1.40","share":"/nfsrwo"},"provisioner":"nfs.csi.k8s.io","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}

Provisioner:           nfs.csi.k8s.io
Parameters:            server=192.168.1.40,share=/nfsrwo
AllowVolumeExpansion:  <unset>
MountOptions:
  nfsvers=4.1
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>
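
For readability, the `nfs` class above unpacks from its `last-applied-configuration` annotation into this manifest (content taken directly from the output above):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.1.40
  share: /nfs
mountOptions:
  - nfsvers=4.1
reclaimPolicy: Delete
volumeBindingMode: Immediate
```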
nuwang commented 1 year ago

It appears that none of your NFS storage classes are being provisioned, which suggests that the NFS provisioner is not correctly configured. This message confirms it: `waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator`.

Take a look at the NFS provisioner pod logs and make sure they are running as expected. Since you are using a storage class, you probably need to configure and test the nfs-subdir-external-provisioner as described in the article you posted earlier.
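
A minimal sketch of that check, assuming the NFS CSI driver was installed into `kube-system` with the upstream csi-driver-nfs default labels (adjust the namespace and selectors to match your installation):

```shell
# Locate the NFS CSI driver pods (label values assume the upstream
# csi-driver-nfs defaults; adjust to your install).
kubectl get pods -n kube-system -l app=csi-nfs-controller
kubectl get pods -n kube-system -l app=csi-nfs-node

# Tail the controller logs for provisioning errors, e.g. mount or
# permission failures against 192.168.1.40:/nfs.
kubectl logs -n kube-system -l app=csi-nfs-controller --all-containers --tail=100
```

If the controller pod is missing or crash-looping, fixing that is a prerequisite to any PVC binding; the `waiting for a volume to be created` events will persist until the provisioner is healthy.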