galaxyproject / galaxy-helm

Minimal setup required to run Galaxy under Kubernetes
MIT License

galaxy-cvmfscsi-nodeplugin daemonset uses wrong pvc name #434

Closed: pckroon closed this issue 1 year ago

pckroon commented 1 year ago

Hi all,

I'm trying to set up Galaxy with the CVMFS reference data (refdata), and I think I'm almost there. The issue I'm running into is that the galaxy-cvmfs-nodeplugin pods get stuck in Pending because they cannot find a PVC named cvmfs-alien-cache:

  Warning  FailedScheduling  102s (x8 over 2m29s)  default-scheduler  0/3 nodes are available: 3 persistentvolumeclaim "cvmfs-alien-cache" not found. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

The scheduler is right about that, because the PVC is actually named galaxy-cvmfs-alien-cache-pvc:

> kubectl get pvc -n galaxy
NAME                                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
galaxy-cvmfs-alien-cache-pvc                  Bound    pvc-ea23272b-8c33-4d38-a3ce-02d15d36746e   10000Mi    RWX            openebs-rwx           4m33s
galaxy-data                                   Bound    galaxy-data                                500Gi      RWX            nfs                   76d
galaxy-data-nfs                               Bound    pvc-a7c6191c-a165-453e-a13c-b92b3a2efa4f   500Gi      RWO            openebs-jiva-csi-sc   76d
galaxy-galaxy-pvc                             Bound    pvc-f66fe77d-9512-433f-9f8a-02dd51ce82a5   500Gi      RWX            openebs-rwx           4m33s
galaxy-refdata-gxy-pvc                        Bound    pvc-8b460664-fb27-4714-a7cb-9609fa533c9c   10Gi       ROX            galaxy-cvmfs          4m33s
persistence-galaxy-rabbitmq-server-server-0   Bound    pvc-38e03623-86ad-4062-8d46-65b1f9f3d909   10Gi       RWO            openebs-jiva-csi-sc   4m8s

Here's the relevant bit of Helm values:

refdata:
  enabled: true
  type: cvmfs

cvmfs:
  deploy: true
  cvmfscsi:
    cache:
      alien:
        enabled: true
        pvc:
          storageClass: openebs-rwx

Setting cvmfs.cvmfscsi.cache.alien.pvc.name to galaxy-cvmfs-alien-cache-pvc doesn't seem to do anything. My current workaround is to edit the DaemonSet by hand after installing, but I'm looking for a better solution :)
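For reference, the post-install workaround amounts to changing the claim name in the nodeplugin DaemonSet's pod spec so that it points at the PVC the chart actually created (for example with kubectl edit daemonset -n galaxy galaxy-cvmfscsi-nodeplugin, taking the DaemonSet name from the issue title). The fragment below is a sketch: only the two claim names come from the output above, and the volume name is a hypothetical placeholder, so check your rendered manifest for the real one.

```yaml
# Fragment of the nodeplugin DaemonSet pod spec (spec.template.spec.volumes).
# The volume name below is hypothetical; only the claim names come from this issue.
volumes:
  - name: cvmfs-alien-cache
    persistentVolumeClaim:
      # As rendered by the chart this was "cvmfs-alien-cache", which does not exist.
      # Edited by hand to match the PVC the chart actually created:
      claimName: galaxy-cvmfs-alien-cache-pvc
```

A manual edit like this lives outside the Helm release, so it is likely to be overwritten the next time the release is upgraded.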

pckroon commented 1 year ago

Hmm, Galaxy jobs now fail with the following:

  Warning  Failed     14s (x3 over 16s)  kubelet            Error: failed to create subPath directory for volumeMount "galaxy-refdata-gxy-pvc" of container "k8s"

So it seems my workaround doesn't quite work. The PVC it's trying to mount /does/ exist:

galaxy-refdata-gxy-pvc                        Bound    pvc-927b71d2-668e-4e64-90ee-ba114f534b73   10Gi       ROX            galaxy-cvmfs          75m

ksuderman commented 1 year ago

The subPath problem was (and still is) caused by issues in the upstream chart(s). I thought we had fixed that here, but apparently not; it will be fixed by #436. As a short-term workaround, you can restart the CVMFS nodeplugin and kill any stuck jobs.

Setting cvmfs.cvmfscsi.cache.alien.pvc.name to cvmfs-alien-cache should fix the problem with the nodeplugin not finding the PVC. We have discussed fixing this in the past and I've opened #437 to track the issue.
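Applied to the values posted at the top of this issue, the suggested fix is a one-line addition. Presumably it works because pvc.name sets the name of the PVC the chart creates, so the PVC ends up matching the claim name the nodeplugin DaemonSet already references, rather than the DaemonSet being pointed at a differently named PVC. Everything except the added name line is copied from the original values.

```yaml
refdata:
  enabled: true
  type: cvmfs

cvmfs:
  deploy: true
  cvmfscsi:
    cache:
      alien:
        enabled: true
        pvc:
          storageClass: openebs-rwx
          # Name the alien-cache PVC so it matches the claim the nodeplugin expects
          name: cvmfs-alien-cache
```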

pckroon commented 1 year ago

Hi both! Many thanks for the rapid response. Changing my values as specified by @nuwang fixed it all, and I'm super happy with the working reference data! 🥳