SwissDataScienceCenter / renku-data-services

Services that handle reading and writing data from a database
Apache License 2.0

Resuming a v2 session with public cloud storage does not work #424

Closed olevski closed 3 weeks ago

olevski commented 1 month ago

It seems that when the session resumes, we require cloud storage credentials, even though the cloud storage is public.

We should also check the case where a session is started with cloud storage that has credentials and is then resumed.

See https://swiss-data-science.slack.com/archives/C67N59QL8/p1727166607703479

Warning  FailedMount  79s (x29 over 44m)  kubelet  MountVolume.SetUp failed for volume "pvc-b4ec8f21-ebdc-46b3-834f-0a704809d0f5" : rpc error: code = Unknown desc = Cannot find the 'secretName' and/or 'secretNamespace' fields in the volume context. If you are not using automated provisioning you have to specify these values manually in spec.csi.volumeAttributes in your PersistentVolume manifest. If you are using automated provisioning and these values are not found report this as a bug to the developers.

olevski commented 1 month ago

One part of this problem is that if an old session was started prior to the changes we made in https://github.com/SwissDataScienceCenter/csi-rclone/pull/20, then the PV looks like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi-rclone
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  creationTimestamp: "2024-08-23T07:44:59Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-provisioner.volume.kubernetes.io/finalizer
  name: pvc-1bd81ccc-54a1-48fd-ac9f-4bc650f16538
  resourceVersion: "1310777056"
  uid: ca194346-da61-4187-ab95-745040a30330
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: xxxx
    namespace: renku
    resourceVersion: "1269553012"
    uid: 1bd81ccc-54a1-48fd-ac9f-4bc650f16538
  csi:
    driver: csi-rclone
    volumeAttributes:
      namespace: renku
      remote: some-remote
      remotePath: some-path
      secretName: some-secret
      storage.kubernetes.io/csiProvisionerIdentity: 1719987631740-8081-csi-rclone
    volumeHandle: pvc-1bd81ccc-54a1-48fd-ac9f-4bc650f16538
  persistentVolumeReclaimPolicy: Delete
  storageClassName: csi-rclone
  volumeMode: Filesystem
status:
  phase: Bound

But the new code looks for the field secretNamespace in the volume attributes, and fails if it is not there. The old PV above only has namespace and secretName.

The failure message in the k8s pod events is something like this:

MountVolume.SetUp failed for volume "pvc-1bd81ccc-54a1-48fd-ac9f-4bc650f16538" : rpc error: code = Unknown desc = Cannot find the 'secretName' and/or 'secretNamespace' fields in the volume context. If you are not using automated provisioning you have to specify these values manually in spec.csi.volumeAttributes in your PersistentVolume manifest. If you are using automated provisioning and these values are not found report this as a bug to the developers.
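
To make the failure mode concrete, here is a minimal sketch of the kind of check described above, written in Python purely for illustration. It is not the actual csi-rclone code; the function and exception names are hypothetical, and only the attribute keys and the error text are taken from the PV and the event message in this issue.

```python
from typing import Mapping, Tuple


class MissingSecretReference(Exception):
    """Hypothetical error for a volume context without a full secret reference."""


def resolve_secret_ref(volume_context: Mapping[str, str]) -> Tuple[str, str]:
    """Return (secret name, secret namespace), or raise if either is missing.

    This mirrors the strict behaviour described in the issue: both keys must
    be present in the volume attributes, otherwise the mount fails.
    """
    name = volume_context.get("secretName")
    namespace = volume_context.get("secretNamespace")
    if not name or not namespace:
        raise MissingSecretReference(
            "Cannot find the 'secretName' and/or 'secretNamespace' "
            "fields in the volume context."
        )
    return name, namespace


# Volume attributes copied from the old PV above. Note that the key is
# "namespace", not "secretNamespace", so the strict check rejects it.
old_pv_attributes = {
    "namespace": "renku",
    "remote": "some-remote",
    "remotePath": "some-path",
    "secretName": "some-secret",
    "storage.kubernetes.io/csiProvisionerIdentity": "1719987631740-8081-csi-rclone",
}

if __name__ == "__main__":
    try:
        resolve_secret_ref(old_pv_attributes)
    except MissingSecretReference as err:
        print(f"mount would fail: {err}")
```

With the attributes of the old PV the function raises, which corresponds to the FailedMount event; once a secretNamespace key is present in the attributes, the same call returns the secret reference.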

EDIT: This problem should be addressed in https://github.com/SwissDataScienceCenter/csi-rclone/pull/32. The problem originally reported in this issue is similar but occurs when a session is launched without a secret. This is really strange and should be examined further.

olevski commented 1 month ago

I was under the impression that when we use a public bucket then there is no secret used in the mounting of the csi rclone volume at all. But this is not the case. Even when the bucket is public there is a secret associated with it which holds the whole rclone config. So the error that showed up recently for a private bucket is the same as the one originally reported. And with the release of version 0.3.3 of the csi rclone plugin this should not occur anymore at all.
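
As an illustration of why a public bucket still ends up with a secret, here is a rough sketch of how an rclone config without credentials could be wrapped in a Kubernetes Secret. The helper names, the S3 provider and region, and the secret key configData are assumptions made for this example and are not taken from the renku-data-services or csi-rclone code. The blank access keys follow the approach the rclone S3 documentation describes for anonymous access to public buckets.

```python
import configparser
import io
from typing import Any, Dict


def public_s3_rclone_config(remote: str, region: str) -> str:
    """Render an rclone config section for anonymous access to an S3 bucket."""
    parser = configparser.ConfigParser()
    parser[remote] = {
        "type": "s3",
        "provider": "AWS",  # assumed provider, for illustration only
        "region": region,
        # Blank keys: no credentials are stored, the bucket is read anonymously.
        "access_key_id": "",
        "secret_access_key": "",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def rclone_secret_manifest(
    name: str, namespace: str, remote: str, remote_path: str, config_data: str
) -> Dict[str, Any]:
    """Build a Secret manifest (as a dict) that carries the whole rclone config."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "stringData": {
            "remote": remote,
            "remotePath": remote_path,
            "configData": config_data,  # assumed key name
        },
    }


if __name__ == "__main__":
    config = public_s3_rclone_config("some-remote", "us-east-1")
    secret = rclone_secret_manifest(
        "some-secret", "renku", "some-remote", "some-path", config
    )
    print(secret["stringData"]["configData"])
```

The point of the sketch is that the secret exists regardless of whether the config contains credentials, so the volume still needs a valid secretName and secretNamespace reference.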

See https://github.com/SwissDataScienceCenter/csi-rclone/pull/32