clincha-org / clincha

Configuration and monitoring of clinch-home infrastructure
https://clinch-home.com

Ceph Filesystem Storage #99

Closed clincha closed 11 months ago

clincha commented 1 year ago

RBD devices are great for block storage used by a single container, but CephFS is much better suited to shared storage spaces. Create the shared storage spaces and mount them in pods.

clincha commented 1 year ago
[kubernetes@bri-master-1 ~]$ k logs -n ceph-csi-cephfs ceph-csi-cephfs-provisioner-64d8f94cfd-zzrn8
Defaulted container "csi-provisioner" out of: csi-provisioner, csi-snapshotter, csi-resizer, csi-cephfsplugin
I0930 18:09:44.332849       1 feature_gate.go:249] feature gates: &{map[HonorPVReclaimPolicy:true]}
I0930 18:09:44.332946       1 csi-provisioner.go:154] Version: v3.5.0
I0930 18:09:44.332953       1 csi-provisioner.go:177] Building kube configs for running in cluster...
I0930 18:09:45.334205       1 common.go:111] Probing CSI driver for readiness
I0930 18:09:45.338858       1 csi-provisioner.go:302] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0930 18:09:45.339433       1 leaderelection.go:245] attempting to acquire leader lease ceph-csi-cephfs/cephfs-csi-ceph-com...
I0930 18:09:45.347184       1 leaderelection.go:255] successfully acquired lease ceph-csi-cephfs/cephfs-csi-ceph-com
I0930 18:09:45.447576       1 controller.go:811] Starting provisioner controller cephfs.csi.ceph.com_bri-kubeworker-1_9a61a225-3094-46f5-91f8-ec4a4e36dc58!
I0930 18:09:45.447611       1 volume_store.go:97] Starting save volume queue
I0930 18:09:45.447857       1 clone_controller.go:66] Starting CloningProtection controller
I0930 18:09:45.447968       1 clone_controller.go:82] Started CloningProtection controller
I0930 18:09:45.548637       1 controller.go:860] Started provisioner controller cephfs.csi.ceph.com_bri-kubeworker-1_9a61a225-3094-46f5-91f8-ec4a4e36dc58!
I0930 19:47:01.426561       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:01.427046       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:01.441942       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 0
E0930 19:47:01.441977       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:01.442256       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:01.942076       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:01.942261       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:01.949127       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 1
E0930 19:47:01.949149       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:01.949213       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:02.949548       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:02.949698       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:02.956472       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 2
E0930 19:47:02.956495       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:02.956573       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:04.957569       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:04.957694       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:04.966465       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 3
E0930 19:47:04.966489       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:04.966538       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:08.966838       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:08.966982       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:08.974110       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 4
E0930 19:47:08.974137       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:08.974226       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:16.974913       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:16.975071       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:16.989933       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 5
E0930 19:47:16.989953       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:16.989966       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:32.990497       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:47:32.990621       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:47:32.997006       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 6
E0930 19:47:32.997053       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:47:32.997135       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:48:04.997510       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:48:04.997752       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:48:05.004612       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 7
E0930 19:48:05.004640       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:48:05.004709       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:49:09.005638       1 controller.go:1359] provision "default/sabnzbd-movies-claim" class "cephfs-movies": started
I0930 19:49:09.005784       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/sabnzbd-movies-claim"
W0930 19:49:09.012126       1 controller.go:934] Retrying syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76", failure 8
E0930 19:49:09.012145       1 controller.go:957] error syncing claim "53d5ba19-8edb-4d8f-920e-f7379797ca76": failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
I0930 19:49:09.012153       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"sabnzbd-movies-claim", UID:"53d5ba19-8edb-4d8f-920e-f7379797ca76", APIVersion:"v1", ResourceVersion:"68871", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs-movies": rpc error: code = InvalidArgument desc = failed to get connection: connecting failed: rados: ret=-22, Invalid argument
[kubernetes@bri-master-1 ~]$ 
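
The log above repeats the same failure on every retry (ret=-22 is EINVAL, "Invalid argument"). A rough sketch of a filter that reduces klog-style output to its distinct error messages; `unique_errors` is a made-up helper name, not part of kubectl or ceph-csi:

```shell
# Collapse klog-formatted provisioner output to its distinct error messages.
# klog lines begin with a severity letter (I, W, E, F); keep only the E lines,
# strip the "E<mmdd> <time> <pid> <file>:<line>] " header, then de-duplicate.
unique_errors() {
  grep '^E[0-9]' | sed 's/^[^]]*] //' | sort -u
}

# Usage against the cluster (pod and container names taken from the log above):
#   kubectl logs -n ceph-csi-cephfs ceph-csi-cephfs-provisioner-64d8f94cfd-zzrn8 \
#     -c csi-provisioner | unique_errors
```

Run against the output above, this should leave a single "error syncing claim" line, which makes it easier to see that every retry fails for the same reason.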
clincha commented 11 months ago

Decoding the secret username and password shows that they haven't been passed in correctly...

[kubernetes@bri-master-1 ~]$ echo PHBsYWludGV4dCBJRD4= | base64 -d
<plaintext ID>
[kubernetes@bri-master-1 ~]$ echo PENlcGggYXV0aCBrZXkgY29ycmVzcG9uZGluZyB0byBJRCBhYm92ZT4= | base64 -d
<Ceph auth key corresponding to ID above>
[kubernetes@bri-master-1 ~]$ 

A look at the chart's values.yaml shows that these are the default values.
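
The same check can be scripted. This is only a sketch: `is_placeholder` is a hypothetical helper, and it assumes a chart default is always text wrapped in angle brackets, as in the two decoded values above.

```shell
# Decode a base64 value and report whether it still looks like a chart
# placeholder, i.e. text wrapped in angle brackets such as "<plaintext ID>".
# Returns 0 for a placeholder, 1 for anything else, 2 if decoding fails.
is_placeholder() {
  decoded=$(printf '%s' "$1" | base64 -d) || return 2
  case "$decoded" in
    "<"*">") return 0 ;;   # still the template default
    *)       return 1 ;;   # looks like a real value
  esac
}

# Usage: dump the secret's data map, then test each base64 value by hand.
# The secret name and namespace are assumed from this issue.
#   kubectl get secret -n ceph-csi-cephfs csi-cephfs-secret -o jsonpath='{.data}'
#   is_placeholder 'PHBsYWludGV4dCBJRD4=' && echo "still the chart default"
```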

clincha commented 11 months ago

Investigating the Helm chart values shows that the correct values are being supplied, but they don't seem to be taking effect:

helm get values -n ceph-csi-cephfs ceph-csi-cephfs
clincha commented 11 months ago

The key names are different for the CephFS secret config compared to the RBD secret config.

For RBD:

      secret:
        create: true
        name: csi-cephfs-secret
        userID: "{{ ceph_user_id }}"
        userKey: "{{ lookup('env', 'CEPH_KEY') }}"

For CephFS:

      secret:
        create: true
        name: csi-cephfs-secret
        adminID: "{{ ceph_user_id }}"
        adminKey: "{{ lookup('env', 'CEPH_KEY') }}"

Notice that userID and userKey become adminID and adminKey.
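
To catch this kind of mismatch earlier, a values file could be checked before deploying. A minimal sketch, assuming the key names per driver are exactly those in the two snippets above; `missing_secret_keys` is a hypothetical helper and only does a line-based grep, not real YAML parsing:

```shell
# Print each expected secret key that is missing from a values file.
# Key names per driver are taken from the two snippets above.
missing_secret_keys() {
  driver=$1 file=$2
  case "$driver" in
    rbd)    keys="userID userKey" ;;
    cephfs) keys="adminID adminKey" ;;
    *)      echo "unknown driver: $driver" >&2; return 2 ;;
  esac
  for key in $keys; do
    # A key counts as present if some line starts with "<key>:" after indentation.
    grep -q "^[[:space:]]*$key:" "$file" || echo "$key"
  done
}

# Usage (the file name is an assumption):
#   missing_secret_keys cephfs values.yaml
```

No output means both keys were found; any key printed is one the chart would fall back to its default for.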

clincha commented 11 months ago

Looks like persistent storage can be provided by either RBD or CephFS. Seeing as I've put this effort into getting the CephFS component working, I might as well use that. The documentation goes through both scenarios; the CephFS documentation can be found here.

clincha commented 11 months ago

image

Looks like there might be a privilege issue with the share when it's mounted. Checking another application with the same mount helped unearth a better error message. The other application gave this error:

2023-10-29 17:46:40,902::ERROR::[filesystem:404] download_dir directory: /downloads/incomplete error accessing
2023-10-29 17:46:52,376::ERROR::[filesystem:404] download_dir directory: /downloads/incomplete/4cbe2cad-17f0-46a9-b0ac-aed87a4f8bb9 error accessing
2023-10-29 17:46:59,448::ERROR::[filesystem:404] download_dir directory: /downloads/incomplete error accessing
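
One way to confirm a privilege problem is to look at the ownership and mode of the mounted directory as the user the container runs as. A rough sketch; `check_writable` is a hypothetical helper and would have to be run inside the pod:

```shell
# Print the numeric owner, group and mode of a directory and report whether
# the current user can write to it.
# Returns 0 if writable, 1 if not writable, 2 if the path is not a directory.
check_writable() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "$dir: not a directory"; return 2
  fi
  ls -ldn "$dir"
  if [ -w "$dir" ]; then
    echo "$dir: writable by uid $(id -u)"
  else
    echo "$dir: NOT writable by uid $(id -u)"; return 1
  fi
}

# Usage inside the container (path taken from the error above):
#   check_writable /downloads/incomplete
# Or without the helper, from outside (<pod> is a placeholder):
#   kubectl exec <pod> -- ls -ldn /downloads/incomplete
```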
clincha commented 11 months ago

Working! K8s wants a single persistent volume claim shared between multiple pods. Going to do a full teardown test.
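
For reference, a sketch of what a single shared claim might look like. The claim and storage class names come from the provisioner log earlier in this issue; the 100Gi size and the `render_shared_pvc` helper are assumptions for illustration, not what the repo actually uses.

```shell
# Print a PersistentVolumeClaim manifest for a volume shared between pods.
# ReadWriteMany lets pods on different nodes mount the same claim read-write.
render_shared_pvc() {
  name=$1 class=$2 size=$3
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: $class
  resources:
    requests:
      storage: $size
EOF
}

# Usage:
#   render_shared_pvc sabnzbd-movies-claim cephfs-movies 100Gi | kubectl apply -f -
```

Each pod that needs the share then references the same claim by name under `persistentVolumeClaim.claimName` in its volumes, rather than creating a claim of its own.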