Huawei / eSDK_K8S_Plugin

Container-Storage-Interface (CSI) for Huawei storage
https://huawei.github.io/css-docs/
Apache License 2.0

Issue: problem with reclaim policy delete #134

Open ccaillet1974 opened 1 year ago

ccaillet1974 commented 1 year ago

Hi all,

I've created, as mentioned in my previous issue, a StorageClass with reclaimPolicy "Delete", but sometimes the PVs are not deleted and stay in "Released" status when running kubectl get pv.
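For context, the StorageClass in question looks roughly like the sketch below; the provisioner name and the parameters are placeholders based on typical Huawei CSI examples, not the exact values from my cluster.

# Minimal StorageClass sketch; provisioner and parameters are assumptions,
# only reclaimPolicy: Delete is the point being discussed here.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: oc6810-hdd-iscsi
provisioner: csi.huawei.com      # assumed driver name, adjust to your deployment
reclaimPolicy: Delete            # PVs should be removed when the PVC is deleted
volumeBindingMode: Immediate
parameters:
  volumeType: lun                # placeholder parameters
  allocType: thin
EOF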

[CORE-LYO0][totof@lyo0-k8s-admin00:~]$ kubectl get pv | grep -v Bound
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                                            STORAGECLASS       REASON   AGE
pvc-06244827-4d95-4c14-a5b9-7dba47702563   100Gi      RWO            Delete           Released      cdn-bigdata/elasticsearch-data-sophie-mon-es-es-1                oc6810-hdd-iscsi            2d21h
pvc-0e708b36-3af1-4981-8d77-a9b46aafee41   2000Gi     RWO            Delete           Terminating   default/dbench0-pv-claim                                         sc-bench                    3d1h
pvc-7a1004ce-185e-4602-985c-b855702c1488   100Gi      RWO            Delete           Terminating   cdn-bigdata/elasticsearch-data-sophie-mon-es-es-0                oc6810-hdd-iscsi            2d21h
pvc-d67c3a49-2231-4542-9b87-c0b560400493   500Gi      RWO            Delete           Released      default/dbench-pv-claim                                          sc-bench                    3d17h 

The ones in "Terminating" status are due to an action on my part: I tried to run the command kubectl delete pv, but without effect, as you can see.
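For reference, a PV stuck in Terminating like this can be inspected with plain kubectl; the sketch below is generic (the kubernetes.io/pv-protection finalizer mentioned in the comments is the usual one on PVs, I have not confirmed it is what blocks deletion here).

# A PV stuck in Terminating usually still carries a finalizer such as
# kubernetes.io/pv-protection, or the CSI driver failed to delete the
# backing volume; both are visible with standard kubectl.
kubectl get pv pvc-0e708b36-3af1-4981-8d77-a9b46aafee41 -o jsonpath='{.metadata.finalizers}{"\n"}'
kubectl describe pv pvc-0e708b36-3af1-4981-8d77-a9b46aafee41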

I have the following log entries when the CSI tries to unstage the volume:

2023-06-12 08:24:03.962158 824169 [INFO]: [requestID:3854352599] ISCSI Start to disconnect volume ==> volume wwn is: 6a8ffba1005d5c410241c40a0000000a
2023-06-12 08:24:03.962221 824169 [INFO]: [requestID:3854352599] WaitGetLock start to get lock
2023-06-12 08:24:03.962376 824169 [INFO]: [requestID:3854352599] WaitGetLock finish to get lock
2023-06-12 08:24:03.962458 824169 [INFO]: [requestID:3854352599] Before acquire, available permits is 4
2023-06-12 08:24:03.962530 824169 [INFO]: [requestID:3854352599] After acquire, available permits is 3
2023-06-12 08:24:03.962598 824169 [INFO]: [requestID:3854352599] It took 386.759µs to acquire disConnect lock for 6a8ffba1005d5c410241c40a0000000a.
2023-06-12 08:24:03.962712 824169 [INFO]: [requestID:3854352599] Gonna run shell cmd "ls -l /dev/disk/by-id/ | grep 6a8ffba1005d5c410241c40a0000000a".
2023-06-12 08:24:03.970912 824169 [INFO]: [requestID:3854352599] Shell cmd "ls -l /dev/disk/by-id/ | grep 6a8ffba1005d5c410241c40a0000000a" result:
lrwxrwxrwx 1 root root 10 Jun  9 12:03 dm-name-36a8ffba1005d5c410241c40a0000000a -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jun  9 12:03 dm-uuid-mpath-36a8ffba1005d5c410241c40a0000000a -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jun  9 14:37 scsi-36a8ffba1005d5c410241c40a0000000a -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jun  9 14:37 wwn-0x6a8ffba1005d5c410241c40a0000000a -> ../../dm-0

2023-06-12 08:24:03.971113 824169 [INFO]: [requestID:3854352599] Gonna run shell cmd "ls -l /dev/mapper/ | grep -w dm-0".
2023-06-12 08:24:03.978442 824169 [INFO]: [requestID:3854352599] Shell cmd "ls -l /dev/mapper/ | grep -w dm-0" result:
lrwxrwxrwx 1 root root       7 Jun  9 12:03 36a8ffba1005d5c410241c40a0000000a -> ../dm-0

2023-06-12 08:24:03.978688 824169 [ERROR]: [requestID:3854352599] Can not get DMDevice by alias: dm-0
2023-06-12 08:24:03.978763 824169 [ERROR]: [requestID:3854352599] Get DMDevice by alias:dm-0 failed. error: Can not get DMDevice by alias: dm-0
2023-06-12 08:24:03.978828 824169 [ERROR]: [requestID:3854352599] check device: dm-0 is a partition device failed. error: Get DMDevice by alias:dm-0 failed. error: Can not get DMDevice by alias: dm-0
2023-06-12 08:24:03.978894 824169 [ERROR]: [requestID:3854352599] Get device of WWN 6a8ffba1005d5c410241c40a0000000a error: check device: dm-0 is a partition device failed. error: Get DMDevice by alias:dm-0 failed. error: Can not get DMDevice by alias: dm-0
2023-06-12 08:24:03.978985 824169 [INFO]: [requestID:3854352599] Before release, available permits is 3
2023-06-12 08:24:03.979044 824169 [INFO]: [requestID:3854352599] After release, available permits is 4
2023-06-12 08:24:03.979098 824169 [INFO]: [requestID:3854352599] DeleteLockFile start to get lock
2023-06-12 08:24:03.979152 824169 [INFO]: [requestID:3854352599] DeleteLockFile finish to get lock
2023-06-12 08:24:03.979281 824169 [INFO]: [requestID:3854352599] It took 295.885µs to release disConnect lock for 6a8ffba1005d5c410241c40a0000000a.
2023-06-12 08:24:03.979349 824169 [ERROR]: [requestID:3854352599] disconnect volume failed while unstage volume, wwn: 6a8ffba1005d5c410241c40a0000000a, error: check device: dm-0 is a partition device failed. error: Get DMDevice by alias:dm-0 failed. error: Can not get DMDevice by alias: dm-0
2023-06-12 08:24:03.979424 824169 [ERROR]: [requestID:3854352599] UnStage volume pvc-06244827-4d95-4c14-a5b9-7dba47702563 error: check device: dm-0 is a partition device failed. error: Get DMDevice by alias:dm-0 failed. error: Can not get DMDevice by alias: dm-0

Thanks in advance for your replies.

Regards, Christophe

tangqichuan commented 1 year ago

Can I see your configuration file? The path is as follows: [root@node ~]# vi /etc/multipath.conf

ccaillet1974 commented 1 year ago

No configuration defined; I'm using the default multipathd configuration. Give me the multipath command to run if you want more info from a worker node.

My worker nodes are installed as follows:

I also attach the multipath configuration obtained with the command: multipathd list config

Regards

multipathd-config.txt

tangqichuan commented 1 year ago

For details, see section 3.6, "Checking the Host Multipathing Configuration", in the user guide. Modify the configuration, restart the UltraPath software, and create a pod again.

ccaillet1974 commented 1 year ago

Firstly, I DON'T USE the UltraPath software; I use the Debian package of multipathd: multipath-tools 0.8.5-2+deb11u1 amd64

Multipathing is correctly configured on the 6810 (load-balanced mode), and the worker nodes use the native multipathd software, according to the output of the command multipath -ll (results below):

[CORE-LYO0][totof@lyo0-k8s-ppw01:~]$ sudo multipath -ll
[sudo] password for totof:
Jun 12 11:52:59 | sdc: prio = const (setting: emergency fallback - alua failed)
Jun 12 11:52:59 | sdd: prio = const (setting: emergency fallback - alua failed)
Jun 12 11:52:59 | sde: prio = const (setting: emergency fallback - alua failed)
Jun 12 11:52:59 | sdf: prio = const (setting: emergency fallback - alua failed)
36a8ffba1005d5c410241c40a0000000a dm-0 HUAWEI,XSG1
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:1  sdc 8:32  active ready running
  |- 10:0:0:1 sdd 8:48  active ready running
  |- 9:0:0:1  sde 8:64  active ready running
  `- 8:0:0:1  sdf 8:80  active ready running
Jun 12 11:52:59 | sdg: prio = const (setting: emergency fallback - alua failed)
Jun 12 11:52:59 | sdi: prio = const (setting: emergency fallback - alua failed)
Jun 12 11:52:59 | sdj: prio = const (setting: emergency fallback - alua failed)
Jun 12 11:52:59 | sdh: prio = const (setting: emergency fallback - alua failed)
36a8ffba1005d5c41024401da0000000e dm-1 HUAWEI,XSG1
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 8:0:0:2  sdg 8:96  active ready running
  |- 10:0:0:2 sdi 8:128 active ready running
  |- 9:0:0:2  sdj 8:144 active ready running
  `- 7:0:0:2  sdh 8:112 active ready running

tangqichuan commented 1 year ago

If you use the native multipathing software provided by the OS, check whether the /etc/multipath.conf file contains the following configuration item:

defaults {
    user_friendly_names yes
    find_multipaths no
}

If the configuration item does not exist, add it to the beginning of the /etc/multipath.conf file.

You can try this operation; the root cause is that the dm name does not comply with the verification in the code.
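Roughly, applying that on a Debian worker node could look like the sketch below (an assumption-laden sketch: it assumes multipathd runs under systemd and that there is no existing /etc/multipath.conf to preserve; adapt as needed).

# Create a minimal /etc/multipath.conf with the suggested defaults; if the
# file already exists, add the defaults block at the top instead of
# overwriting it.
cat <<'EOF' | sudo tee /etc/multipath.conf
defaults {
    user_friendly_names yes
    find_multipaths no
}
EOF
sudo systemctl restart multipathd
# Verify that the running daemon picked up the settings.
sudo multipathd show config | grep -E 'user_friendly_names|find_multipaths'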

ccaillet1974 commented 1 year ago

So I copied the multipathd-config.txt file attached earlier to /etc/multipath/multipath.conf, added the following directive to it:

user_friendly_names yes

changed find_multipaths from "strict" to "no", and restarted the multipathd daemon.

And now the volumes have been destroyed on my worker nodes.

After that I deleted the PV entry on k8s with kubectl delete pv pvc-0e708b36-3af1-4981-8d77-a9b46aafee41

Now I need to:
1. Propagate the configuration file to all worker nodes (a rough sketch follows below)
2. Check the LUNs so that all remaining volumes are destroyed correctly
3. Clean up all remaining volumes on k8s that need to be destroyed
4. Test that the problem does not occur any more
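For step 1, something along these lines (the node names are hypothetical and plain ssh/scp stands in for whatever configuration management is actually in place):

# Push the same multipath configuration to every worker node and restart
# multipathd there. Hypothetical node names; adapt to the real inventory.
for node in worker01 worker02 worker03; do
  scp /etc/multipath/multipath.conf "${node}:/tmp/multipath.conf"
  ssh "${node}" 'sudo mv /tmp/multipath.conf /etc/multipath/multipath.conf && sudo systemctl restart multipathd'
done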

I'll keep you informed.

Regards

tangqichuan commented 1 year ago

Okay, let me know if the problem is solved.

ccaillet1974 commented 1 year ago

It seems to be working now for deletion... all the PVs listed in my first post have disappeared from Kubernetes; now I have only the following PV list on my cluster:

[CORE-LYO0][totof@lyo0-k8s-admin00:~]$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                            STORAGECLASS       REASON   AGE
pvc-161512fb-f566-4844-bba0-1f8a0f45f9e5   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-cold-0         lyo0-oc9k-nfs               142d
pvc-1dc3811c-120e-4d38-a720-b5d5022c3cbb   4Gi        RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-transforms-1        lyo0-oc9k-nfs               95d
pvc-20cce747-b4e9-4332-8e1c-883d5dbbe518   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-ingest-data-hot-2   oc5k-fs                     130d
pvc-27815c93-6ac8-4959-9834-944fd1b35bb3   100Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-mon-es-data-1              oc6810-hdd-iscsi            3d
pvc-2a08bb21-d443-4b16-8858-2be181ed7963   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-ingest-data-hot-5   oc5k-fs                     130d
pvc-2f4426ae-ccd8-461b-a433-0b5967fd6cec   300Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-warm-0         oc5k-fs                     142d
pvc-3867ad01-c9e4-46ef-9a58-d128fb33f998   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-ingest-data-hot-1   oc5k-fs                     130d
pvc-45926fe3-79de-48c4-ab42-7d3622b4b2b1   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-ingest-data-hot-0   oc5k-fs                     130d
pvc-49043782-1d7a-4de1-b75f-ffcdb1351607   8Gi        RWO            Delete           Bound    kube-dbs/data-lyo0-sh-psql-nfs-postgresql-primary-0              lyo0-oc5k-nfs               446d
pvc-4b307039-6fe7-47ae-ac10-2b0ac722e125   300Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-warm-1         oc5k-fs                     142d
pvc-5820d5a6-3987-4e28-b5fa-838f9ae78555   300Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-warm-3         oc5k-fs                     91d
pvc-5a2644ab-285f-4608-b2c8-6307b88922d2   8Gi        RWO            Delete           Bound    kube-dbs/data-lyo0-sh-psql-nfs-postgresql-read-0                 lyo0-oc5k-nfs               446d
pvc-5d3e2536-df0b-4570-8588-0ff47f970f42   4Gi        RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-master-1            lyo0-oc9k-nfs               142d
pvc-649c2d62-83f8-488c-b341-4278143229e6   8Gi        RWO            Delete           Bound    kube-dbs/redis-data-lyo0-sh-redis-nfs-node-1                     lyo0-oc5k-nfs               446d
pvc-6c3d82b0-b994-4212-8833-cf4f5c88ee8b   100Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-mon-es-data-0              oc6810-hdd-iscsi            3d
pvc-6d9b0f9c-afce-4377-a7e7-c3882e6377aa   10Gi       RWO            Delete           Bound    kube-dbs/data-lyo0-redis-shared-redis-ha-server-0                lyo0-oc5k-nfs               445d
pvc-70952380-f9fe-42e2-b272-65ed270e8005   100Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-mon-es-data-2              oc6810-hdd-iscsi            3d
pvc-7741ede6-bf31-420b-83b3-d37f3fe0f3e5   4Gi        RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-master-2            lyo0-oc9k-nfs               142d
pvc-84aa2be8-8ee1-4982-86c2-322eeb6398d4   10Gi       RWO            Delete           Bound    kube-monitoring/lyo0-prom-int-grafana                            lyo0-oc5k-nfs               102d
pvc-87b87724-1fe4-4585-8c69-d8bb4d763847   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-ingest-data-hot-4   oc5k-fs                     130d
pvc-87fa68e4-184a-48a2-b433-0699e0ad12b8   10Gi       RWO            Delete           Bound    kube-dbs/data-lyo0-redis-shared-redis-ha-server-2                lyo0-oc5k-nfs               445d
pvc-8f098f5f-3230-4c36-b74e-3e52ae0a8716   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-cold-3         lyo0-oc9k-nfs               142d
pvc-91a98c16-33dc-48e4-b331-48a00c7a4c1d   8Gi        RWO            Delete           Bound    kube-dbs/redis-data-lyo0-sh-redis-nfs-node-2                     lyo0-oc5k-nfs               446d
pvc-b8fac19d-be9d-4eca-873e-25b5eb2a39f3   8Gi        RWO            Delete           Bound    kube-dbs/redis-data-lyo0-sh-redis-nfs-node-0                     lyo0-oc5k-nfs               446d
pvc-d6f65be7-0697-4aef-8b42-89a6256e1ee7   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-ingest-data-hot-3   oc5k-fs                     130d
pvc-d9814d86-c963-4fd9-87bc-0ef8efe0caf0   10Gi       RWO            Delete           Bound    kube-dbs/data-lyo0-redis-shared-redis-ha-server-1                lyo0-oc5k-nfs               445d
pvc-d9f1499d-566a-4c29-8e25-ac42bb58d38e   4Gi        RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-master-0            lyo0-oc9k-nfs               142d
pvc-dd42e771-a5f0-4708-8a71-d94297908d07   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-cold-2         lyo0-oc9k-nfs               142d
pvc-df52d229-069e-44c3-89a1-ebd9e01c6024   200Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-cold-1         lyo0-oc9k-nfs               142d
pvc-dfd863d7-8c83-4c25-8b38-83344053cc3a   4Gi        RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-transforms-0        lyo0-oc9k-nfs               95d
pvc-e42c5c75-9f93-4ae1-84ee-be5794453f71   300Gi      RWO            Delete           Bound    cdn-bigdata/elasticsearch-data-sophie-int-es-data-warm-2         oc5k-fs                     142d
pvc-f364a229-5a8f-4d3c-af09-1f287ac800c7   1Gi        RWX            Delete           Bound    cdn-tools/lyo0-netbox-media                                      lyo0-oc5k-nfs               445d

Checking that deletion keeps working well... I also need to check whether the problem described in issue #133 is solved with this configuration. If it is, maybe you should document the multipath configuration parameters required by Huawei CSI when the OS multipathd is used?

I'll keep you posted.

EDIT 1: Is find_multipaths "no" mandatory, or will the default Debian value ("strict") work?

Regards

tangqichuan commented 1 year ago

find_multipaths "no" is mandatory. For details, see section 3.6 in the CSI user document: https://github.com/Huawei/eSDK_K8S_Plugin/blob/V4.0/docs/eSDK%20Huawei%20Storage%20Kubernetes%20CSI%20Plugins%20V4.0.0%20User%20Guide%2001.pdf

ccaillet1974 commented 1 year ago

Thanks for your reply

Regards.

tangqichuan commented 1 year ago

You're welcome.