
Driver incorrectly trying to format a LUN which already has XFS file system after move to different worker node #1

Closed. dannert closed this issue 5 years ago.

dannert commented 5 years ago

I did a bit more testing, and I'm running into issues when k8s tries to re-deploy the mongodb pod with the FlexVolume LUN to another worker node after the first node failed (shut down).

My assumption is that k8s keeps track of this helm chart deployment and tries to ensure that one POD is running at all times on one of the worker nodes. As this is a database and I'm using a persistent volume to host its data, I'd expect the LUN and the POD to move to another worker and the container to "just start up" with the LUN mounted at "/data/db" inside the container, without modifying any data on that LUN in the process. The expected end result is mongodb up and running with the same data it had before the re-deploy to the new worker node.

This is what I see in reality, and where it fails in your driver on the destination worker node. worker2 was the original deploy location; worker1 is the new deploy location.

Environment:

- master: 129.40.93.126
- worker1: 129.40.93.107
- worker2: 129.40.93.124
- credentials: available on request by email to dannert@us.ibm.com

POD: mongodb-t1-ibm-mongodb-dev*; currently failing deploy: mongodb-t1-ibm-mongodb-dev-76c998b87f-7qqvb

Storage class used for the deploy, per `kubectl describe storageclass ssd-retain`:

```
Name:                  ssd-retain
IsDefaultClass:        No
Annotations:
Provisioner:           ibm/powervc-k8s-volume-provisioner
Parameters:            fsType=xfs,type=icp-v7k3-ssd
AllowVolumeExpansion:
MountOptions:
ReclaimPolicy:         Retain
VolumeBindingMode:     Immediate
Events:
```
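For reference, a StorageClass manifest matching that describe output would look roughly like this. This is a reconstruction from the fields above, not a manifest copied from the cluster:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-retain
provisioner: ibm/powervc-k8s-volume-provisioner
parameters:
  fsType: xfs          # file system the driver creates on a blank LUN
  type: icp-v7k3-ssd   # backend volume type passed to the provisioner
reclaimPolicy: Retain
volumeBindingMode: Immediate
```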

Scenario:

1) The LUN is unmapped from worker2 (ok), BUT the device is NOT correctly removed, and both hyperkube and the Linux path checker complain. On worker2, which previously owned the POD, /var/log/messages shows:

```
Jan 30 16:37:05 aop93cl124 hyperkube: E0130 16:37:05.071283 4536 kubelet_volumes.go:140] Orphaned pod "a0a236ad-23df-11e9-b738-fa99c511ef20" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Jan 30 16:37:05 aop93cl124 multipathd: mpathh: sdc - tur checker reports path is down
Jan 30 16:37:06 aop93cl124 multipathd: mpathh: sdx - tur checker reports path is down
Jan 30 16:37:07 aop93cl124 hyperkube: E0130 16:37:07.062533 4536 kubelet_volumes.go:140] Orphaned pod "a0a236ad-23df-11e9-b738-fa99c511ef20" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Jan 30 16:37:07 aop93cl124 multipathd: mpathh: sdo - tur checker reports path is down
Jan 30 16:37:09 aop93cl124 hyperkube: E0130 16:37:09.068412 4536 kubelet_volumes.go:140] Orphaned pod "a0a236ad-23df-11e9-b738-fa99c511ef20" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Jan 30 16:37:09 aop93cl124 multipathd: mpathh: sdf - tur checker reports path is down
Jan 30 16:37:09 aop93cl124 multipathd: mpathh: sdi - tur checker reports path is down
Jan 30 16:37:09 aop93cl124 multipathd: mpathh: sdl - tur checker reports path is down
Jan 30 16:37:09 aop93cl124 multipathd: mpathh: sdr - tur checker reports path is down
Jan 30 16:37:09 aop93cl124 multipathd: mpathh: sdu - tur checker reports path is down
Jan 30 16:37:10 aop93cl124 multipathd: mpathh: sdc - tur checker reports path is down
```

2) The LUN is mapped to worker1 (ok); verified in the PVC and on worker1.

3) The LUN shows as "attached" on worker1 (ok).

4) ...but then your driver fails, as it seems to be trying to create a new XFS file system on the already XFS-formatted LUN. I had expected that your driver would not try to format a LUN that already contains a valid file system, but would just mount it for use by the container?

Here are the logs from worker1, which has the LUN attached under the multipath device "mpathi":

```
Jan 30 16:42:02 aop93cl107 hyperkube: W0130 16:42:02.510705 4280 plugin-defaults.go:32] flexVolume driver ibm/powervc-k8s-volume-flex: using default GetVolumeName for volume pvc-a09ed93e-23df-11e9-b738-fa99c511ef20
Jan 30 16:42:02 aop93cl107 hyperkube: W0130 16:42:02.510765 4280 plugin.go:134] flexVolume driver ibm/powervc-k8s-volume-flex: GetVolumeName is not supported yet. Defaulting to PV or volume name: pvc-a09ed93e-23df-11e9-b738-fa99c511ef20
Jan 30 16:42:05 aop93cl107 hyperkube: I0130 16:42:05.755221 4280 operation_generator.go:498] MountVolume.WaitForAttach succeeded for volume "pvc-a09ed93e-23df-11e9-b738-fa99c511ef20" (UniqueName: "flexvolume-ibm/powervc-k8s-volume-flex/pvc-a09ed93e-23df-11e9-b738-fa99c511ef20") pod "mongodb-t1-ibm-mongodb-dev-76c998b87f-7qqvb" (UID: "e4856847-24cf-11e9-b738-fa99c511ef20") DevicePath "/dev/dm-8"
Jan 30 16:42:05 aop93cl107 hyperkube: W0130 16:42:05.759833 4280 plugin-defaults.go:32] flexVolume driver ibm/powervc-k8s-volume-flex: using default GetVolumeName for volume pvc-a09ed93e-23df-11e9-b738-fa99c511ef20
Jan 30 16:42:05 aop93cl107 hyperkube: W0130 16:42:05.759875 4280 plugin.go:134] flexVolume driver ibm/powervc-k8s-volume-flex: GetVolumeName is not supported yet. Defaulting to PV or volume name: pvc-a09ed93e-23df-11e9-b738-fa99c511ef20
Jan 30 16:42:05 aop93cl107 hyperkube: E0130 16:42:05.783006 4280 driver-call.go:258] mountdevice command failed, status: Failed, reason: Could not create file system on attached volume directory /dev/dm-8. Error is exit status 1
Jan 30 16:42:05 aop93cl107 hyperkube: E0130 16:42:05.783302 4280 nestedpendingoperations.go:267] Operation for "\"flexvolume-ibm/powervc-k8s-volume-flex/pvc-a09ed93e-23df-11e9-b738-fa99c511ef20\"" failed. No retries permitted until 2019-01-30 16:44:07.783222622 -0500 EST m=+3534.661304581 (durationBeforeRetry 2m2s). Error: "MountVolume.MountDevice failed for volume \"pvc-a09ed93e-23df-11e9-b738-fa99c511ef20\" (UniqueName: \"flexvolume-ibm/powervc-k8s-volume-flex/pvc-a09ed93e-23df-11e9-b738-fa99c511ef20\") pod \"mongodb-t1-ibm-mongodb-dev-76c998b87f-7qqvb\" (UID: \"e4856847-24cf-11e9-b738-fa99c511ef20\") : mountdevice command failed, status: Failed, reason: Could not create file system on attached volume directory /dev/dm-8. Error is exit status 1"
```
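The failing step is the driver's mountdevice call running a file system creation against /dev/dm-8 even though the LUN already carries XFS. The actual fix landed in the vendor's internal tree (see the comments below), so the exact change isn't visible here; what follows is only a minimal sketch of the expected behavior, assuming a Go helper in the mountdevice path and the stock blkid utility. The function names and the hard-coded device path are illustrative, not the driver's real code:

```go
// Sketch: probe a block device for an existing file system before formatting,
// so a LUN that already carries XFS data is left untouched on re-attach.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hasFileSystem reports whether devicePath (e.g. /dev/dm-8) already carries a
// file system. blkid prints the TYPE and exits 0 when one is present, and
// exits 2 when it finds nothing on the device.
func hasFileSystem(devicePath string) (bool, error) {
	out, err := exec.Command("blkid", "-o", "value", "-s", "TYPE", devicePath).Output()
	if err != nil {
		if ee, ok := err.(*exec.ExitError); ok && ee.ExitCode() == 2 {
			return false, nil // blkid found nothing: device is unformatted
		}
		return false, err
	}
	return strings.TrimSpace(string(out)) != "", nil
}

// ensureFileSystem formats the device only when no file system is detected,
// making the mountdevice path idempotent across worker nodes.
func ensureFileSystem(devicePath, fsType string) error {
	formatted, err := hasFileSystem(devicePath)
	if err != nil {
		return fmt.Errorf("probing %s: %v", devicePath, err)
	}
	if formatted {
		return nil // reuse the existing file system; never reformat
	}
	return exec.Command("mkfs."+fsType, devicePath).Run()
}

func main() {
	if err := ensureFileSystem("/dev/dm-8", "xfs"); err != nil {
		fmt.Println("mountdevice failed:", err)
	}
}
```

This mirrors what kubelet's own SafeFormatAndMount helper does for in-tree volume plugins: probe for an existing file system and only run mkfs when the device is blank, so data written on worker2 survives a re-attach on worker1.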

gautpras commented 5 years ago

The issue has been fixed in our internal GitHub. This will be formally closed once the code is synchronised from the internal GitHub, which will happen once the new version of the driver is released.

gautpras commented 5 years ago

Fixed with latest merge