IBM / ubiquity

Ubiquity
Apache License 2.0

Fix for leftover faulty multipath devices by preventing concurrent rescans during unmount and mount operations #190

Closed: shay-berman closed this pull request 6 years ago

shay-berman commented 6 years ago

There is a potential race between starting a POD and deleting a POD that may leave a faulty device on the operating system. The leftover device doesn't impact the host, but we still want to prevent it.

When a POD is deleted, the flex driver removes the multipath device and then unmaps the volume from the host. There is a race window between these two operations: if a rescan arrives in that window (for example, because a new POD is being created at the same time and triggers a multipath reload), it can bring back the deleted multipath device. Because the underlying volume is about to be unmapped, that device then stays on the OS as a faulty multipath device.
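The interleaving above can be replayed deterministically with a toy model of the host. This is a minimal sketch, not Ubiquity code: the `host` type and its `rescan`, `removeMultipath`, `unmap` and `faulty` methods are hypothetical, and a rescan is assumed to (re)create a multipath device for every volume that is still mapped.

```go
package main

import (
	"fmt"
	"sort"
)

// host is a hypothetical toy model of a node: which volumes the storage
// array maps to it, and which multipath devices exist on the OS.
type host struct {
	mapped    map[string]bool // volume -> mapped to this host
	multipath map[string]bool // volume -> multipath device present
}

func newHost() *host {
	return &host{mapped: map[string]bool{}, multipath: map[string]bool{}}
}

// rescan models a rescan / multipath reload (assumption): it recreates a
// multipath device for every volume that is currently mapped.
func (h *host) rescan() {
	for vol := range h.mapped {
		h.multipath[vol] = true
	}
}

func (h *host) removeMultipath(vol string) { delete(h.multipath, vol) }
func (h *host) unmap(vol string)           { delete(h.mapped, vol) }

// faulty lists multipath devices whose volume is no longer mapped.
func (h *host) faulty() []string {
	var out []string
	for vol := range h.multipath {
		if !h.mapped[vol] {
			out = append(out, vol)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Delete-POD flow with no interference.
	h := newHost()
	h.mapped["vol1"] = true
	h.rescan()
	h.removeMultipath("vol1")
	h.unmap("vol1")
	fmt.Println("no race:", h.faulty()) // no race: []

	// A rescan from a concurrent create-POD flow lands inside the window.
	h = newHost()
	h.mapped["vol1"] = true
	h.rescan()
	h.removeMultipath("vol1")
	h.rescan() // resurrects the device of a volume about to be unmapped
	h.unmap("vol1")
	fmt.Println("with race:", h.faulty()) // with race: [vol1]
}
```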

How to fix: Prevent concurrent rescans during POD deletion, in the time frame where the mpath device has already been removed but the volume is still mapped to the host (from blockDeviceMounterUtils.UnmountDeviceFlow until just before ActionAfterDetach starts).

How to test: Run create POD and delete POD concurrently, many times, as a longevity test (of course, in XAVI automatic testing).



coveralls commented 6 years ago


Coverage decreased (-0.06%) to 54.742% when pulling 93dee08564ab4a69a38660623d9ad91825020568 on fix/faulty_multipath_devices_option2_rescanlock_basic into 62a7ae4c7c116f712727aef0f74c57e605ba5673 on dev.

shay-berman commented 6 years ago

I am closing this PR: it's low priority, and some aspects of it were already resolved in other PRs. I will keep the branch for now.