There is a potential race between starting a pod and deleting a pod that can leave a faulty multipath device on the operating system. The leftover device does not impact the host, but we still want to prevent it.
When a pod is deleted, the flex driver removes the multipath device and then unmaps the volume from the host. A race is possible between these two operations: if a rescan runs in that window (for example, a new pod is created at the same time and triggers a multipath reload), it can bring back the multipath device that was just removed. Because the volume behind that device is about to be unmapped, the device then stays on the OS as a faulty device.
How to fix:
Prevent concurrent rescans while a pod is being deleted, during the time frame in which the mpath device has already been removed but the volume is still mapped to the host. (blockDeviceMounterUtils.UnmountDeviceFlow and just before ActionAfterDetach starts)
How to test:
Run create pod and delete pod concurrently, and repeat this many times in a longevity run. (Of course, also in the XAVI automated testing.)
Coverage decreased (-0.06%) to 54.742% when pulling 93dee08564ab4a69a38660623d9ad91825020568 on fix/faulty_multipath_devices_option2_rescanlock_basic into 62a7ae4c7c116f712727aef0f74c57e605ba5673 on dev.