IBM / ubiquity

Ubiquity
Apache License 2.0
90 stars 26 forks source link

Improve performance for mpath Discovery method #171

Closed shay-berman closed 6 years ago

shay-berman commented 6 years ago

Improve performance by skip discover devices by sg_inq during the attach flow (create POD).

Description: In case you have many devices on the worker nodes and you create a new POD with PVC, before this fix the discover would go on all the PVCs devices on the node and trigger sg_inq on each one of them in order to search for the device for the new PVC (wwn). Well in most use cases there is no need to do this overhead during attach since the device probably doesn't exist on the host or if it is then it should appear in multipath -ll already. So to improve performance we decided to skip this deepDiscovery(by sg_inq on all the devices). And it will already be covered anyway after the rescanAll.

Improve discovery performance by applying the following things:

  1. Do NOT do sg_inq deepDiscovery during the RescanAll in the attach flow. Its waist of time to do deepDiscovery in this case, since its already done after the RescanAll when discovering the new device.

  2. Do sg_inq deepDiscovery only in the following cases: 2.1. During the attach after the RescanAll 2.2. When Discover for cleanup(unmount flow).

  3. Remove double sg_inq validation if discover during deepDiscovery

  4. Remove Stat(mpath) validation during Discover, since its already validated during GetWwnByScsiInq() or DiscoverBySgInq()

  5. Fix interfaces, UT and fakes according (Still need to add more UT)

In addition this PR improve the DiscoverBySgInq by searching only IBM devices(devices with "IBM" vendor tag presented in multipath -ll). A side affect of this filtering is that it will not work on faulty devices that usually doesn't present any Vendor tag on.

How to test

  1. create POD1 with PVC1 and check in the flex logs that the volume didn't had to discover using DiscoverBySgInq, means you should see the message "mpath device was NOT found for WWN [%s] in multipath -ll"(where %s is the wwn of the PVC). but you should not see any try to run sg_inq on this WWN (debug for GetWwnByScsiInq with this mpath device)

  2. Try to get in multipath -ll with bad devices, then run create POD2 with PVC2, this this case there is high chance that multipath -ll will not present the right WWN for the new PVC2 device. if so then you should see in the log that flex tried to identify this PVC2 device by using sg_inq. You can see in the log message like "Warning: device [%s] found for WWN [%s] after running sg_inq"


This change is Reviewable

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.04%) to 54.782% when pulling 399ae7333de104c9269a6d33c93e67c425dd5d19 on fix/no_deep_discover_during_RescanAll_in_attach_flow into c6dd8f80ff9402b3e8c82e65554aeedd0d6cea4d on dev.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.02%) to 54.801% when pulling 12f961645fa2030a35881a6ddae6cf203dd8d790 on fix/no_deep_discover_during_RescanAll_in_attach_flow into c6dd8f80ff9402b3e8c82e65554aeedd0d6cea4d on dev.

ranhrl commented 6 years ago

:lgtm: besides the license headers?


Review status: 0 of 8 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


fakes/fake_block_device_mounter_utils.go, line 1 at r2 (raw file):

// Code generated by counterfeiter. DO NOT EDIT.

Did you remove the license header? Why does it look that way?


fakes/fake_block_device_utils.go, line 1 at r2 (raw file):

// Code generated by counterfeiter. DO NOT EDIT.

same here, what happened to the license header?


Comments from Reviewable

shay-berman commented 6 years ago

Review status: 0 of 8 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


fakes/fake_block_device_mounter_utils.go, line 1 at r2 (raw file):

Previously, ranhrl (Ran Harel) wrote…
Did you remove the license header? Why does it look that way?

all files with "fake" names are generated automatically by go Counterfeiters. its a moking system. its not our code its self generated. no sense to put there license because it will always be overwritten by couterfeiter.

agree?


fakes/fake_block_device_utils.go, line 1 at r2 (raw file):

Previously, ranhrl (Ran Harel) wrote…
same here, what happened to the license header?

same as above


Comments from Reviewable

shay-berman commented 6 years ago

Testing update about this PR:

  1. Automation testing passed OK
  2. Sanity tests passed OK
  3. Acceptance tests with longevity on RHEL and Ubuntu passed OK

So, we decided to merge this PR into dev branch.