Azure / LinuxPatchExtension

Microsoft Azure VM Guest Linux Patch Extension
Apache License 2.0
10 stars 8 forks source link

Calculate maximum batch size in batch patching as per time remaining in maintenance window #250

Closed GAURAVRAMRAKHYANI closed 5 months ago

GAURAVRAMRAKHYANI commented 7 months ago

There are four changes at high level: (a) Calculate maximum batch size in batch patching as per time remaining in maintenance window. The upper limit of max batch size is 300.

(b) If there are some remaining packages after batch patching due to failures in batch patching or maintenance window cutoff exceeded, then do batch patching again with lower batch size (1/10th of original batch size) instead of directly falling back to sequential patching. In case there is again remaining packages after batch patching with lower batch size then do sequential patching. The code changes to calculate batch size in phase1 and phase2 will be done in later iterations once the code structure is approved.

(c) Use different value for average time to install packages for different package managers as per the telemetry data. Using current average time to install packages, it should improve after this change, and we can update the values later as per the improvement.

(d) In the maintenance window cutoff calculation, assumption of the number of packages in the batch which could take max time to install was 3, changing it to 1. Rest of the packages should take average time to install.

(e) In sequential patching, for the packages which failed to install in first attempt are retried 3 times. So, at max, there are total 4 attempts in sequential patching for a package. Now, with multi-phase batch patching implementation, we have two attempts in batch patching and 4 attempts in sequential patching. So, at max, there are total 6 attempts for a package. These are too many attempts. Each attempt to install package requires running two commands, one to install package and the other to check if package is installed successfully. These commands take up to total 1 second per package. These commands generate some logs which will add in telemetry throttling and slow down. To reduce attempts, removing 3 retries in sequential patching. So, there are total 2 attempts in batch patching and 1 in sequential patching.

Only adding function signatures and their caller from the code in the first iteration. Will implement the functions in later iterations.

Testing: Manual testing is not done as function implementation is pending. Test changes will be done once the product changes are approved.

Update: Function implementation for get_max_batch_size is done. Completed the manual testings. Fixed some issues in the code changes which were found during unit testing. Unit tests are not done. Will do it once product code changes are reviewed.

Manual testing done: (a) Done testing in all the 3 package managers: apt, yum and zypper.

(b) Tested the scenario in which multi-phase batch patching is done in only 1 phase and in 2 phases due to different conditions like remaining packages to install greater than 0 after phase 1 and maintenance window batch cutoff reached with phase 1 so need to do phase2. Also, tested that the packages remaining after phase 2 are attempted in sequential patching.

(c) Done testing for the case when number of packages to install is 0 in the beginning of the patch installation job.

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 30.18868% with 37 lines in your changes are missing coverage. Please review.

Project coverage is 83.54%. Comparing base (8734572) to head (01ebd9d).

Files Patch % Lines
src/core/src/core_logic/PatchInstaller.py 11.42% 31 Missing :warning:
src/core/src/core_logic/MaintenanceWindow.py 60.00% 2 Missing :warning:
...ore/src/package_managers/AptitudePackageManager.py 66.66% 1 Missing :warning:
src/core/src/package_managers/PackageManager.py 66.66% 1 Missing :warning:
src/core/src/package_managers/YumPackageManager.py 66.66% 1 Missing :warning:
.../core/src/package_managers/ZypperPackageManager.py 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #250 +/- ## ========================================== - Coverage 90.95% 83.54% -7.41% ========================================== Files 91 91 Lines 15723 15749 +26 ========================================== - Hits 14301 13158 -1143 - Misses 1422 2591 +1169 ``` | [Flag](https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/250/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure) | Coverage Δ | | |---|---|---| | [python27](https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/250/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure) | `?` | | | [python39](https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/250/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure) | `83.54% <30.18%> (-7.41%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter commented 7 months ago

Codecov Report

Attention: Patch coverage is 98.16514% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 90.97%. Comparing base (b8d7a16) to head (4de68b4).

Files Patch % Lines
src/core/src/core_logic/PatchInstaller.py 98.24% 1 Missing :warning:
src/core/src/package_managers/PackageManager.py 66.66% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #250 +/- ## ========================================== - Coverage 90.98% 90.97% -0.01% ========================================== Files 91 91 Lines 15737 15799 +62 ========================================== + Hits 14318 14373 +55 - Misses 1419 1426 +7 ``` | [Flag](https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/250/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure) | Coverage Δ | | |---|---|---| | [python27](https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/250/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure) | `90.97% <98.16%> (-0.01%)` | :arrow_down: | | [python39](https://app.codecov.io/gh/Azure/LinuxPatchExtension/pull/250/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure) | `90.97% <98.16%> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Azure#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

GAURAVRAMRAKHYANI commented 6 months ago

get_max_batch_size() looks good.

An observation on the changes in install_updates() in PatchInstaller.py, the function is getting longer, please extract them out to another function in your revised code changes.

Also since get_max_batch_size() is an independent function, we can have UTs to validate it. For any future revisions, please try to add UTs to test out the smaller units. Overall UT to validate E2E scenario can be added in the iteration that has all code changes

(a) I have addressed the comment that as install_updates is getting longer, extracted some part of it to another function. Did some manual testings and no issues found. Continuing doing more testing around this change.

(b) I have not yet implemented UT for get_max_batch_size(), will implement it tomorrow IST. Meanwhile, can you review the product changes?

kjohn-msft commented 6 months ago

Code cov is having a partial service outage. https://status.codecov.com/
Retry after the status shows green. @feng-j678 for review on code state as now.