Could os_patching report on packages requiring a specific version that are not pinned/held/versionlocked?

ldaneliukas commented 5 years ago

Affected Puppet, Ruby, OS and module versions/distributions

Puppet: 5.4.0
Ruby: 2.4.3p205
Distribution: CentOS
Module version: 0.11.1

How to reproduce (e.g Puppet code you use)

Add the following code to any Puppet profile/module (use any package you wish):

  package { 'rabbitmq-exporter':
    ensure  => '0.25.2-1'
  }

Run Puppet:

Notice: /Package[rabbitmq-exporter]/ensure: created

The /usr/local/bin/os_patching_fact_generation.sh will update the os_patching fact:

  warnings => {},
  package_updates => [
    "rabbitmq-exporter.x86_64"
  ],
  package_update_count => 1,

Running the os_patching task, will update the version of the package to the latest available version, regardless of the fact, that the version is explicitly set in Puppet itself.

  {
    "return": "Success",
    "reboot": "smart",
    "security": false,
    "message": "Patching complete",
    "packages_updated": {
      "rabbitmq-exporter-0.25.2-1.x86_64": "Updated"
    }

Due to this, the next usual puppet agent run on the machine, will downgrade the package back to the version set in the catalog:

Notice: /Package[rabbitmq-exporter]/ensure: ensure changed '0.29.0-1' to '0.25.2-1'

Which in turn will once again update the os_patching fact to package_update_count => 1 and will when ran, will update the same package - a never ending loop of upgrades/downgrades starts.

What behaviour did you expect instead

This being a Puppet task, i expected it will honor versions set by Puppet.

Any additional information you'd like to impart

I'm very happy that such a module and initiative exists, however, i'm puzzled as to how would one handle any packages installed via puppet and not set to installed or present which is quite dangerous to have in production. Additionaly, what about various modules that might come with certain package installes pinned to specific versions? I assume that these would also result in never ending loops.

albatrossflavour commented 5 years ago

Hi @LDaneliukas and thanks for opening the issue.

You've stumbled into one of the concerns I have with the package resource. You're lucky that the package in question is able to (at least in some way) cope with being patched and then enforced by puppet.

The right solution, irrespective of if you're using the os_patching module, is to ensure that any package that you have a specific version requirement for is also "pinned" or "version locked" through the OS tools.

I mention in the README that any pinned packages would show up in the pinned_packages => [], element of the os_patching fact.

The pinned packages entry lists any packages which have been specifically excluded from being patched, from version lock on Red Hat or by pinning in Debian.

I don't think there is much else we can do from a module perspective to assist with this, but happy to hear any thoughts you might have.

Thanks again

albatrossflavour commented 5 years ago

Since you're using a Redhat derivative, you can use the version lock defined type in the yum module

ldaneliukas commented 5 years ago

Thanks for the quick reply @albatrossflavour

Pinning packages makes sense and is the first thing that I tried, however, it seems rather overcomplicated. It's perfectly fine, for packages that are installed via hieradata that supplies the information to an underlying profile that utilizes the package resource - you can add versionlock to that automatically.

However, when you have profiles that use the package resource directly - that means, every single profile, has to also use versionlock too. The hardest of them all - modules, as this would mean reading the module code and creating a common versionlock profile to pin all the versions? This gets out of hand really quickly as you can see.

Would it be possible for os_patching_fact_generation to check the catalog for the node that it's running on and match any package resources that have specific versions with the data provided by yum for what needs to be upgraded and disregard those?

albatrossflavour commented 5 years ago

it's an interesting idea. It might be possible to do something, let me have a think about it. I'm not convinced that the module is the right place to handle it, but we might be able to flag potential issues.

albatrossflavour commented 5 years ago

We could put something like this in as an entry in the fact:

cat catalog.json | jq -r '.resources[] | select(.type == "Package") | select(.parameters.ensure | test ("[0-9].")) | [.title ,.parameters.ensure]'
[
  "jq",
  "1.5-1.el7"
]

Maybe we could match any entries found against the version lock list on the OS and flag any that aren't there in the warnings[] element of the fact.

It'd work for Redhat and Debian. I'll need to talk to @nathangiuliani to see if we could do something similar for Windows.

I'm still not convinced we should have this in the module but if we can make it work with relatively little pain, it might be useful.

I'm also going to change the title of the issue as it's a little misleading

ldaneliukas commented 5 years ago

Thanks for looking into this. Yes, the snipped you posted is exactly what I was aiming at. I haven't tested the behavior on Windows i.e. how it handles things such as chocolatey versions. I'll do that today and get back to you with any findings that might be of interest.

nathangiuliani commented 5 years ago

It's a bit different with Windows - we are installing OS updates, not updating versions of packages. Generally your approvals are done on an upstream system (e.g. WSUS). We're not dealing with chocolatey here at all.

If you're using the Microsoft update catalogue on the internet, you will just get whichever updates you chose based on the options the Windows update UI provides (e.g. whether or not to install recommended updates along with critical, it changes a bit in the newer versions).

The closest equivalent I can think of is that you can 'hide' updates, which prevents them from showing as required or being installed. This is the case at least in older versions of Windows (Server 2008 - 2012 R2) - I haven't checked on newer ones. I think our code will install them regardless, as I believe this 'hide' function is part of the UI only, and the APIs will effectively ignore it. It may be worth looking at whether or not we can incorporate this list of hidden updates, but to be honest, I've never seen it used - everyone who wants flexibility just uses WSUS and approves only the updates as per their requirements.

albatrossflavour commented 5 years ago

@LDaneliukas I've got a branch ( feature/new_facter_entries ) which should do what you're after. Can you take a look and let me know what you think. It was a little harder to get it working than I expected as I couldn't use jq so had to write a ruby one liner to do the magic.

ldaneliukas commented 5 years ago

I've ran this on Windows and I fully agree with @nathangiuliani that this is not an issue on Windows, since everything is managed via WSUS, hence, I don't think that the module needs to be concerned with it on that front.

Thanks for the quick code, I'll replicate the initial test scenario form which this issue has originated and come back with the results.

ldaneliukas commented 5 years ago

@albatrossflavour , can you create a PR for the changes? Unable to comment on the commit, but can't run it due to an error on https://github.com/albatrossflavour/puppet_os_patching/blob/e314c2d9e2879e7f24ae16d307b84e1e68906c0c/lib/facter/os_patching.rb#L57 this should probably be kblist.

Update

I've used commit 0ce7f3d1e689ec89db39678da75a22e4c05835f7 which doesnt' have the KB changes to test the pinned package handling.

Unfortunately, there seem to be issues with it:

  warnings => {},
  package_updates => [
    "rabbitmq-exporter.x86_64"
  ],
  package_update_count => 1,
  security_package_updates => [],
  security_package_update_count => 0,
  blocked => false,
  blocked_reasons => [],
  blackouts => {},
  pinned_packages => [
    "puppet
",
    "MySQL
",
    "MySQL"
  ],

The update count remains the same with the package as last time and the pinned_packages format looks out of place.

This can be due to the usage of puppet_client_datadir as that fact is not present on our systems. Doesn't it come from puppetlabs/puppetlabs-puppet_agent module (we don't use it)?

We could alternatively use puppet_vardir fact and append /client_data to it.

albatrossflavour commented 5 years ago

Give it another try

ldaneliukas commented 5 years ago

Had the chance to test this further and it seems we're almost there :+1:

One issue, could we change $(facter fqdn) here: https://github.com/albatrossflavour/puppet_os_patching/blob/6d73308379bbc9c65e95ea5c5eabf2e38d7eda74/files/os_patching_fact_generation.sh#L56 to $(puppet config print certname --section agent) since the fqdn does not always match certname of the machine which is used as the name for the catalog json. This would make the script more versatile.

Running it yields the following results:

{
  warnings => {
    packages_version_locked_in_catalog_but_not_on_os => [
      "puppet-agent",
      "exporter_exporter",
      "node-exporter",
      "rabbitmq-exporter"
    ]
  },
  package_updates => [
    "rabbitmq-exporter.x86_64"
  ],
  package_update_count => 1,
  missing_update_kbs => [],
  security_package_updates => [],
  security_package_update_count => 0,
  blocked => false,
  blocked_reasons => [],
  blackouts => {},
  pinned_packages => [
    "puppet",
    "MySQL",
    "MySQL"
  ],
  last_run => {
    date => "2019-08-12T10:03:47+02:00",
    message => "Patching complete",
    return_code => "Success",
    post_reboot => "smart",
    security_only => "false",
    job_id => "43"
  },
  patch_window => "dev1-teo-323",
  reboot_override => "default",
  reboots => {
    reboot_required => false,
    apps_needing_restart => {},
    app_restart_required => false
  }
}

warnings contains the packages with specific versions from the catalog, however, rabbitmq-exporter.x86_64 is still in package_updates despite it also being included in the warnings, hence, it's still being updated.

Should we exclude all packages that are os pinned and catalog pinned from package_updates and its count, so that we exclude such nodes when filtering the ones that have pending updates and so that we don't do the actual updates on these packages?

albatrossflavour commented 5 years ago

Great news. I'll fix up the fqdn. I'm not in favour of changing the way the logic works to exclude pinned packages. I would be OK having a boolean parameter that says "block patching if there are warnings", so it's down to the puppet admins to decide.

Thoughts?

ldaneliukas commented 5 years ago

Do you mean to block patching completely if there are warnings or just block upgrading the specific packages in the warnings? I'd say the latter would be perfectly fine as it'd solve the issue of the upgrade/downgrade cycle that we're seeing when specific versions are used in the Puppet catalog.

If we're talking about blocking the whole upgrade process - I'm not sure how this would solve the original issue as going and pinning all versions found in the warnings can sometimes be unrealistic, e.g. when the versions in the catalog come from different modules/etc. It'd be hard to keep track of these it's even more difficult when Puppet is offered as a service to developers that control their own code.

EDIT: Seeing as we are treating packages that are catalog locked, but not version locked by the OS as warnings, maybe we can introduce a parameter along the lines of --force-locks which would lock/unlock packages found in warnings before/after the run?

YUM: Either versionlock or the --exclude= parameter passed to yum upgrade.
APT/DPKG: hold and unhold
Zypper: addlock and removelock.

albatrossflavour commented 5 years ago

I was thinking blocking it completely. I'm thinking of ways of doing it though.

1) We exclude VSBNL (version specified but not locked) packages from patching using OS level tools during patching (yum --exclude XXX for example). That works for RHEL based, but not for Debian based (not checked SLES), since Debian only has a concept of hold or not hold, not exclude.

2) We have a flag that enables the auto-pinning of VSBNL packages. We use the info we gather and create locks for those packages. This works but we'd also have to mandate that we purge those files while we're in that mode.

3) We have an option that prevents any patching if there are warnings. This is the cleanest IMO as it avoids any potential issues and 'encourages' people to do the right thing if they want to patch their node.

4) We do nothing and let the patching happen.

My preference is 3,2,4 (1 isn't an option as it doesn't work everywhere). We could even combine 2, 3 & 4 into one option that says "if we have VSBNL packages should we prevent patching, lock them or continue patching regardless". Irrespective of which way we go, at least for the next few releases, the default will stay as "go ahead and patch" as I do not want to break the workflows for those using the module.

albatrossflavour commented 5 years ago

HAHAHA, crossover posting is always fun! I think I've answered your edit

ldaneliukas commented 5 years ago

Yeah, seems like we are on a similar thought path :smile:

I really like the idea of combining 2, 3 & 4 as then the warning section becomes a neat feature that allows admins to control the specific way that the patching is applied, which can mean a different choice being selected for prod/non-prod systems. I'm always for versatility :+1:

albatrossflavour commented 5 years ago

I guess the only thing I didn't think about is that combining the features means I have 3-5x the amount of work to do. I guess it's a good think I'm stuck in a foreign country that has mediocre internet and great craft beer.

albatrossflavour commented 5 years ago

@LDaneliukas are you on the puppet community slack?

albatrossflavour commented 5 years ago

I'm going to proceed with pushing the current code to dev. It seems to be working and has a couple of other features/fixes in it. Then I'll create a new branch to do this work on.

albatrossflavour commented 5 years ago

# facter -p os_patching.version_specified_but_not_locked_package_action
patch

and we're off.

ldaneliukas commented 5 years ago

@LDaneliukas are you on the puppet community slack?

Yep. I'll message you there.

# facter -p os_patching.version_specified_but_not_locked_package_action
patch

and we're off.

Wonderful, eager to try this out.

albatrossflavour commented 5 years ago

The idea of locking any package that has a version specified but is not locked is a good one, but I don't think it's going to work. There are too many moving parts between all of the different platforms.

I am toying with the idea of coming up with a version locking type/provider, but that's not something I'm going to be doing in the next few weeks.

I think the only way forward for what you're looking for is have a flag which will abort the patch run if there are warnings. You can then use the yum/apt argument parameter to exclude the packages.

It's not an idea solution but my gut says that this is a problem that will affect a small number of people and would require a substantial investment of time. The right way to fix it is to ensure that any package that needs to be on a specific version is also locked using the OS tools.

I will finish the work on having that flag in place.

nathangiuliani commented 5 years ago

My view is that executing os patching is a totally separate thing to version locking of packages, which is really what needs to be done to solve the original issue.

Perhaps the best compromise is to make mention of this in this module's documentation so that users are aware they should be looking into version locking if they don't want a patching run through this module to update their packages (which of course, is what would happen by manually patching anyway).

FWIW, I've had the same "issue" and version locking is our solution.

albatrossflavour commented 5 years ago

OK, code is updated.

 block_patching_on_warning => "true",
  blocked => true,
  blocked_reasons => [
    {
      package_versions_specified_but_not_locked => [
        "puppet-agent-6.4.3"
      ]
    }

If block_patching_on_warnings is set to true, it will change all warnings into blockers and prevent patching.

The default value is false.

albatrossflavour commented 5 years ago

Code complete, closing ticket

albatrossflavour / puppet_os_patching