ScottG489 / conjob

Simple web interface to run containers as jobs or serverless functions
MIT License

Consider disabling unattended upgrades #40

Open ScottG489 opened 2 years ago

ScottG489 commented 2 years ago

Unattended upgrades have caused a few issues in the past.

Package installations would fail because an unattended upgrade was already running apt and holding its lock. One solution to this specific problem is to wait for the upgrade to finish.
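The "wait for the upgrade to finish" idea could look roughly like the sketch below. This is only a sketch and not something from this repo: `wait_until_free` is a hypothetical helper, and the lock paths and the use of `fuser` in the usage comment are assumptions about a typical Ubuntu host.

```shell
# Hypothetical helper: poll CHECK_CMD until it fails (meaning nothing holds
# the lock any more) or until TIMEOUT seconds have passed.
# Returns 0 once the lock looks free, 1 on timeout.
wait_until_free() {
    check_cmd=$1
    timeout=${2:-300}    # total seconds to wait
    interval=${3:-5}     # seconds between checks
    waited=0
    while sh -c "$check_cmd" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Assumed usage on the server (run as root; the lock paths are assumptions):
#   wait_until_free 'fuser /var/lib/dpkg/lock-frontend /var/lib/dpkg/lock /var/lib/apt/lists/lock' 600 5 \
#       && apt-get -y install some-package
```

`fuser` exits with status 0 while at least one of the given files is open, so the loop keeps waiting for as long as something still holds a lock.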

It has introduced breaking changes to production. Specifically, the version of runc was upgraded to fix a security vulnerability. This is good, but it also caused the service to break completely. It would have been better to test package upgrades first and upgrade the production server only if that test passed.

Performance of the server shortly after startup could be affected. I haven't seen this specifically, but if unattended upgrades run right after the server starts up, this could affect performance.

Here are a few relevant links:

https://github.com/ansible/ansible/issues/25414
https://stackoverflow.com/questions/45269225/ansible-playbook-fails-to-lock-apt/51919678#51919678
https://help.ubuntu.com/community/AutomaticSecurityUpdates
https://unix.stackexchange.com/questions/463498/terminate-and-disable-remove-unattended-upgrade-before-command-returns
https://unix.stackexchange.com/questions/342663/how-is-unattended-upgrades-started-and-how-can-i-modify-its-schedule/342674#342674
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image

ScottG489 commented 2 years ago

This answer appears to solve the "root" problem because it disables unattended upgrades from running early in the boot sequence:

https://unix.stackexchange.com/a/471192/443928

However, after some thought, the earliest we'd be able to run this is after Ansible has connected, so it would be too late at that point.

This solution seems a little better for use in Ansible as it kills the service. This could be run at the beginning of the playbook and we would guarantee it would no longer be running. We could also pair that with uninstalling the unattended upgrade package with something like the following:

apt-get -y purge unattended-upgrades

Though we should verify this is all we need to uninstall.
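A rough sketch of the stop-then-purge sequence, for discussion only. The `run` wrapper and `remove_unattended_upgrades` are hypothetical names, and the systemd unit names are assumptions based on a stock Ubuntu image, so they should be checked against the actual server before use.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the timers first so nothing new gets scheduled, then stop any run
# already in progress, then remove the package.
# Unit names are assumptions; verify them with `systemctl list-timers`.
remove_unattended_upgrades() {
    run systemctl stop apt-daily.timer apt-daily-upgrade.timer
    run systemctl stop apt-daily.service apt-daily-upgrade.service unattended-upgrades.service
    run apt-get -y purge unattended-upgrades
}

# Preview the commands without touching the system:
#   DRY_RUN=1 remove_unattended_upgrades
```

The dry-run switch is there so the sequence can be reviewed safely. Whether purging the package is really all that needs uninstalling is still the open question noted above.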

Solutions

I think the kill approach is the best solution we could do with our current setup. At the start of the playbook it would make sure the process is no longer running, and after the uninstall it shouldn't be able to run anymore. I don't know how it ever would other than after a reboot, which we currently don't really support anyway.

I think the root fix for this would be to create our own AMI that doesn't have unattended-upgrades installed at all. The expectation then would be that it is never on the system, on the first or any subsequent startup, and no future configuration would be needed. I think also if we had our own AMI we could potentially bake in a few other things we are currently doing in the playbook. However, this will require some extra thought.

ScottG489 commented 1 year ago

Ran into another issue where a security patch broke things.

I think it might be a good idea to split this into two separate issues. One is that unattended upgrades sometimes cause issues where apt gets locked (though I haven't actually seen this in a while, at least on this project), and the other is that unattended upgrades introduce changes outside our normal change cycle.

The latter has been a bigger issue, but should be simpler to solve. We simply have to disable unattended upgrades and don't need to worry about stopping the service before a lock can occur.

We should do some research to find the best way to disable unattended upgrades and make sure it runs as part of the Ansible playbook.
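As a starting point for that research, one commonly documented option is to switch the periodic job off through apt's own configuration. The sketch below assumes the stock Ubuntu file location `/etc/apt/apt.conf.d/20auto-upgrades`. `disable_unattended_upgrades` is a hypothetical helper name, and this has not been tried against this project's playbook.

```shell
# Hypothetical helper: write an apt config that turns the periodic package
# list refresh and the unattended upgrade off ("0" means disabled).
# The default path is an assumption based on stock Ubuntu images.
disable_unattended_upgrades() {
    conf=${1:-/etc/apt/apt.conf.d/20auto-upgrades}
    cat > "$conf" <<'EOF'
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
EOF
}

# Assumed usage on the server (as root), followed by a manual check:
#   disable_unattended_upgrades
#   apt-config dump | grep Periodic
```

Unlike purging the package, this leaves unattended-upgrades installed, so it could be switched back on later by changing the values to "1".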