edwardchalstrey1 opened 1 year ago
@JimMadge @craddm I think I have found a temporary solution to update management that should be acceptable for now. Most of the Linux/Windows VMs that make up both the SHM and SREs can be updated via "Update management center", which in Azure is a different thing from the Update management section of the shm-prod4-automation "Automation account" we have set up.
The VMs that don't get updated this way are the compute VMs, for some reason related to the image they use (see above), so my proposed temporary solution for these is to deploy a new SRD for the long-running SREs (e.g. edon). I assume that the new SRD will have the most recent Linux updates, do you agree?
From here:
> Automatic VM guest patching, on-demand patch assessment and on-demand patch installation are supported only on VMs created from images with the exact combination of publisher, offer and sku from the below supported OS images list. Custom images or any other publisher, offer, sku combinations aren't supported. More images are added periodically.
If that is true I don't think this way of updating machines can work for us at all. It isn't feasible to build every SRD from base Ubuntu @jemrobinson. Did this not work before?
This is just another thing pushing me towards Ansible AWX...
> The VMs that don't get updated this way are the compute VMs, which is for some reason to do with the image they use (see above), so my proposed temporary solution for these is to deploy a new SRD for the long-running SREs (e.g. edon). I assume that the new SRD will have the most recent linux updates, do you agree?
That is going to take a while and mean downtime and/or moving users to a new SRD. How often do updates need to be applied for DSPT?
I suspect it would be quicker and easier to run the equivalent commands on the SRDs as the admin user. That isn't ideal, but I don't think there is a reason you shouldn't do that. We would rely on 'safe people' there to have confidence you won't access any sensitive data.
> If that is true I don't think this way of updating machines can work for us at all. It isn't feasible to build every SRD from base Ubuntu @jemrobinson. Did this not work before?
Sure, I don't propose abandoning the current way of doing things, though; I'm seeing this as a temporary fix for the VMs it does work for while we don't have a solution for https://github.com/alan-turing-institute/data-safe-haven/issues/1403
> That is going to take a while and mean downtime and/or moving users to a new SRD.
It's pretty quick and won't result in any downtime. The new SRD won't have any changes beyond apps being closed, so moving over to it shouldn't be an issue: it's accessed in exactly the same way.
> How often do updates need to be applied for DSPT?
I don't know. Who knows this? @harisood?
> I suspect it would be quicker and easier to run the equivalent commands on the SRDs as the admin user.
Happy to do this, but what are they? Bear in mind this is general updates for all Linux/Windows VMs.
> Sure, I don't propose abandoning the current way of doing things, though; I'm seeing this as a temporary fix for the VMs it does work for while we don't have a solution for alan-turing-institute/data-safe-haven#1403
But that was the error message from the current way of handling updates, no? It clearly says that it will not work for custom images.
> It's pretty quick and won't result in any downtime. The new SRD won't have any changes beyond apps being closed, so moving over to it shouldn't be an issue: it's accessed in exactly the same way.
But you will either have to kick users off and kill jobs on the 'old' SRD, or shut it down before deploying a new one. Either way it is much more disruptive than updating the packages in situ.
Also I think a new SRD will only be as up to date as the VM image that is deployed. So unless you build a new image each time as well, the new SRD won't have newer packages.
> Happy to do this, but what are they? Bear in mind this is general updates for all Linux/Windows VMs.
Isn't it only the SRDs that need an alternative way of being updated?
```shell
apt update && apt upgrade -y
```
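The same upgrade could be wrapped in a small script so it is repeatable across SRDs. This is a minimal sketch, assuming Ubuntu with `apt-get` and root privileges; it is not the project's tooling. It uses `apt-get` (intended for scripting) instead of `apt`, adds `--with-new-pkgs` so new dependencies are installed as `apt upgrade` would do, and sets `DEBIAN_FRONTEND=noninteractive` so debconf prompts cannot block the run. `APPLY` and the function names are hypothetical; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive package upgrade on an Ubuntu VM such as an SRD.
# Assumptions: Ubuntu with apt-get, run as root. APPLY is a hypothetical
# safety switch: unless APPLY=1, commands are printed instead of executed.
set -euo pipefail

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"        # execute the command for real
  else
    echo "$*"   # dry run: show what would be executed
  fi
}

upgrade_packages() {
  # Stop debconf from waiting for answers that nobody will type
  export DEBIAN_FRONTEND=noninteractive
  run apt-get update
  # --with-new-pkgs lets apt-get install new dependencies, like `apt upgrade` does
  run apt-get upgrade --with-new-pkgs -y
}

upgrade_packages
```

On a VM, something like `sudo env APPLY=1 bash upgrade.sh` would perform the upgrade for real (the file name is illustrative).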
> Also I think a new SRD will only be as up to date as the VM image that is deployed. So unless you build a new image each time as well, the new SRD won't have newer packages.
Ah OK, if this is the case then I agree. (But by the way, it's possible to deploy multiple SRDs (compute VMs) per SRE, so this wouldn't have resulted in downtime, though maybe you're right about people running jobs.)
> Isn't it only the SRDs that need an alternative way of being updated?
Yes, you're right. OK, if you think `apt update && apt upgrade -y` is sufficient, I'll run that on the SRD VMs in question.
Also had to run `sudo apt --fix-broken install -y` after `sudo apt update && sudo apt upgrade -y`, because:
```
The following packages have unmet dependencies:
 nvidia-dkms-525 : Depends: nvidia-kernel-common-525 (<= 525.78.01-1) but it is not installed
                   Depends: nvidia-kernel-common-525 (>= 525.78.01) but it is not installed
 nvidia-driver-525 : Depends: nvidia-kernel-common-525 (<= 525.78.01-1) but it is not installed
                     Depends: nvidia-kernel-common-525 (>= 525.78.01) but it is not installed
                     Recommends: libnvidia-compute-525:i386 (= 525.78.01-0ubuntu0.20.04.1)
                     Recommends: libnvidia-decode-525:i386 (= 525.78.01-0ubuntu0.20.04.1)
                     Recommends: libnvidia-encode-525:i386 (= 525.78.01-0ubuntu0.20.04.1)
                     Recommends: libnvidia-fbc1-525:i386 (= 525.78.01-0ubuntu0.20.04.1)
                     Recommends: libnvidia-gl-525:i386 (= 525.78.01-0ubuntu0.20.04.1)
 nvidia-kernel-common-520 : Depends: nvidia-kernel-common-525 but it is not installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
```
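The repair step could be folded into the upgrade itself. A minimal sketch, assuming the same Ubuntu/`apt-get` setup: if the upgrade fails (for example because of half-configured NVIDIA driver packages like those above), it runs `apt-get --fix-broken install -y` and retries the upgrade once. The `APPLY` switch and function names are hypothetical, and commands are only printed unless `APPLY=1`.

```shell
#!/usr/bin/env bash
# Sketch only: upgrade, and if apt reports broken dependencies, repair and retry once.
# Assumptions: Ubuntu with apt-get, run as root; nothing runs unless APPLY=1.
set -euo pipefail

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"        # execute the command for real
  else
    echo "$*"   # dry run: show what would be executed
  fi
}

upgrade_with_repair() {
  export DEBIAN_FRONTEND=noninteractive
  run apt-get update
  if ! run apt-get upgrade --with-new-pkgs -y; then
    # Same as 'apt --fix-broken install': finish or repair packages with
    # unmet dependencies, then try the upgrade a second time
    run apt-get --fix-broken install -y
    run apt-get upgrade --with-new-pkgs -y
  fi
}

upgrade_with_repair
```

Retrying only once keeps the script from looping if the dependency problem needs manual attention.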
A couple of things here.
Addressing some individual points below:
> Answer seems to be no, error msg suggests it can't be done for VMs made with this particular image
Ubuntu 20.04 is one of the supported OSes at that link. That's what the SRD image you're using (`20-04-2022112900`) is based on.
> Custom images or any other publisher, offer, sku combinations aren't supported.
> If that is true I don't think this way of updating machines can work for us at all. It isn't feasible to build every SRD from base Ubuntu @jemrobinson. Did this not work before?
This used to work. The base image is a supported image (and if you look at a deployed VM in the portal Ubuntu 20.04 is still listed as the image name). Has it definitely stopped working? Have we confirmed this on a new deployment?
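One way to check what Azure records as a deployed VM's image is to query its image reference with the Azure CLI. A sketch, assuming the CLI is installed and logged in to the right subscription; the resource group and VM names are placeholders, and `APPLY` is a hypothetical switch (commands are printed unless `APPLY=1`). For a Marketplace image the reference typically carries `publisher`, `offer` and `sku`, while a VM built from a custom or gallery image typically carries an `id` instead, which may be why the supported-images check does not match even though the guest OS is Ubuntu 20.04.

```shell
#!/usr/bin/env bash
# Sketch only: show the image reference Azure records for a VM.
# Assumptions: Azure CLI installed and logged in; RESOURCE_GROUP and VM_NAME
# default to placeholder names. Commands are printed unless APPLY=1.
set -euo pipefail

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"        # execute the command for real
  else
    echo "$*"   # dry run: show what would be executed
  fi
}

show_image_reference() {
  local resource_group="$1" vm_name="$2"
  # Marketplace images usually report publisher/offer/sku; custom images an id
  run az vm show \
    --resource-group "$resource_group" \
    --name "$vm_name" \
    --query "storageProfile.imageReference" \
    --output json
}

show_image_reference "${RESOURCE_GROUP:-my-resource-group}" "${VM_NAME:-my-srd-vm}"
```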
> Yes you're right, ok if you think `apt update && apt upgrade -y` is sufficient I'll run that on the SRD VMs in question
The Automation Account update management is basically just running `apt update`, but managed by the portal rather than by, e.g., a cronjob on the machine.
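For comparison, the cron-based alternative could be a single file under `/etc/cron.d`. A minimal sketch, assuming Ubuntu's cron daemon; the default schedule (Sundays at 03:00) and log path are illustrative, not settings the project uses, and the function name is hypothetical. It only prints the file, so the contents can be reviewed before being installed as root.

```shell
#!/usr/bin/env bash
# Sketch only: print an /etc/cron.d file that applies apt updates on a schedule.
# Assumptions: Ubuntu's cron; the schedule and log path defaults are illustrative.
set -euo pipefail

cron_file() {
  # Fields: minute hour day-of-month month day-of-week (default: Sundays at 03:00)
  local schedule="${1:-0 3 * * 0}"
  local logfile="${2:-/var/log/apt-cron-upgrade.log}"
  # Environment lines in a crontab apply to the jobs that follow them
  echo "DEBIAN_FRONTEND=noninteractive"
  # /etc/cron.d entries have a user field (root) between the schedule and the command
  echo "$schedule root (apt-get update && apt-get upgrade --with-new-pkgs -y) >> $logfile 2>&1"
}

cron_file
```

It could be installed with something like `cron_file | sudo tee /etc/cron.d/apt-weekly-upgrade > /dev/null` (hypothetical file name). On Debian and Ubuntu, files in `/etc/cron.d` must be owned by root and their names may contain only letters, digits, underscores and hyphens.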
> Check updates: https://learn.microsoft.com/en-us/azure/update-center/quickstart-on-demand#check-updates
This is for the "Update management center" which is not the solution we're trying to use here. If you look at the Automation Account, you can see that the problem is that the Automation Account doesn't see any Linux VMs as being registered with the Log Analytics workspace. It's not trying-and-failing to install updates, it isn't even seeing the VMs that updates need to be installed on. My guess is that there might be a network rule that's preventing this communication.
Step 1
Check updates: https://learn.microsoft.com/en-us/azure/update-center/quickstart-on-demand#check-updates
Works for some but not all
I selected the prod4 and edon subscriptions; the ones that failed the assessment were the compute VMs for sandbox (in the prod4 subscription) and edon.
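The on-demand assessment in Step 1 can also be triggered from the Azure CLI with `az vm assess-patches`, which makes it easy to loop over several VMs. A sketch, assuming the CLI is installed and logged in to the relevant subscription; the resource group and VM names are placeholders, and `APPLY` is a hypothetical switch (commands are printed unless `APPLY=1`). A failure for one VM is reported and the loop carries on.

```shell
#!/usr/bin/env bash
# Sketch only: run an on-demand patch assessment for several VMs.
# Assumptions: Azure CLI installed and logged in; names are placeholders.
# Commands are printed unless APPLY=1.
set -euo pipefail

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"        # execute the command for real
  else
    echo "$*"   # dry run: show what would be executed
  fi
}

assess_vms() {
  local resource_group="$1"
  shift
  local vm
  for vm in "$@"; do
    # VMs built from unsupported images are expected to fail here
    run az vm assess-patches --resource-group "$resource_group" --name "$vm" \
      || echo "assessment failed for $vm" >&2
  done
}

assess_vms my-resource-group my-vm-1 my-vm-2
```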
Step 2
For the VMs it will let us update, install one-time updates:
Works for some but not all
It refuses to do this for the VMs identified above, which is not surprising:
Can we change the update settings for the affected VMs?
See docs that were linked by the above error: https://learn.microsoft.com/en-gb/azure/update-center/manage-update-settings?tabs=manage-single-overview%2Cmanage-scale-overview#configure-settings-on-single-vm
Answer seems to be no, error msg suggests it can't be done for VMs made with this particular image
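Likewise, the one-time installation in Step 2 maps onto `az vm install-patches`. A sketch under the same assumptions (CLI installed and logged in, placeholder names, hypothetical `APPLY` switch). The four-hour window, the `IfRequired` reboot setting and the Critical/Security classifications are illustrative values, not settings taken from this deployment.

```shell
#!/usr/bin/env bash
# Sketch only: request a one-time patch installation on one VM.
# Assumptions: Azure CLI installed and logged in; names and patch settings
# are illustrative. Commands are printed unless APPLY=1.
set -euo pipefail

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"        # execute the command for real
  else
    echo "$*"   # dry run: show what would be executed
  fi
}

install_vm_patches() {
  local resource_group="$1" vm_name="$2"
  # --maximum-duration is an ISO 8601 duration; IfRequired reboots only if a patch needs it
  run az vm install-patches \
    --resource-group "$resource_group" \
    --name "$vm_name" \
    --maximum-duration PT4H \
    --reboot-setting IfRequired \
    --classifications-to-include-linux Critical Security
}

install_vm_patches my-resource-group my-vm-1
```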
Step 3
For the VMs where the updates are allowed, monitor the progress:
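Progress can probably also be checked from the CLI via the VM's instance view. A sketch, assuming the instance view exposes a `patchStatus` object with the latest assessment and installation summaries (treat that field path as an assumption to verify); names are placeholders and `APPLY` is a hypothetical switch.

```shell
#!/usr/bin/env bash
# Sketch only: show the patch status recorded in a VM's instance view.
# Assumptions: Azure CLI installed and logged in; the instanceView.patchStatus
# path is assumed, not confirmed here. Commands are printed unless APPLY=1.
set -euo pipefail

run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"        # execute the command for real
  else
    echo "$*"   # dry run: show what would be executed
  fi
}

show_patch_status() {
  local resource_group="$1" vm_name="$2"
  run az vm get-instance-view \
    --resource-group "$resource_group" \
    --name "$vm_name" \
    --query "instanceView.patchStatus" \
    --output json
}

show_patch_status my-resource-group my-vm-1
```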
Step 4
Although it definitely won't let us do a manual update (or even an assessment) for the compute VMs, one solution could be to just deploy a new SRD (which presumably will have the most recent Linux patches) until we have fixed https://github.com/alan-turing-institute/data-safe-haven/issues/1403. Instead, log into the serial console of the compute VM and run the updates there (see below):
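In the serial console, the upgrade is the same `apt` sequence discussed earlier in the thread. It is also worth checking afterwards whether a reboot is pending, since kernel and NVIDIA driver updates typically only take effect after a restart. A minimal sketch, assuming Ubuntu, where packages that need a restart typically create `/var/run/reboot-required` and list themselves in `/var/run/reboot-required.pkgs`. The function names are hypothetical, and the path is a parameter so the check can be tried against a test file.

```shell
#!/usr/bin/env bash
# Sketch only: report whether an Ubuntu machine needs a reboot after upgrading.
# Assumption: Ubuntu packages signal this by creating /var/run/reboot-required.
set -euo pipefail

reboot_required() {
  local flag_file="${1:-/var/run/reboot-required}"
  [ -f "$flag_file" ]
}

report_reboot_status() {
  local flag_file="${1:-/var/run/reboot-required}"
  if reboot_required "$flag_file"; then
    echo "reboot required"
    # The .pkgs file, when present, lists the packages that asked for a reboot
    if [ -f "${flag_file}.pkgs" ]; then
      cat "${flag_file}.pkgs"
    fi
  else
    echo "no reboot required"
  fi
}

report_reboot_status
```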