Azure / azhpc-images

Azure HPC/AI VM Images
MIT License

Information sought on keeping the Ubuntu 22.04 HPC image patched? #372

Closed: garymansellricardo closed this issue 3 weeks ago

garymansellricardo commented 3 weeks ago

Hi, apologies for posting this here, but not sure where else to ask...

As a company, we must keep our systems fully patched, and this includes our HPC environment.

I was using the MS Azure Marketplace image for HPC Ubuntu 22.04 LTS, but I presume it is not updated monthly with patches. Is that correct? If it is updated regularly, what is the cadence? I would be happy to settle for this if it is patched regularly.

I have tried building a custom image from this HPC Ubuntu 22.04 LTS image and then updating it (apt upgrade and dist-upgrade), but a load of HPC-related packages (including the kernel) don't seem to update; they are marked as "held back". Why is this, and what can I do to update the kernel in particular?

Even if I force the kernel (5.15 1066) to update to 5.15 1070, which is available via apt, the system still does not switch to the new kernel.

I presume that the HPC software is compiled against (or otherwise tied to) the 5.15 1066 kernel version and that the image is prevented from running a newer one. Is this correct?

To meet our corporate security patching policy, how do I keep this HPC OS image updated when I run it?

Thanks

Gary

darkwhite29 commented 3 weeks ago

Hi @garymansellricardo,

Thanks for raising this issue.

Generally, OS kernel updates break compatibility with HPC components such as Lustre. For this reason, the kernel is excluded from updates in our HPC images.

Ubuntu 22.04: https://github.com/Azure/azhpc-images/blob/0b14bf6158ee5aaecf73f1e78ed9f0988bb722ed/ubuntu/ubuntu-22.x/ubuntu-22.04-hpc/install_prerequisites.sh#L5
AlmaLinux 8.7: https://github.com/Azure/azhpc-images/blob/master/alma/common/install_utils.sh#L66

We implement it this way because many of the installed components are tightly coupled to a specific kernel version. For that reason, kernel updates are not encouraged in our HPC images.
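For anyone wanting to see which packages are pinned on a running VM, here is a minimal Python sketch. It assumes the exclusion is implemented as an apt hold, which is an assumption on my part; the linked scripts are the authority on what is actually done. `apt-mark showhold` is the standard apt command that prints one held package per line. The prefix list and the sample package names are illustrative only, not taken from the image.

```python
import subprocess
from typing import Iterable, List

# Illustrative prefixes for kernel-related packages (an assumption, not an
# authoritative list of what the HPC image holds).
KERNEL_PREFIXES = (
    "linux-image",
    "linux-headers",
    "linux-modules",
    "linux-azure",
    "linux-tools",
)


def parse_held(output: str) -> List[str]:
    """Parse `apt-mark showhold` output: one package name per line."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def kernel_holds(held: Iterable[str]) -> List[str]:
    """Keep only the held packages whose names look kernel-related."""
    return [pkg for pkg in held if pkg.startswith(KERNEL_PREFIXES)]


def read_held() -> List[str]:
    """Ask apt which packages are currently on hold (needs apt-mark on PATH)."""
    result = subprocess.run(
        ["apt-mark", "showhold"], capture_output=True, text=True, check=True
    )
    return parse_held(result.stdout)
```

On a VM, `kernel_holds(read_held())` would list the held kernel packages, if any. Note that apt can also report packages as "kept back" for reasons other than a hold, so an empty result does not by itself mean every package will upgrade.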

Our HPC image release cadence is quarterly. In the meantime, if we get flagged for security issues, we quickly apply the patch and release a hotfix on an ad hoc basis, which can be done within a week or two.

All you need to do is use our latest HPC images. You may also report security bugs (and patches, if any) to us; we will apply the fix and release the patched images.
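When moving to a newer image, it can be useful to confirm which kernel a VM is actually running compared with the kernels installed on disk, as in the situation described earlier in this thread. The following is a rough sketch only. It assumes release strings shaped like `5.15.0-1066-azure` (an illustrative format, not quoted from the thread) and that each installed kernel has a directory under `/lib/modules`, which is the usual Ubuntu layout.

```python
import os
import platform
import re
from typing import List, Tuple


def kernel_key(release: str) -> Tuple[int, ...]:
    """Turn a release such as '5.15.0-1070-azure' into (5, 15, 0, 1070)."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def newest_kernel(releases: List[str]) -> str:
    """Return the release with the highest numeric version."""
    return max(releases, key=kernel_key)


def installed_kernels(modules_dir: str = "/lib/modules") -> List[str]:
    """List installed kernel releases, oldest first (assumed directory layout)."""
    return sorted(os.listdir(modules_dir), key=kernel_key)


def running_kernel() -> str:
    """Release of the kernel currently booted (same value as `uname -r`)."""
    return platform.release()


def report(running: str, installed: List[str]) -> str:
    """Summarise whether the booted kernel is the newest one installed."""
    newest = newest_kernel(installed)
    if running == newest:
        return f"running newest installed kernel: {running}"
    return f"running {running}, newest installed is {newest}"
```

On a VM, `report(running_kernel(), installed_kernels())` gives a one-line summary. A mismatch only shows that a newer kernel is present on disk; it says nothing about whether the HPC components would work with it.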

garymansellricardo commented 3 weeks ago

@darkwhite29 - thanks for getting back to me, that is excellent information and very helpful as I think we should be OK to use your latest HPC images, rather than "rolling our own" custom images now.

May I ask that this info be added to one of the project READMEs (if it is not already there and I missed it), so that others can benefit without needing to bother you again?

Please feel free to close this issue.

darkwhite29 commented 3 weeks ago

Thanks for your feedback. I have created a PR on this: https://github.com/Azure/azhpc-images/pull/374

Issue closed. Thanks again for your attention to this issue!