Azure / azhpc-images

Azure HPC/AI VM Images
MIT License

Enroot breaks in almalinux:almalinux-hpc:8_7-hpc-gen2:8.7.2024042601 #344

Closed xpillons closed 4 months ago

xpillons commented 4 months ago

User namespaces are now disabled in this image, but enroot requires them: https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md#kernel-settings. This is a breaking change for customers using enroot.
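
One way to confirm the symptom on a node is to read the kernel's user namespace limit. The following is a minimal sketch, assuming the image disables user namespaces by setting the `user.max_user_namespaces` sysctl to 0 (the thread does not say which mechanism is used). The helper name `user_namespaces_enabled` is hypothetical and not part of enroot or the image.

```python
from pathlib import Path

# Assumption: user namespaces were disabled by setting this sysctl to 0.
SYSCTL_PATH = Path("/proc/sys/user/max_user_namespaces")


def user_namespaces_enabled(path: Path = SYSCTL_PATH) -> bool:
    """Return True if the kernel allows at least one user namespace."""
    try:
        return int(path.read_text().strip()) > 0
    except (OSError, ValueError):
        # File missing, unreadable, or not a number: report "not enabled".
        return False


if __name__ == "__main__":
    state = "enabled" if user_namespaces_enabled() else "disabled"
    print(f"user namespaces: {state}")
```

If this reports "disabled" on the affected image, enroot is likely failing for the reason described above.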

darkwhite29 commented 4 months ago

The reason why it's disabled:

https://github.com/Azure/azhpc-images/pull/297#issuecomment-2110651262

darkwhite29 commented 4 months ago

@xpillons We are working on a fix and will update this thread when it's done. Thanks for your patience.

darkwhite29 commented 4 months ago

@xpillons as you mentioned in the email, there are two suggested mitigation methods: https://access.redhat.com/security/cve/cve-2023-32233

Previously we picked option 2 and you pointed out it breaks AI workloads.

We then tried option 1, but unfortunately it doesn't work: the nf_tables module cannot be disabled because it is part of the firewall. We have to stick with option 2 for now to keep this security issue addressed.

Future AlmaLinux-HPC images will use base images with the security patch applied, so this issue will not be present: https://www.linuxquestions.org/questions/linux-software-2/problems-with-unloading-kernelmodule-rocky-linux-8-and-9-a-4175725602/

Sorry for the inconvenience and thanks for understanding.

xpillons commented 4 months ago

Thank you for the analysis and testing. In the meantime, while waiting for the next release, we at least have a workaround: re-enabling user namespaces.
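
For anyone applying the same workaround, here is a minimal sketch of re-enabling user namespaces at runtime. It assumes the image disabled them by setting the `user.max_user_namespaces` sysctl to 0, which the thread does not confirm. The limit of 65536 and the helper name `enable_user_namespaces` are illustrative assumptions. Keep in mind that this reverts the CVE-2023-32233 mitigation discussed above on kernels that are not yet patched.

```python
from pathlib import Path

# Assumption: this sysctl is what the image set to 0 to disable user namespaces.
SYSCTL_PATH = Path("/proc/sys/user/max_user_namespaces")


def enable_user_namespaces(limit: int = 65536, path: Path = SYSCTL_PATH) -> int:
    """Write a positive namespace limit (needs root) and return the value read back."""
    if limit <= 0:
        raise ValueError("limit must be positive to enable user namespaces")
    path.write_text(f"{limit}\n")
    return int(path.read_text().strip())


if __name__ == "__main__":
    print(f"user.max_user_namespaces = {enable_user_namespaces()}")
```

A value written this way does not survive a reboot. To persist it, the usual approach is a line such as `user.max_user_namespaces = 65536` in a file under `/etc/sysctl.d/`.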