Open nyetsche opened 9 months ago
A workaround could be to skip the OS upgrade during the image build by setting this config option to false: https://docs.aws.amazon.com/parallelcluster/latest/ug/Build-v3.html#Build-v3-UpdateOsPackages
@hgreebe The documentation says that option is false by default.
Can an option be added to the image builder to NOT include lustre/fsx support at all? Many setups do not require it and it would make it way easier to support many custom AMIs as it is the biggest sticking point in version compatibility.
Hi @coderforlife,
you're correct, `UpdateOsPackages` is set to `false` by default.
@hgreebe suggested that @nyetsche set it to `false` because he said:
> The initial AMI starts with RHEL-8.8 (I also tried 8.7), but is updated to RHEL 8.9 from the redhat-release RPM during build

and the `UpdateOS` step would be executed ONLY when `UpdateOsPackages` is set to `true`. So this should have solved the issue for @nyetsche.
Anyway, we have tracked internally the feature request to avoid installing the FSx for Lustre drivers and to support updated kernels when the client is not yet available.
Enrico
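As a rough illustration of the behaviour described in the comment above, the sketch below models a build-image configuration as a plain Python dict and reports whether the `UpdateOS` step would be expected to run. The `Build` / `UpdateOsPackages` / `Enabled` keys follow the linked documentation; the helper name `update_os_step_runs` and the sample configs are hypothetical and are not part of ParallelCluster.

```python
# Minimal sketch, not ParallelCluster code: the dicts below stand in for a
# parsed build-image configuration file.

def update_os_step_runs(image_config: dict) -> bool:
    """Return True only when UpdateOsPackages is explicitly enabled."""
    build = image_config.get("Build") or {}
    update = build.get("UpdateOsPackages") or {}
    # The option is documented as false by default, so a missing key is
    # treated as "do not update OS packages".
    return bool(update.get("Enabled", False))


# Hypothetical sample configurations.
explicit_off = {"Build": {"UpdateOsPackages": {"Enabled": False}}}
omitted = {"Build": {"InstanceType": "c5.xlarge"}}
explicit_on = {"Build": {"UpdateOsPackages": {"Enabled": True}}}

for name, cfg in [
    ("explicit_off", explicit_off),
    ("omitted", omitted),
    ("explicit_on", explicit_on),
]:
    print(name, update_os_step_runs(cfg))
```

Under this model, only the third configuration would trigger the OS update; setting the option to false explicitly and omitting it are equivalent.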
My organization requires using RHEL8 (a supported OS) from the privately shared RedHat licensed base. We then use `pcluster build-image` to make it ready for ParallelCluster.

The `pcluster build-image` task has started failing for us recently. The initial AMI starts with `RHEL-8.8` (I also tried `8.7`), but is updated to `RHEL 8.9` from the `redhat-release` RPM during build.

That comes from the `UpdateOS` section of the playbook.

The `yum -y update` brings the OS to the most recent packages, including `redhat-release` and `kernel-*`.

The failure occurs later, during a `kernel_module 'lnet'`: https://github.com/aws/aws-parallelcluster-cookbook/blob/v3.7.2/cookbooks/aws-parallelcluster-environment/resources/lustre/partial/_install_lustre_centos_redhat.rb#L36

That is, there's no module in `/lib/modules/4.18.0-513.5.1.el8_9.x86_64`.

The kernel compatibility matrix in this document https://docs.aws.amazon.com/fsx/latest/LustreGuide/install-lustre-client.html indeed doesn't mention `4.18.0-513`, and the upstream at https://downloads.whamcloud.com/public/lustre/latest-2.12-release/el8/client/ doesn't include it either. So I realize this is actually a Lustre packaging issue, but I'm not sure how to get in touch with the FSx for Lustre team. Even so, it'd be great to have a workaround. Right now we can't use new AMIs for compute nodes.

I'm unsure of the best way forward here - blacklist `redhat-release*` and/or `kernel-*` from the `build-image` process? Ignore errors from `modprobe lnet`?
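One way to catch this mismatch before the cookbook reaches the `lnet` step is to check whether a module file exists for the target kernel release. The sketch below is a minimal illustration and is not taken from the cookbook: the helper name `has_kernel_module`, the list of file suffixes, and the assumption that the module is installed as a file named `lnet.ko` (possibly compressed) somewhere under `/lib/modules/<release>` are all assumptions.

```python
import os
import platform
from pathlib import Path
from typing import Optional


def has_kernel_module(
    name: str,
    release: Optional[str] = None,
    modules_root: str = "/lib/modules",
) -> bool:
    """Return True if a module file called `name` exists for `release`."""
    # Default to the running kernel, i.e. the value reported by `uname -r`.
    release = release or platform.release()
    base = Path(modules_root) / release
    if not base.is_dir():
        return False
    # Assumed naming: plain or compressed .ko files.
    suffixes = (".ko", ".ko.xz", ".ko.gz", ".ko.zst")
    wanted = {name + suffix for suffix in suffixes}
    for _dirpath, _dirnames, filenames in os.walk(base):
        if wanted.intersection(filenames):
            return True
    return False


if __name__ == "__main__":
    running = platform.release()
    print(f"lnet module present for {running}: {has_kernel_module('lnet')}")
```

A check like this only reports whether a module file is present; it does not load the module, and it does not replace the kernel compatibility matrix in the FSx for Lustre documentation.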