Azure / azhpc-images

Azure HPC/AI VM Images
MIT License

yum commands result in Killed in `Almalinux` #214

Closed JPchico closed 1 year ago

JPchico commented 1 year ago

I was testing an update of the image with yum, and no matter which yum command I run, the result is always "Killed":

$ sudo yum update
Failed to set locale, defaulting to C.UTF-8
AlmaLinux 8 - BaseOS                                                                                                                                                               4.2 MB/s | 5.2 MB     00:01    
AlmaLinux 8 - AppStream                                                                                                                                                            9.7 MB/s |  11 MB     00:01    
AlmaLinux 8 - Extras                                                                                                                                                                23 kB/s |  19 kB     00:00    
AlmaLinux 8 - PowerTools                                                                                                                                                           3.4 MB/s | 3.1 MB     00:00    
Azure Lustre Packages                                                                                                                                                              4.1 MB/s | 3.0 MB     00:00    
cuda-rhel8-x86_64                                                                                                                                                                  4.5 MB/s | 2.1 MB     00:00    
Extra Packages for Enterprise Linux 8 - x86_64                                                                                                                                     8.9 MB/s |  14 MB     00:01    
Killed

I tried yum clean and then ran the update again, but no joy. Is this a known issue?

JPchico commented 1 year ago

I re-deployed on a new VM and now I'm getting the following error:

$ sudo yum update --skip-broken
Failed to set locale, defaulting to C.UTF-8
AlmaLinux 8 - BaseOS                                                                                                                                                               5.3 MB/s | 5.2 MB     00:00    
AlmaLinux 8 - AppStream                                                                                                                                                            9.7 MB/s |  11 MB     00:01    
AlmaLinux 8 - Extras                                                                                                                                                                25 kB/s |  19 kB     00:00    
AlmaLinux 8 - PowerTools                                                                                                                                                           3.5 MB/s | 3.1 MB     00:00    
Azure Lustre Packages                                                                                                                                                              5.9 MB/s | 3.0 MB     00:00    
cuda-rhel8-x86_64                                                                                                                                                                  4.8 MB/s | 2.1 MB     00:00    
Extra Packages for Enterprise Linux 8 - x86_64                                                                                                                                     6.9 MB/s |  14 MB     00:01    
InfluxDB Repository - RHEL 8                                                                                                                                                       195 kB/s |  50 kB     00:00    
Error: 
 Problem: package amlfs-lustre-client-2.15.1_24_gbaa21ca-4.18.0.425.13.1.el8.7-1.noarch requires kmod-lustre-client-4.18.0.425.13.1.el8.7 = 2.15.1_24_gbaa21ca, but none of the providers can be installed
  - cannot install the best update candidate for package amlfs-lustre-client-2.15.1_24_gbaa21ca-4.18.0.425.3.1.el8-1.noarch
  - package kmod-lustre-client-4.18.0.425.13.1.el8.7-2.15.1_24_gbaa21ca-1.el8.x86_64 is filtered out by exclude filtering
(try to add '--nobest' to use not only best candidate packages)

This seems very similar to the error in #202. Should one use the --nobest option as recommended, or is there something else that one should use?

Is a similar error also present in the Ubuntu-based images?

The reason I'm asking is that if I set this up as a head node in a system, it might be running for a while, and being able to update and apply the latest security patches would be very nice.

@ltalirz do you have any ideas on this?
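
The error above says a package "is filtered out by exclude filtering", which suggests that an exclude setting is active somewhere in the yum configuration. The thread does not say where it is set on this image, so the following is only a minimal sketch for locating such lines. `list_excludes` is a hypothetical helper name, and the paths in the final comment are the standard yum locations, assumed rather than confirmed for this image.

```shell
#!/usr/bin/env bash
# Sketch: print every exclude= / excludepkgs= line found in the given config files.
# list_excludes is a hypothetical helper, not part of yum or of azhpc-images.
list_excludes() {
    # -H prints the file name, -n the line number; missing files are ignored.
    grep -HnE '^[[:space:]]*exclude(pkgs)?[[:space:]]*=' "$@" 2>/dev/null || true
}

# Demonstration on a scratch file so the sketch runs anywhere.
demo_dir="$(mktemp -d)"
printf '[main]\ngpgcheck=1\nexclude=kernel*\n' > "$demo_dir/yum.conf"
list_excludes "$demo_dir/yum.conf"
rm -rf "$demo_dir"

# On the VM itself (assumed standard paths):
#   list_excludes /etc/yum.conf /etc/yum.repos.d/*.repo
```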

ltalirz commented 1 year ago

Hey @JPchico ,

The quick fix is to run sudo yum --exclude='kernel*' --exclude='amlfs*' update (you don't want to update the kernel either). The patterns are quoted so that the shell passes them to yum unexpanded.

To make it permanent, add it to /etc/yum.conf:

[main]
...
exclude=kernel* amlfs* 

Thanks to @edwardsp for the hints; he may be able to prepare a PR later this week.
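
The permanent exclude described above could be applied with a small script along these lines. This is a sketch under stated assumptions, not the fix shipped in the repository: `add_exclude` is a hypothetical helper, and it assumes `[main]` is the last (or only) section of the file, so that an appended line lands inside it. It refuses to touch a file that already has an exclude line, and the demonstration runs against a scratch copy, not the live /etc/yum.conf.

```shell
#!/usr/bin/env bash
# Sketch: append an exclude line to a yum config file unless one is already present.
# add_exclude is a hypothetical helper, not part of yum or of azhpc-images.
add_exclude() {
    local conf="$1" patterns="$2"
    if grep -Eq '^[[:space:]]*exclude(pkgs)?[[:space:]]*=' "$conf"; then
        echo "exclude already set in $conf; edit it by hand" >&2
        return 1
    fi
    # Assumption: [main] is the last section, so appending keeps the line inside it.
    printf 'exclude=%s\n' "$patterns" >> "$conf"
}

# Dry run against a scratch file.
demo_conf="$(mktemp)"
printf '[main]\ngpgcheck=1\n' > "$demo_conf"
add_exclude "$demo_conf" 'kernel* amlfs*'
cat "$demo_conf"
rm -f "$demo_conf"
```

To apply it for real, one would run the function as root against /etc/yum.conf after checking that the file matches the assumption above.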

JPchico commented 1 year ago

Thanks @ltalirz! That solves the problem :)

abhamidipati-msft commented 1 year ago

@JPchico is this the same image published by azhop? Please share the image URI.

JPchico commented 1 year ago

Hi @abhamidipati0614, yes, this is an azhop image. It is exactly the same as the one reported in #211. The image information is the following:

    "imageReference": {
        "publisher": "azhpc",
        "offer": "azhop-compute",
        "sku": "almalinux-8_7",
        "version": "latest",
        "exactVersion": "2023.0313.1406"
    },
    "plan": {
        "name": "almalinux-8_7",
        "publisher": "azhpc",
        "product": "azhop-compute"
    },

abhamidipati-msft commented 1 year ago

@JPchico I am not sure of your use case. The image almalinux:almalinux-hpc:8_6-hpc-gen2:8.6.2023022301 in the marketplace is the one prepared from this repository. Would it be possible to give it a try, if it fits your use case?

abhamidipati-msft commented 1 year ago

@JPchico this is fixed in our latest marketplace image. The image information follows:

    {
        "architecture": "x64",
        "offer": "almalinux-hpc",
        "publisher": "almalinux",
        "sku": "8-hpc-gen1",
        "urn": "almalinux:almalinux-hpc:8-hpc-gen1:8.7.2023060101",
        "version": "8.7.2023060101"
    },
    {
        "architecture": "x64",
        "offer": "almalinux-hpc",
        "publisher": "almalinux",
        "sku": "8-hpc-gen2",
        "urn": "almalinux:almalinux-hpc:8-hpc-gen2:8.7.2023060101",
        "version": "8.7.2023060101"
    },
    {
        "architecture": "x64",
        "offer": "almalinux-hpc",
        "publisher": "almalinux",
        "sku": "8_7-hpc-gen1",
        "urn": "almalinux:almalinux-hpc:8_7-hpc-gen1:8.7.2023060101",
        "version": "8.7.2023060101"
    },
    {
        "architecture": "x64",
        "offer": "almalinux-hpc",
        "publisher": "almalinux",
        "sku": "8_7-hpc-gen2",
        "urn": "almalinux:almalinux-hpc:8_7-hpc-gen2:8.7.2023060101",
        "version": "8.7.2023060101"
    }
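
Each urn value in the listing above is simply publisher:offer:sku:version joined with colons, as a comparison with the other fields shows. A minimal sketch of splitting one back into its parts, for example when filling in an imageReference block, could look like this. `parse_urn` is a hypothetical helper, and the script needs bash because it uses a here-string.

```shell
#!/usr/bin/env bash
# Sketch: split a marketplace image URN into its four colon-separated fields.
# parse_urn is a hypothetical helper, not part of the Azure CLI.
parse_urn() {
    local publisher offer sku version
    IFS=: read -r publisher offer sku version <<< "$1"
    printf 'publisher=%s offer=%s sku=%s version=%s\n' \
        "$publisher" "$offer" "$sku" "$version"
}

parse_urn 'almalinux:almalinux-hpc:8_7-hpc-gen2:8.7.2023060101'
# prints: publisher=almalinux offer=almalinux-hpc sku=8_7-hpc-gen2 version=8.7.2023060101
```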