containers / ai-lab-recipes

Examples for building and running LLM services and applications locally with Podman
Apache License 2.0

Cloud-init images for `nvidia-bootc` and `amd-bootc` fail to `dnf install cloud-init` #431

Open Gregory-Pereira opened 5 months ago

Gregory-Pereira commented 5 months ago

First identified in this actions run. Running `make cloud VENDOR=nvidia ARCH=amd64` or `make cloud VENDOR=amd ARCH=amd64` from the /training/cloud directory fails with the following error:

Writing manifest to image destination
STEP 2/3: RUN dnf -y install cloud-init &&     ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants &&     rm -rf /var/{cache,log} /var/lib/{dnf,rhsm}
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered with an entitlement server. You can use subscription-manager to register.

cuda-rhel9-x86_64                               915 kB/s | 1.4 MB     00:01
Last metadata expiration check: 0:00:02 ago on Thu May  2 03:20:43 2024.
No match for argument: cloud-init
Error: Unable to find a match: cloud-init
Error: building at STEP "RUN dnf -y install cloud-init &&     ln -s ../cloud-init.target /usr/lib/systemd/system/default.target.wants &&     rm -rf /var/{cache,log} /var/lib/{dnf,rhsm}": while running runtime: exit status 1

make: *** [cloud] Error 1

I have since been able to confirm the issue on my local machine.

/cc @stefwalter @lmilbaum

stefwalter commented 5 months ago

What are the ... lines?

Gregory-Pereira commented 5 months ago

I have found the logs in my terminal history and updated the comment above.

stefwalter commented 5 months ago

Can you use subscription-manager register --auto-attach on the machine running the build to solve the issue?
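
A minimal sketch of a pre-flight check for the suggestion above, assuming the build host is expected to have `subscription-manager` installed. The function name `check_registered` and the `SUBMAN` override variable are hypothetical, introduced only so the check can be tried on a machine without RHEL tooling. It relies on `subscription-manager identity` exiting non-zero on an unregistered system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the build host looks registered.
# SUBMAN is an assumed override hook (not part of the repository) that lets
# the check be exercised with a stand-in command.
check_registered() {
    local subman="${SUBMAN:-subscription-manager}"

    # No subscription-manager at all: most likely not a RHEL host.
    if ! command -v "$subman" >/dev/null 2>&1; then
        echo "missing"
        return 2
    fi

    # `identity` succeeds only when the system has a consumer identity.
    if "$subman" identity >/dev/null 2>&1; then
        echo "registered"
        return 0
    fi

    echo "unregistered"
    return 1
}
```

If this prints `unregistered`, running `subscription-manager register --auto-attach` as suggested would be the next thing to try before re-running `make cloud`.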

cgwalters commented 5 months ago

It looks to me like you may be trying to do this build on a default GitHub Actions runner? That won't have RHEL entitlements set up. It actually looks like someone was working on this in https://github.com/redhat-actions/common/pull/60

But...yes, I think it's probably better to have the workflow here operate on a RHEL host, which is what's happening in other PRs I believe.

Gregory-Pereira commented 5 months ago

I was originally trying to build these on a default GitHub Actions runner, but I encountered the same issue building locally.

rhatdan commented 4 months ago

Is this still an issue?

Gregory-Pereira commented 4 months ago

AFAIK yes, I haven't had time to fix this yet with RSA + Summit. To summarize, we want to move the cloud-init image workflow to use Terraform to provision small AWS instances based on a RHEL9 AMI, and to register them with subman to unblock the `dnf install cloud-init`. Because of the first constraint, this feature might wait until we have migrated our infrastructure to self-hosted GitHub runners with a RHEL9 or Fedora image on the runner, or to any of the other solutions being discussed that remove our dependency on our GitHub runners quota.
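
The register-then-build order described above could look roughly like the following once a RHEL9 host is available. This is a sketch under assumptions, not the project's workflow: the function name `build_cloud_image`, the `RUN` dry-run hook, and the `RHSM_ORG` / `RHSM_KEY` variables are hypothetical, and it assumes the command is run from the repository root with the host registered by activation key.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: register the host, build, then always unregister.
# RUN is an assumed hook: set RUN=echo to print the commands instead of
# executing them. RHSM_ORG and RHSM_KEY are assumed to hold the
# organization ID and activation key.
build_cloud_image() {
    local vendor="$1" arch="$2"
    local run="${RUN:-}"
    local status

    # Register first so that dnf inside the build can reach RHEL repositories.
    $run subscription-manager register --org "$RHSM_ORG" --activationkey "$RHSM_KEY" || return 1

    # Same make target as in the report, invoked from the repository root.
    $run make -C training/cloud cloud VENDOR="$vendor" ARCH="$arch"
    status=$?

    # Unregister even if the build failed, then report the build's status.
    $run subscription-manager unregister
    return "$status"
}
```

Calling `RUN=echo build_cloud_image amd amd64` prints the three commands without touching the system, which is a cheap way to review the sequence before running it on a real instance.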