coreos / rpm-ostree

⚛📦 Hybrid image/package system with atomic upgrades and package layering
https://coreos.github.io/rpm-ostree

Container image fails to build with 32-bit RPMs #4609

Open LukeShortCloud opened 9 months ago

LukeShortCloud commented 9 months ago

Host system details

I am on Fedora Workstation 38 trying to build Fedora Silverblue 38.

$ rpm-ostree --version
rpm-ostree:
 Version: '2023.7'
 Git: 06600cc4e215bed584ee71ff34bf5b78a181cab4
 Features:
  - rust
  - compose
  - container
  - fedora-integration

Actual behavior:

$ sudo rpm-ostree compose image --initialize --format=registry --cachedir=${WORKDIR}/cache fedora-silverblue-with-32-bit-packages.yaml docker.io/${DOCKER_HUB_USERNAME}/${DOCKER_HUB_CONTAINER}
...
error: Multiple installed 'NetworkManager-libnm' (NetworkManager-libnm-1:1.42.8-1.fc38.x86_64, NetworkManager-libnm-1:1.42.8-1.fc38.i686)
error: container-encapsulate failed: ExitStatus(unix_wait_status(256))

Expected behavior:

$ sudo rpm-ostree compose image --initialize --format=registry --cachedir=${WORKDIR}/cache fedora-silverblue-with-32-bit-packages.yaml docker.io/${DOCKER_HUB_USERNAME}/${DOCKER_HUB_CONTAINER}
...
fedora/38/x86_64/silverblue => 86a605f3d627d44bb6baef4a06dc6d464e356e14a48a310d364b1636b48c1a0f

Steps to reproduce it

I first encountered this issue while trying to install Steam from RPM Fusion. To simplify reproduction, I only install the 32-bit NetworkManager-libnm package, which is a dependency of steam.

git clone --branch f38 https://pagure.io/workstation-ostree-config.git
cd workstation-ostree-config
cat <<EOF > fedora-silverblue-with-32-bit-packages.yaml
---
include: fedora-silverblue.yaml
releasever: "38"
packages:
  - NetworkManager-libnm.i686
EOF

Building to a local tree repository (not a container image) works.

Commands:

export WORKDIR="/root/tmp"
sudo mkdir -p ${WORKDIR}/cache ${WORKDIR}/repo
sudo ostree --repo=${WORKDIR}/repo init --mode=archive-z2
sudo rpm-ostree compose tree --unified-core --cachedir=${WORKDIR}/cache --repo=${WORKDIR}/repo fedora-silverblue-with-32-bit-packages.yaml

Output:

Committing... done
Metadata Total: 19275
Metadata Written: 7315
Content Total: 19192
Content Written: 1761
Content Cache Hits: 86665
Content Bytes Written: 195441152
7315 metadata, 89691 content objects imported; 4.4 GB content written                                                               
fedora/38/x86_64/silverblue => 86a605f3d627d44bb6baef4a06dc6d464e356e14a48a310d364b1636b48c1a0f

Building a container image does NOT work.

Commands:

export WORKDIR="/root/tmp"
sudo mkdir -p ${WORKDIR}/cache ${WORKDIR}/repo
sudo ostree --repo=${WORKDIR}/repo init --mode=archive-z2
sudo rpm-ostree compose image --initialize --format=registry --cachedir=${WORKDIR}/cache fedora-silverblue-with-32-bit-packages.yaml docker.io/${DOCKER_HUB_USERNAME}/${DOCKER_HUB_CONTAINER}

Output:

Committing... done
Metadata Total: 19275
Metadata Written: 1539
Content Total: 19191
Content Written: 34
Content Cache Hits: 86666
Content Bytes Written: 159547174
1539 metadata, 5307 content objects imported; 0 bytes content written                                                               
Wrote commit: 349981df836e72c5c9d72f23a937fecba0a2ad86b10c986fbbd89625b9fd01c5
Reading packages... done
error: Multiple installed 'NetworkManager-libnm' (NetworkManager-libnm-1:1.42.8-1.fc38.x86_64, NetworkManager-libnm-1:1.42.8-1.fc38.i686)
error: container-encapsulate failed: ExitStatus(unix_wait_status(256))

Additional info:

A lot of 32-bit issues with rpm-ostree compose tree were addressed a few years ago. I wonder if the same fixes need to be applied to rpm-ostree compose image.

https://github.com/coreos/rpm-ostree/pull/3161

Would you like to work on the issue?

I am not familiar enough with the internals of rpm-ostree to be able to work on this myself.

LukeShortCloud commented 9 months ago

This issue seems to occur specifically when packages with the same name but different architectures (x86_64 and i686, in this case) are installed together in a container image.
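For illustration, here is a minimal Python sketch of how a lookup keyed only on the package name trips over a multilib install. It is a simplified model, not rpm-ostree's actual code: the function name `package_meta` and the error wording follow the output above, while the list-of-NEVRA-strings input and the `example` package are assumptions made for the sketch.

```python
def package_meta(installed_nevras, name):
    """Simplified model of a name-only lookup that rejects multiple matches.

    installed_nevras is an assumed stand-in for the RPM database: a list of
    strings shaped like name-[epoch:]version-release.arch.
    """
    found = None
    for nevra in installed_nevras:
        # Strip "-version-release.arch" from the right to recover the name.
        pkg_name = nevra.rsplit("-", 2)[0]
        if pkg_name != name:
            continue
        if found is not None:
            # The architecture is never consulted, so x86_64 + i686 collide.
            raise RuntimeError(f"Multiple installed '{name}' ({found}, {nevra})")
        found = nevra
    if found is None:
        raise LookupError(f"Package '{name}' is not installed")
    return found


if __name__ == "__main__":
    installed = [
        "NetworkManager-libnm-1:1.42.8-1.fc38.x86_64",
        "NetworkManager-libnm-1:1.42.8-1.fc38.i686",
        "example-1.0-1.fc38.x86_64",  # hypothetical single-arch package
    ]
    print(package_meta(installed, "example"))
    try:
        package_meta(installed, "NetworkManager-libnm")
    except RuntimeError as err:
        print(f"error: {err}")
```

With both architectures of NetworkManager-libnm in the list, the sketch raises an error of the same shape as the one reported above, while the single-architecture package resolves normally.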

jordemort commented 2 months ago

I am also experiencing this, when trying to encapsulate a CentOS 9 container with both 64-bit and 32-bit glibc packages installed:

error: Multiple installed 'glibc' (glibc-2.34-83.el9.12.x86_64, glibc-2.34-83.el9.12.i686)
error: container-encapsulate failed: ExitStatus(unix_wait_status(256))

This doesn't happen when running rpm-ostree compose tree, only rpm-ostree compose image.

The error seems to be coming from here: https://github.com/coreos/rpm-ostree/blob/5dd7dc979d64c3c6c95bf575ac88ee9bdf79420f/src/libpriv/rpmostree-refts.cxx#L172

I'd hazard a guess that it's being called from here: https://github.com/coreos/rpm-ostree/blob/5dd7dc979d64c3c6c95bf575ac88ee9bdf79420f/rust/src/container.rs#L278

jordemort commented 2 months ago

Forgot to mention, I'm using rpm-ostree 2024.4 on CentOS Stream 9:

rpm-ostree:
 Version: '2024.4'
 Git: afd7ddfc32c44cac657e9cedf3ad90bacdf14bc3
 Features:
  - rust
  - compose
  - container

LukeShortCloud commented 1 month ago

Hey @antheas, I see in https://github.com/coreos/rpm-ostree/issues/4953 you mentioned that you had patched your rpm-ostree build with a workaround for this problem of installing 32-bit applications. Was it the exact solution that @jordemort proposed or something else? Any chance we can get this into a PR to fix rpm-ostree upstream?

antheas commented 1 month ago

I just commented out the check.

Because this part of the code has no access to the architecture of each package, the check cannot be implemented properly here. It will therefore always fail when two packages have the same name but different architectures.

This check is only valid when the same package is in the list twice, which 1) cannot happen (?) and 2) will error out anyway because of duplicate files. As such, I would remove the check.

Perhaps there should be extra logic to error out properly when there are duplicate files, since I also faced an issue with the lutris dependencies (I think it was unixodbc, carried by the 32-bit wine-core). The error there was unclear, and it took me well over an hour to find the package responsible. However, since I was only testing, I just nixed the 32-bit packages and carried on.

From c3787d4b13aed3a25aa358d98f027ddda6304f3a Mon Sep 17 00:00:00 2001
From: antheas <git@antheas.dev>
Date: Thu, 9 May 2024 21:56:29 +0200
Subject: [PATCH] skip multiple packages check to avoid 32bit packages breaking

---
 src/libpriv/rpmostree-refts.cxx | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/libpriv/rpmostree-refts.cxx b/src/libpriv/rpmostree-refts.cxx
index 46f63621..cf83bae2 100644
--- a/src/libpriv/rpmostree-refts.cxx
+++ b/src/libpriv/rpmostree-refts.cxx
@@ -165,16 +165,17 @@ RpmTs::package_meta (const rust::Str name) const
     {
       // TODO: Somehow we get two `libgcc-8.5.0-10.el8.x86_64` in current RHCOS, I don't
       // understand that.
-      if (retval != nullptr)
-        {
-          auto nevra = header_get_nevra (h);
-          g_autofree char *buf
-              = g_strdup_printf ("Multiple installed '%s' (%s, %s)", name_c.c_str (),
-                                 retval->nevra ().c_str (), nevra.c_str ());
-          throw std::runtime_error (buf);
-        }
+      // if (retval != nullptr)
+      //   {
+      //     auto nevra = header_get_nevra (h);
+      //     g_autofree char *buf
+      //         = g_strdup_printf ("Multiple installed '%s' (%s, %s)", name_c.c_str (),
+      //                            retval->nevra ().c_str (), nevra.c_str ());
+      //     throw std::runtime_error (buf);
+      //   }

       retval = std::make_unique<PackageMeta> (h);
+      break;
     }
   if (retval == nullptr)
     g_assert_not_reached ();
-- 
2.45.0

LukeShortCloud commented 1 month ago

Thanks for the very informative insight, @antheas! Perhaps @cgwalters may also have some ideas here for the long-term solution. Maybe it is as simple as removing this check if it is redundant.

antheas commented 1 month ago

Actually, I take that back: part of the error is the nevra, so the architecture is available at that point. A potential solution is to refactor the check to cut out the version and compare only name and architecture. I just don't know if it's worth fixing instead of removing.

error: Multiple installed 'glibc' (glibc-2.34-83.el9.12.x86_64, glibc-2.34-83.el9.12.i686)

jordemort commented 19 hours ago

I was planning on doing a patch to throw the error only if the architecture is the same (by doing some ugliness with strrchr to pull it off the end of nevra) but I realized that wouldn't actually be correct, because what if instead of merely 2 duplicate packages, there were 3 or more? All of them would have to be compared against each other, which would significantly complicate the check.

I think it would be OK to drop the check entirely. RPM itself has built-in protections against duplicate packages being installed, so I think that if RPM allowed it to happen, it's reasonable to assume that it's valid. It appears that the layer chunking is done on the basis of nevra, which includes architecture, so I don't think allowing a multilib setup is going to break anything on that end.

I intend to prepare a patch that removes the check, and if it works well, I will submit it as a pull request.
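As a rough illustration of the architecture-aware variant considered above, the Python sketch below takes the architecture off the end of each NEVRA (the analogue of the `strrchr` idea) and counts installs per (name, arch) pair. Counting per key is one assumed way to handle three or more matches without comparing every pair; it is not what rpm-ostree does. The helper names `split_nevra` and `true_duplicates` are hypothetical.

```python
from collections import Counter


def split_nevra(nevra):
    """Split 'name-[epoch:]version-release.arch' into (name, evr, arch)."""
    # Everything after the last '.' is the architecture.
    rest, _, arch = nevra.rpartition(".")
    # The last two '-' separated fields are version and release.
    name, version, release = rest.rsplit("-", 2)
    return name, f"{version}-{release}", arch


def true_duplicates(nevras):
    """Return the sorted (name, arch) pairs that are installed more than once."""
    counts = Counter((name, arch) for name, _, arch in map(split_nevra, nevras))
    return sorted(key for key, count in counts.items() if count > 1)


if __name__ == "__main__":
    multilib = [
        "glibc-2.34-83.el9.12.x86_64",
        "glibc-2.34-83.el9.12.i686",
    ]
    print(true_duplicates(multilib))
```

Under this model a multilib pair is not flagged, and only a second install of the same name on the same architecture would be reported.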

antheas commented 17 hours ago

I went through all this and I can say it's a bit of a moot point.

I have a patchset for rpm-ostree that fixes the 32-bit bugs so that you can reprocess an image that contains 32-bit packages.

However, creating a commit with rpm-ostree will often not be possible, because multiple packages of the same library own the same /etc file, which causes rpm-ostree to panic. Normal dnf would just overwrite the file.

I decided to just strip rpm-ostree entirely and use ostree-rs-ext directly to repackage an oci image into an ostree commit here: https://github.com/antheas/bazzite-upd

It works a lot better and saves a lot more space than rpm-ostree itself without any of these bugs. However, I'm dealing with a lot of permissions issues that happen because folder permissions and owners get stripped during the processing of the container.

I managed to get it to a point where it boots and works perfectly. However, when moving it to a GitHub Ubuntu action it broke again (sddm panics) and I'm looking into fixing that.

The git history for that repo contains the patch set for rpm-ostree (which doesn't fix the panic on duplicate /etc files when committing treefiles, but does fix rechunking).

jordemort commented 13 hours ago

@antheas Hm, do you have any more detail about what goes wrong with /etc? I ended up modifying things to pass the architecture into package_meta instead of removing the check, because otherwise there was a chance that package_meta would return information about the wrong package. I also changed the compose-image.sh test to build an image with both 64-bit and 32-bit glibc in it. As you say, both packages appear to claim some of the same files in /etc, but rpm-ostree does not seem to choke on it with my patch.

https://github.com/coreos/rpm-ostree/pull/5014
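The idea of passing the architecture into the lookup can be sketched as follows. This is a hedged Python illustration, not the code in the pull request: the `InstalledPackage` record and the in-memory list standing in for the RPM database are assumptions, and only the name `package_meta` comes from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledPackage:
    """Hypothetical record for one installed package."""
    name: str
    evr: str   # [epoch:]version-release
    arch: str

    @property
    def nevra(self):
        return f"{self.name}-{self.evr}.{self.arch}"


def package_meta(installed, name, arch):
    """Look up a package by name AND architecture."""
    matches = [p for p in installed if p.name == name and p.arch == arch]
    if not matches:
        raise LookupError(f"Package '{name}.{arch}' is not installed")
    if len(matches) > 1:
        # Only a genuine duplicate (same name, same arch) is an error now.
        nevras = ", ".join(p.nevra for p in matches)
        raise RuntimeError(f"Multiple installed '{name}.{arch}' ({nevras})")
    return matches[0]


if __name__ == "__main__":
    installed = [
        InstalledPackage("glibc", "2.34-83.el9.12", "x86_64"),
        InstalledPackage("glibc", "2.34-83.el9.12", "i686"),
    ]
    print(package_meta(installed, "glibc", "i686").nevra)
```

Keying on both fields means the caller always gets metadata for the architecture it asked about, which addresses the concern about returning information for the wrong package.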

antheas commented 3 hours ago

Yes, the unixodbc library, when installed as both 32-bit and 64-bit, causes an error about /etc/odbc.ini, which is covered in this issue: https://github.com/coreos/rpm-ostree/issues/4653

It probably happens because the hash of the file differs between the two packages; otherwise, I suppose, OSTree would check it out only once.

It happens before the commit is created. Once the commit exists, container-encapsulate (both as part of the image command and standalone) runs correctly.
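The hypothesis about differing hashes can be made concrete with a small Python sketch. It assumes each package is described by a mapping from file path to content digest, and reports only the paths that several packages own with different content. The function name `conflicting_files`, the package labels, and the placeholder digests are all hypothetical, and this is not how rpm-ostree or OSTree detects the problem.

```python
from collections import defaultdict


def conflicting_files(manifests):
    """Find paths that different packages ship with different content.

    manifests: {package_label: {path: digest}} (assumed input shape).
    Returns {path: {digest: [package_label, ...]}} for conflicting paths only.
    """
    owners = defaultdict(lambda: defaultdict(list))
    for package, files in manifests.items():
        for path, digest in files.items():
            owners[path][digest].append(package)
    return {
        path: {digest: sorted(pkgs) for digest, pkgs in by_digest.items()}
        for path, by_digest in owners.items()
        if len(by_digest) > 1  # identical content in both packages is fine
    }


if __name__ == "__main__":
    manifests = {
        # Placeholder digests; real ones would come from the RPM headers.
        "unixODBC.x86_64": {"/etc/odbc.ini": "digest-a", "/etc/shared.conf": "digest-c"},
        "unixODBC.i686": {"/etc/odbc.ini": "digest-b", "/etc/shared.conf": "digest-c"},
    }
    for path, by_digest in conflicting_files(manifests).items():
        print(path, by_digest)
```

A report like this names the packages responsible for each clashing path, which is the kind of clearer error asked for earlier in the thread.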