coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

cmdlib: don't use cache qcow for composes; use virtiofs #3720

Open dustymabe opened 7 months ago

dustymabe commented 7 months ago

Now that our OSBuild workflow is using the cache, we've seen at least one case where the pipeline ran out of space. Since there was already a previous proposal [1] to drop the cache altogether, let's at least remove it from the runcompose functions so it's no longer used there.

[1] https://github.com/coreos/coreos-assembler/pull/3615

dustymabe commented 7 months ago

OK, this appeared to work locally at first, but only because the test was invalid: I wasn't running in a VM (i.e. I was exercising the privileged workflow). With FORCE_UNPRIVILEGED=1 set, I'm now testing this properly.

I added a commit that enables xattr support in virtiofsd. That gets farther, but then hits another error:

  zram-generator-1.1.2-8.fc39.x86_64 (fedora)
  zstd-1.5.5-4.fc39.x86_64 (fedora)
Input state hash: 03430d7c33a90d5213dbacd137402f272608196b50ff4172d80fb4313600ae84
error: cannot open Packages database in /proc/self/fd/25/usr/share/rpm
Skipping file /usr/bin/systemd-firstboot from checkout
Skipping file /usr/lib/systemd/system/systemd-firstboot.service from checkout
Skipping file /usr/lib/systemd/system/sysinit.target.wants/systemd-firstboot.service from checkout
Skipping file /usr/lib/systemd/system-generators/systemd-gpt-auto-generator from checkout
Skipping file /usr/etc/grub.d/08_fallback_counting from checkout
Skipping file /usr/etc/grub.d/10_reset_boot_success from checkout
Skipping file /usr/etc/grub.d/12_menu_auto_hide from checkout
Skipping file /usr/lib/systemd/ from checkout
Checking out packages...done
Checking out ostree layers...done
Running pre scripts...20 done
Running post scripts...done
error: While applying overrides for pkg shadow-utils: fchownat(usr/bin/chage): Operation not permitted
failed to execute cmd-build: exit status 1

dustymabe commented 7 months ago

It failed the same way in CI here.

openshift-ci[bot] commented 7 months ago

@dustymabe: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name      Commit                                    Required  Rerun command
ci/prow/rhcos  100198b5e77ca8a1eae7cb198f766942a0a45357  true      /test rhcos


Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).

openshift-merge-robot commented 6 months ago

PR needs rebase.

jlebon commented 6 months ago

I think the issue there is that the actual compose of the rootfs happens over virtiofs also because rpm-ostree wants to colocate it with the cache repo to get hardlinks. One thing we could do is mount virtiofs read-write in cosa fetch mode but read-only in cosa build mode. Then rpm-ostree could detect the read-only mount and put the workdir somewhere else, e.g. /var/tmp (losing hardlinks, so it'd be slower, but meh...).
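A minimal C++ sketch of the read-only detection proposed above, assuming rpm-ostree would inspect the mount flags of the directory holding the cache repo via POSIX statvfs(3). The names is_read_only_mount and choose_compose_workdir are hypothetical and not part of rpm-ostree; /var/tmp is simply the example fallback from the comment.

```cpp
#include <optional>
#include <string>
#include <sys/statvfs.h>

// Hypothetical helper (not an rpm-ostree API): reports whether the filesystem
// containing `path` is mounted read-only. Returns std::nullopt if statvfs fails,
// e.g. because the path does not exist.
std::optional<bool> is_read_only_mount(const std::string& path) {
    struct statvfs info {};
    if (statvfs(path.c_str(), &info) != 0) {
        return std::nullopt;
    }
    return (info.f_flag & ST_RDONLY) != 0;
}

// Hypothetical policy sketching the proposal:
//  - cache repo on a writable mount (cosa fetch mode): colocate the compose
//    workdir with the repo, so hardlinks into the repo are possible;
//  - read-only mount (cosa build mode) or unknown: fall back to /var/tmp,
//    where hardlinks into the repo are not possible and files get copied.
std::string choose_compose_workdir(const std::string& cache_repo_dir) {
    const std::optional<bool> read_only = is_read_only_mount(cache_repo_dir);
    if (read_only.has_value() && !*read_only) {
        return cache_repo_dir;  // same filesystem as the repo: hardlinks possible
    }
    return "/var/tmp";  // outside the read-only mount: slower, but works
}
```

For example, with a hypothetical repo path, choose_compose_workdir("/srv/cache/repo") would return that path when the share is mounted read-write and /var/tmp when it is read-only. Treating a failed statvfs like a read-only mount keeps the fallback on the safe (slower) side.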

dustymabe commented 6 months ago

> also because rpm-ostree wants to colocate it with the cache repo to get hardlinks

I'm not sure I understand this comment. Here I am modifying it to use runvm rather than runvm_with_cache, which means everything is happening over virtiofs IIUC, so the "cache repo" is also on virtiofs, right?

jlebon commented 6 months ago

> > also because rpm-ostree wants to colocate it with the cache repo to get hardlinks
>
> I'm not sure I understand this comment. Here I am modifying it to use runvm rather than runvm_with_cache, which means everything is happening over virtiofs IIUC, so the "cache repo" is also on virtiofs, right?

Right. The compose just happens wherever the pkgcache repo is. Before (status quo), that was on the cache qcow2. Now, that's over virtiofs.

jlebon commented 6 months ago

So rpm-ostree does support e.g. applying filecaps at commit time, but it currently decides whether to do so based on uid != 0, and in the supermin VM we are root. We do need to be root there for privileged operations like mount namespaces. But it might work to just add a flag that forces the commit-modifier path even when uid == 0. For example, we could try testing with:

diff --git a/src/libpriv/rpmostree-core.cxx b/src/libpriv/rpmostree-core.cxx
index 9cc872b2..efb77107 100644
--- a/src/libpriv/rpmostree-core.cxx
+++ b/src/libpriv/rpmostree-core.cxx
@@ -3561,7 +3561,7 @@ apply_rpmfi_overrides (RpmOstreeContext *self, int tmprootfs_dfd, DnfPackage *pk
    *
    * TODO: For non-root `--unified-core` we need to do it as a commit modifier.
    */
-  if (getuid () != 0)
+  if (g_getenv ("RPMOSTREE_SKIP_RPMFI_OVERRIDES") || getuid () != 0)
     return TRUE; /* 🔚 Early return */

   g_auto (rpmfi) fi = NULL;

But there may be other things that break.
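The condition in the diff can be read as a small decision function. The sketch below uses hypothetical names (it is not rpm-ostree code) and separates the pure decision from the process state, which makes the supermin case explicit: at uid 0 the rpmfi overrides are only skipped when the proposed RPMOSTREE_SKIP_RPMFI_OVERRIDES variable is set.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical enum illustrating the two outcomes of the check in the diff.
enum class OverridePath {
    // chown/setcap directly on the checked-out files; this is the step that
    // failed with EPERM (fchownat) over virtiofs in the log earlier in the thread.
    ApplyInCheckout,
    // Skip in the checkout; per the discussion, ownership/filecaps would then
    // have to come from the commit-modifier path instead.
    SkipInCheckout,
};

// Pure decision mirroring the patched condition:
// skip if the env var is set OR we are not running as uid 0.
OverridePath choose_override_path(bool skip_env_set, uid_t uid) {
    if (skip_env_set || uid != 0) {
        return OverridePath::SkipInCheckout;
    }
    return OverridePath::ApplyInCheckout;
}

// Wrapper reading the real process state. Like g_getenv in the diff,
// std::getenv returns non-null whenever the variable is set, even if empty.
OverridePath current_override_path() {
    const bool skip_env_set =
        std::getenv("RPMOSTREE_SKIP_RPMFI_OVERRIDES") != nullptr;
    return choose_override_path(skip_env_set, getuid());
}
```

Keeping the decision pure means the uid == 0 plus env-var combination can be checked without actually running as root.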

jlebon commented 4 months ago

I think this would be good to pick up again if it's not much work to get it working. Long-term, though, I think it'll be obsoleted by the move to deriving from a shared base image.