Rebasing Silverblue from Rawhide to 38 or 37 is consistently failing

AdamWill commented 1 year ago

Describe the bug: Running rpm-ostree rebase fedora/38/x86_64/silverblue in an openQA test that exercises rebasing is consistently failing, with errors that indicate some kind of corrupted object.

To Reproduce:

1. Install Fedora Rawhide Silverblue (this would probably also happen starting from 37)
2. Run rpm-ostree rebase fedora/38/x86_64/silverblue

Expected behavior: The rebase succeeds. Instead, it fails. The errors vary, but all of them suggest the data is somehow corrupted, e.g. "Invalid compressed data", "Unexpected EOF", "Corrupted file object"...
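When triaging logs for this class of failure, a small filter can flag lines that carry one of the messages quoted above. This is only a minimal sketch: the function names `is_corruption_error` and `filter_corruption_errors` are hypothetical, and the patterns cover just the messages mentioned in this report (with "Corrupted ... object" generalized to any object type), so other variants will not match.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given log line contains one of the
# corruption-style messages quoted in this report.
is_corruption_error() {
    case "$1" in
        *"Invalid compressed data"*) return 0 ;;
        *"Unexpected EOF"*) return 0 ;;
        *"Corrupted "*" object"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: read a log on stdin and print only the matching lines.
filter_corruption_errors() {
    while IFS= read -r line; do
        if is_corruption_error "$line"; then
            printf '%s\n' "$line"
        fi
    done
}
```

For example, piping a saved test log through `filter_corruption_errors` would show whether a run failed with one of these messages or with something unrelated.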

Screenshots: [image: badostree]

CC @nirik since this may be a releng problem.

AdamWill commented 1 year ago

Hmm. This also fails if I change the rebase target to 37: https://openqa.fedoraproject.org/tests/1761831#step/rpmostree_rebase/9 However, the same test run on F38 updates, which also rebases to F37, is consistently passing: https://openqa.fedoraproject.org/tests/1761605#next_previous This suggests the problem is (somehow) in rebasing from Rawhide to anything, not in the actual bits on the server.

AdamWill commented 1 year ago

The first failure of this kind was at 2023-02-15 18:26, which suggests the new rpm-ostree in Rawhide can't be the cause as it was built later than that.

cgwalters commented 1 year ago

> rpm-ostree rebase fedora/38/x86_64/silverblue

It's unlikely this is related to rpm-ostree; the failure is at the ostree layer. It should be reproducible with just ostree pull fedora/38/x86_64/silverblue, which doesn't even need to be done on a host system. In fact, it reproduces quickly in a Rawhide userspace container:

[walters@xenon ~]$ podman run --rm -ti --pull=newer quay.io/fedora/fedora:rawhide
[root@7d8e0c507b30 ~]# dnf -y install ostree fedora-repos-ostree
...
[root@7d8e0c507b30 ~]# ostree --repo=repo init --mode=bare-user
[root@7d8e0c507b30 ~]# ostree --repo=repo pull fedora:fedora/38/x86_64/silverblue

error: Remote "fedora" not found
[root@7d8e0c507b30 ~]# cat /etc/ostree/remotes.d/fedora.conf >> repo/config 
[root@7d8e0c507b30 ~]# ostree --repo=repo pull fedora:fedora/38/x86_64/silverblue

GPG: Verification enabled, found 1 signature:

  Signature made Fri Feb 17 08:02:06 2023 using RSA key ID 809A8D7CEB10B464
  Good signature from "Fedora <fedora-38-primary@fedoraproject.org>"
Receiving metadata objects: 304/(estimating) 269.7 kB/s 539.5 kB                                                                                                                                  
error: Corrupted dirtree object; checksum expected='2903205ab8f3a87bb49052ee101db4caee2292f9563dc288ec5651b6e0e96363' actual='394c14c2d8fb906aebee79a403c78ef14e824885cf393ea2bf72e98c279788f1'
[root@7d8e0c507b30 ~]#

For me, the same problem doesn't reproduce in a fedora:37 container.
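The "Corrupted dirtree object" error in the session above reports that the checksum computed over the bytes actually received differs from the checksum the object was requested by. The same kind of comparison can be sketched outside ostree with `sha256sum`. The helper name `verify_sha256` and its arguments are hypothetical, and this is not how ostree implements the check internally; it only illustrates the expected-versus-actual comparison that the error message reports.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
verify_sha256() {
    file=$1
    expected=$2
    [ -r "$file" ] || return 2
    # Reading from stdin makes sha256sum print "<digest>  -"; keep the digest.
    actual=$(sha256sum < "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    # Message shaped like the ostree error, for illustration only.
    printf "checksum expected='%s' actual='%s'\n" "$expected" "$actual" >&2
    return 1
}
```

Any transfer that is truncated or altered in flight yields a different digest, so a mismatch like this can occur even when the object stored on the server is intact.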

If I were a betting man, I'd bet on libcurl. But I have to context switch to other things at the moment.

cgwalters commented 1 year ago

(Of note, though: with the container-native flow, we no longer use libcurl for OS updates; instead it's skopeo, and hence the Go HTTP stack, that does HTTP.)

AdamWill commented 1 year ago

Can you pick me a horse for the 4:55? :P There was indeed a new curl in Rawhide (only) right around when this started breaking:

Wed Feb 15 13:34:29 2023 curl-7.88.0-1.fc39 tagged into f39 by bodhi [still active]

(the few hours between would be accounted for by some backup in openQA and the fact this test takes quite a long time - it has to wait through the whole ostree/ostree-installer build process). There's another build today with a change listed as "- http2: set drain on stream end", so I'll see if that fixes it, and if not, I'll maybe file a bug there.
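To check whether a particular system carries the suspected build, the installed version-release can be compared against the one in the tag line quoted above. This is a minimal sketch assuming an RPM-based system; the helper name `is_suspect_curl` is hypothetical, and the only build treated as suspect is the one named in this thread.

```shell
#!/bin/sh
# Version-release taken from the tag line quoted in this thread.
SUSPECT_CURL_VR="7.88.0-1.fc39"

# Hypothetical helper: succeed if the given version-release string is the
# suspect build.
is_suspect_curl() {
    [ "$1" = "$SUSPECT_CURL_VR" ]
}

# Usage on a live system. The package name is an assumption: some minimal
# images install libcurl-minimal instead of libcurl.
#   vr=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' libcurl)
#   is_suspect_curl "$vr" && echo "suspect curl build installed"
```

An exact match is used rather than a version range because the thread identifies only one suspect build.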

AdamWill commented 1 year ago

It looks like it does 🎉

fedora-silverblue / issue-tracker

Rebasing Silverblue from Rawhide to 38 or 37 is consistently failing #420