Closed: djasa closed this issue 2 years ago.
When I tried to remove all layered packages with rpm-ostree reset, I ended up with a system where sssd couldn't start, so the system couldn't finish the boot process or even show a terminal login.
I don't understand your issue. Can you double check that you're not impacted by https://github.com/fedora-silverblue/issue-tracker/issues/322? Thanks
Edit: I misread. I see the error in the last message.
Might be an rpm-ostree bug or a bug in the nss-tools package.
> Can you double check that you're not impacted by https://github.com/fedora-silverblue/issue-tracker/issues/322?
Thanks. I only found out about #322 after filing this one. I'll try the workarounds mentioned there and either close this or (hopefully) add more diagnostic data. (To that end, hitting this on every active rpm-ostree operation, with rpm-ostree ending all of them with a non-zero exit code, isn't exactly helpful either.)
@travier So I managed to update to 37.20220830.n.0 via rpm-ostree reset and a subsequent attempt to reinstall what I had layered before (plus a little fiddling with the sssd config so that my primary account, which is an enterprise account, works). Since I managed all that, if today's Silverblue deployment is free of #322, this issue is a different one.
Also, the ipa-client & nss-tools pair isn't the only one affected, see:
# rpm-ostree install libguestfs
Checking out tree 067743c... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-25T20:47:06Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-08-29T09:52:53Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-08-30T06:55:42Z solvables: 2099
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-08-29T09:59:58Z solvables: 67321
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout iproute-tc-5.18.0-2.fc37.x86_64: Is a directory
EDIT: Ouch, this one hurts. :/ VMs are an important part of my daily workflow; let's see how the usability of containerized boxes compares to plain libvirt for me...
# rpm-ostree install libvirt-daemon
Checking out tree 067743c... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-25T20:47:06Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-08-29T09:52:53Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-08-30T06:55:42Z solvables: 2099
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-08-29T09:59:58Z solvables: 67321
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout iproute-tc-5.18.0-2.fc37.x86_64: Is a directory
I have all of that installed on F36 right now, so this might be an F37-only issue.
Can you try with the latest rpm-ostree from https://koji.fedoraproject.org/koji/search?terms=rpm-ostree-2022.13-1.fc37&type=build&match=glob ?
If this does not fix things, can you file an rpm-ostree bug? Thanks!
I tried in a fresh Rawhide VM rebased to 37 and couldn't reproduce it. Then I realized I had used the default settings (btrfs), while the system where the issue shows up is on xfs, so I'm installing a new VM with xfs. Anyway, I'm starting to suspect some sort of error somewhere in the storage stack.
(Just wondering if there is some easy way to save the descriptions of the mutations plus the custom /etc content somewhere, reinstall the immutable base, and reapply my saved stuff...)
So the primary issue was indeed filesystem (xfs) corruption. Getting the filesystem back to a clean state, including manually deleting one duplicated entry (I had to flip a coin to pick which one...), changes the error from rpm-ostree install to:
# rpm-ostree install ipa-client libvirt-daemon
Checking out tree e01b11d... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-26T07:01:43Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-09-04T10:24:06Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-09-05T07:11:47Z solvables: 4238
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-09-04T10:30:53Z solvables: 67287
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout nss-tools-3.81.0-1.fc37.x86_64: Invalid NULL filename
It seems that for these cases, it'd be great if ostree could redeploy the immutable parts of /sysroot. After xfs_repair followed by deleting one of the duplicates, ostree prune no longer complains and ostree fsck reports something useful:
error: In commits 886a2cbdf8ffca6a687b5a52f1c1d4b63ef4e1f15adcd5326a1b575ac25542ce, 28d8c9ee8aed6ed93f594e96e484eebf9916541d5d2cbb0e0892405ff8417762, 3aabf2f092e3363bf6b7e0f4a4bf0bfd7e22a3fcb64060de2680766ce53a5295: fsck content object 274b41f592f10b398f57993a23caadbed7d782bd3a80cdb2bbcd021f0eb00066: Corrupted file object; checksum expected='274b41f592f10b398f57993a23caadbed7d782bd3a80cdb2bbcd021f0eb00066' actual='6a8d75dc00d3394e786d08e4acad28c6f62a72f0a11d70190c3a8dac756f8556'
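The fsck message above packs the useful facts into a single line: the corrupted object, its expected and actual checksums, and the commits that reference it. As a minimal sketch of the "report it clearly" idea, assuming the message wording stays exactly as in this output (other ostree releases may phrase it differently), a few lines of Python can pull those fields out. This is an illustration, not an existing ostree or rpm-ostree feature.

```python
import re
import subprocess
from typing import NamedTuple

# ASSUMPTION: the one-line "Corrupted file object" report looks exactly like
# the ostree fsck output quoted above; the regexes are tied to that wording.
_HEX64 = r"[0-9a-f]{64}"
_CORRUPT_RE = re.compile(
    rf"fsck content object (?P<obj>{_HEX64}): Corrupted file object; "
    rf"checksum expected='(?P<expected>{_HEX64})' actual='(?P<actual>{_HEX64})'"
)
_COMMITS_RE = re.compile(rf"In commits? (?P<commits>{_HEX64}(?:, {_HEX64})*):")


class CorruptObject(NamedTuple):
    obj: str            # checksum the object is stored under
    expected: str       # checksum fsck expected (same as obj in the output above)
    actual: str         # checksum of the data actually found on disk
    commits: list[str]  # commits that reference the corrupted object


def parse_fsck_errors(text: str) -> list[CorruptObject]:
    """Extract every corrupted content object reported in fsck output."""
    results = []
    for line in text.splitlines():
        m = _CORRUPT_RE.search(line)
        if m is None:
            continue
        c = _COMMITS_RE.search(line)
        commits = c.group("commits").split(", ") if c else []
        results.append(CorruptObject(m["obj"], m["expected"], m["actual"], commits))
    return results


if __name__ == "__main__":
    # ostree fsck exits non-zero when it finds errors, so no check=True here;
    # the error line may go to stderr, so both streams are scanned.
    proc = subprocess.run(["ostree", "fsck"], capture_output=True, text=True)
    for bad in parse_fsck_errors(proc.stdout + proc.stderr):
        print(f"{bad.obj} is corrupt; referenced by {len(bad.commits)} commit(s)")
```

Knowing which commits reference the bad object is what makes the rebase, cleanup, and rebase-back workaround below work: once no deployment references those commits, the object can be pruned and fetched again.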
Trying the workaround with rebase, cleanup, and rebase back...
Then, in the rebased system, after rpm-ostree cleanup --rollback:
# ostree fsck
Validating refs...
Validating refs in collections...
Enumerating commits...
Verifying content integrity of 122 commit objects...
fsck objects (218966/218966) [=============] 100%
object fsck of 122 commits completed successfully - no errors found.
Rebasing back to 37
Here we go:
# rpm-ostree install ipa-client libvirt-daemon libguestfs libguestfs-xfs
Checking out tree e01b11d... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-26T07:01:43Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-09-04T10:24:06Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-09-05T07:11:47Z solvables: 4238
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-09-04T10:30:53Z solvables: 67287
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Will download: 131 packages (98.4 MB)
Downloading from 'updates-testing'... done
Downloading from 'fedora'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Upgraded:
<upgraded pkgs>
Added:
<added pkgs>
Changes queued for next boot. Run "systemctl reboot" to start a reboot
So the bug is indeed rpm-ostree not handling filesystem corruption as well as it could. :/ (It gives no clues about which of the duplicated inodes can be discarded, and there is no easy option to download the missing parts of the "fs image".) :/
Before I came up with the idea of running xfs_repair and trying to rebase, I feared I'd have to reinstall the box entirely. :/
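The "Is a directory" errors in this thread are the kind of damage that is easy to spot mechanically. As a rough diagnostic sketch (not an existing ostree feature), assuming that every loose object under objects/<2 hex chars>/ in an ostree repo is a regular file or a symlink and never a directory, the following Python lists directory entries sitting where objects should be. It only reads the repo; deciding what to delete is still up to you. The default path /sysroot/ostree/repo is the system repo on Silverblue; reading it fully needs root.

```python
import sys
from pathlib import Path


def suspicious_objects(repo: Path) -> list[Path]:
    """List loose-object entries that are directories.

    ASSUMPTION: in an ostree repo every entry under objects/<2 hex>/ is a
    regular file or a symlink, so a real directory there points at
    filesystem damage such as the "Is a directory" errors seen here.
    """
    bad = []
    objects = Path(repo) / "objects"
    for prefix in sorted(objects.iterdir()):
        if not prefix.is_dir() or prefix.is_symlink():
            continue  # only descend into the two-character prefix directories
        for entry in sorted(prefix.iterdir()):
            # is_dir() follows symlinks, and bare repos may store symlink
            # objects as symlinks, so exclude those explicitly.
            if entry.is_dir() and not entry.is_symlink():
                bad.append(entry)
    return bad


if __name__ == "__main__":
    repo = Path(sys.argv[1] if len(sys.argv) > 1 else "/sysroot/ostree/repo")
    for path in suspicious_objects(repo):
        print(path)
```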
Thanks for the investigation. Can you file an rpm-ostree bug upstream?
> Can you file an rpm-ostree bug upstream?
I'm thinking about several RFEs. Could you give me feedback on which of them are actually feasible within the ostree and rpm-ostree ecosystem, or any other relevant thoughts?
1. Error detection and automatic handling. Missing files or unexpected content, such as in rhbz 2072897:
error: syscore cleanup: pruning: Pruning system repository: Deleting object 75dbfb354d07bc8a3420edbf9a8ec0c27575a8c176cc9726f8643a592c48c4e2.file: unlinkat(75/dbfb354d07bc8a3420edbf9a8ec0c27575a8c176cc9726f8643a592c48c4e2.file): Is a directory
should IMO, instead of just generating a cryptic error, trigger some of the following:
- marking the /sysroot FS for fsck during the next boot
- running ostree prune (and using its output to also detect broken blocks or something)
- an rpm-ostree command which the user could run after rpm-ostree status tells them that there are some errors around
There are checksums around IIUC; when they don't match, rpm-ostree should report it or do something about it right away...
2. Boot entries:
2.1 A boot entry that does what rpm-ostree reset does, but transiently, just for that boot. This would have helped me work around the problem of misbehaving layers blocking upgrades of the base system.
2.2 Is the rpm-ostree stack actually missing some component needed to spawn a live system out of the base layer, unmounting the rpm-ostree filesystem sometime during boot? Would it be feasible to implement? If so, this would allow the system to act as its own built-in rescue system to perform xfs_repair on the unmounted /sysroot fs (and save me several GB and tens of minutes of downloading installation & live media).
3. Back up & restore mutations. Given that most mutations should be easy and resource-light to describe (boot changes, the overlay of /etc, and the list of layered packages are all rather small data), it should be possible to have some tool gather these into an archive, have it backed up (automatically, regularly), and during or after a clean install have an option to load these changes from the backup, shouldn't it?
4. Think about some FS-independent way of having error detection and self-healing for the mutable parts of /sysroot (/etc and friends), so that a few FS errors don't cause a catastrophic failure or even the loss of things such as an ssh key.
(I'm assuming here that not all FS corruption comes from faulty disks, and that in the case of faulty disks, the vast majority are SSDs, where a few faulty cells are unlikely to be used again once the redeploy-triggered copying and erasing of the blocks has taken place.)
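Point 3 mostly comes down to collecting a few small pieces of state. As a minimal sketch, assuming the JSON field names of rpm-ostree status --json ("deployments", "booted", "requested-packages") and a one-entry-per-line output from ostree admin config-diff (both worth double-checking against your versions), the following Python gathers the layered package list and the modified /etc files into a tarball. write_backup is a hypothetical helper, not an existing tool; it needs root to read everything under /etc, and it ignores kernel arguments and other mutations.

```python
import io
import json
import subprocess
import tarfile
from pathlib import Path


def layered_packages(status_json: str) -> list[str]:
    """Return the packages layered on the booted deployment.

    ASSUMPTION: `rpm-ostree status --json` has a "deployments" list whose
    entries carry a "booted" flag and a "requested-packages" list.
    """
    status = json.loads(status_json)
    for dep in status.get("deployments", []):
        if dep.get("booted"):
            return sorted(dep.get("requested-packages", []))
    return []


def changed_etc_files(config_diff: str) -> list[str]:
    """Turn `ostree admin config-diff` output into absolute /etc paths.

    ASSUMPTION: one "<status letter> <path relative to /etc>" entry per line.
    Deleted files ("D") are skipped because there is nothing to archive.
    """
    paths = []
    for line in config_diff.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0] in ("M", "A"):
            paths.append("/etc/" + parts[1].strip())
    return sorted(paths)


def _run(*cmd: str) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def write_backup(archive: Path) -> None:
    """Hypothetical helper: archive the package list plus changed /etc files."""
    pkgs = layered_packages(_run("rpm-ostree", "status", "--json"))
    files = changed_etc_files(_run("ostree", "admin", "config-diff"))
    data = ("\n".join(pkgs) + "\n").encode()
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo("layered-packages.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        for name in files:
            # Skip entries that vanished or are not plain files.
            if Path(name).is_file():
                tar.add(name)
```

Restoring would then roughly be the manual procedure from earlier in this thread: copy the /etc files back and pass the names in layered-packages.txt to rpm-ostree install.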
Point 1 is better discussed upstream in ostree/rpm-ostree.
Point 2.1 is https://github.com/fedora-silverblue/issue-tracker/issues/309, which is not easy to do: which settings (/etc) content do you pick when booting from this entry? If it's the latest, it might include stuff that is not available in this image. Moreover, the entry would need to be updated for each update. It would be a kind of shadow deployment. Unfortunately, right now it's safer and simpler to keep a working pinned deployment.
Point 2.2 is weird. The partitions should be fscked by systemd on mount. Maybe there's something else going on there. There might be kernel command line options read by systemd to force fsck & repair.
Point 3: Feel free to make a project that backs up all of that :).
> 2.2 is weird. The partitions should be fscked by systemd on mount. Maybe there's something else there. There might be kernel command line options read by systemd to force fsck & repair.
Well, it's inspired by my need to manually do something beyond what xfs_repair could do, and doing it from a full system, with all the tools and the option to browse the web before actually taking action, is a tremendous help. For this case, using the default /etc is OK IMO. (Certainly with full disk encryption. For an unencrypted root, it felt insecure to me until I realized that such systems can be hijacked by a single unauthorized boot even now.)
> 3: Feel free to make a project that backs up all of that :).
I'm pretty sure I wouldn't be able to pull it off; I'm glad I actually managed to diagnose the primary issue here. :) The idea is inspired e.g. by oVirt, which backs up its critical ovirt-engine machines this way: it creates tarballs with the critical parts of /etc and a DB dump. In case of a catastrophic failure, this is enough to turn a freshly installed machine into an effective clone of the crashed one within a short while.
> I'm pretty sure I wouldn't be able to pull it off; I'm glad I actually managed to diagnose the primary issue here. :)
Given the level of investigation you did here, I think you should give it a try! :)
Point 1: reported as rpm-ostree #3994.
This issue tracker is intended only for Silverblue-specific issues. We would like to ask you to try to reproduce the issue on a relevant Fedora Workstation release. If you are able to reproduce it there, please report it in Red Hat Bugzilla (see How to file a bug) or upstream (preferred for GNOME projects), not in this issue tracker.
Describe the bug
Any minor update or adjustment of layered packages has been blocked for me for the last few weeks, in both Silverblue 36 and Silverblue 37, with the error below.
To Reproduce
Please describe the steps needed to reproduce the bug:
- ipa-client in layered packages (which presumably pulls in the nss* stuff)
Expected behavior
The update passes.
OS version:
Additional context
This effectively pins the system to a single given version. Do you want a minor update of 36? Rebase to 37, delete the current deployment of 36, and rebase back to the current deployment of 36. Or vice versa.