fedora-silverblue / issue-tracker

Fedora Silverblue issue tracker
https://fedoraproject.org/atomic-desktops/silverblue/

Any operation on basic system (except for rebase) blocked by: error: Checkout: [package]: Is a directory #342

Closed djasa closed 2 years ago

djasa commented 2 years ago

This issue tracker is intended only for Silverblue specific issues. We would like to ask you to try to reproduce the issue on a relevant Fedora Workstation release. If you will be able to reproduce there, then please report it in Red Hat Bugzilla (see How to file a bug) or in upstream (preferred for GNOME projects) and not in this issue tracker.

Describe the bug Any minor update or adjustment of layered packages has been blocked for me for the last few weeks, on both Silverblue 36 and Silverblue 37, with the error below.

To Reproduce Please describe the steps needed to reproduce the bug:

  1. have Silverblue with ipa-client in layered packages (which presumably pulls nss* stuff)
  2. do an update

Expected behavior update passes

OS version:

# rpm-ostree status -b
State: idle
AutomaticUpdates: check; rpm-ostreed-automatic.timer: inactive
BootedDeployment:
● fedora:fedora/37/x86_64/silverblue
                  Version: 37.20220821.n.0 (2022-08-21T08:04:17Z)
               BaseCommit: 717b0177744d55d4c5d9735d9bc91eee0d14f670b0f878753fb79f45c171443e
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
          LayeredPackages: ansible git gnome-boxes gnome-tweak-tool htop ipa-client krb5-workstation langpacks-cs libguestfs libguestfs-tools libvirt-client libvirt-nss openssl powertop tlp vim-enhanced virt-viewer

Additional context This effectively pins the system to a single version. Want a minor update of 36? Rebase to 37, delete the current deployment of 36, then rebase back to the current 36. Or vice versa.

[root@djasa-p1g2-sb ~]# rpm-ostree status
State: idle
AutomaticUpdates: check; rpm-ostreed-automatic.timer: inactive
Deployments:
● fedora:fedora/37/x86_64/silverblue
                  Version: 37.20220821.n.0 (2022-08-21T08:04:17Z)
               BaseCommit: 717b0177744d55d4c5d9735d9bc91eee0d14f670b0f878753fb79f45c171443e
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
          LayeredPackages: ansible git gnome-boxes gnome-tweak-tool htop ipa-client krb5-workstation langpacks-cs libguestfs libguestfs-tools libvirt-client libvirt-nss openssl powertop tlp vim-enhanced virt-viewer

  fedora:fedora/36/x86_64/silverblue
                  Version: 36.20220820.0 (2022-08-20T01:11:35Z)
               BaseCommit: 410f3504fd7f73e3c2a1b1fc070e605c82531a1d35868b3911e94d0e7f996436
             GPGSignature: Valid signature by 53DED2CB922D8B8D9E63FD18999F7CBF38AB71F4
          LayeredPackages: ansible git gnome-boxes gnome-tweak-tool htop ipa-client krb5-workstation langpacks-cs libguestfs libguestfs-tools libvirt-client libvirt-nss openssl powertop tlp vim-enhanced virt-viewer

AvailableUpdate:
        Version: 37.20220828.n.0 (2022-08-28T08:31:47Z)
         Commit: 764731339c70f81ce171a0b345dfe67647a65af727c5d18d97dfcb4d89a95bc0
   GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
  SecAdvisories: 1 important
           Diff: 67 upgraded, 1 added
[root@djasa-p1g2-sb ~]# rpm-ostree update
⠂ Receiving metadata objects: 0/(estimating) -/s 0 bytes... 
Receiving metadata objects: 0/(estimating) -/s 0 bytes... done
Checking out tree 2099df5... done
Inactive requests:
  gstreamer1-plugins-ugly-free (already provided by gstreamer1-plugins-ugly-free-1.20.3-2.fc37.x86_64)
  libfprint (already provided by libfprint-1.94.4-2.fc37.x86_64)
  fprintd (already provided by fprintd-1.94.2-3.fc37.x86_64)
  chrome-gnome-shell (already provided by chrome-gnome-shell-10.1-17.fc37.x86_64)
  fprintd-pam (already provided by fprintd-pam-1.94.2-3.fc37.x86_64)
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-25T20:47:06Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-08-28T10:05:40Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-08-28T18:46:08Z solvables: 1831
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-08-28T10:12:08Z solvables: 67321
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout nss-tools-3.81.0-1.fc37.x86_64: Is a directory

djasa commented 2 years ago

When I tried to remove all layered packages with rpm-ostree reset, I ended up with a system where sssd couldn't start, so the system couldn't finish booting or even show a terminal login.

travier commented 2 years ago

I don't understand your issue. Can you double check that you're not impacted by https://github.com/fedora-silverblue/issue-tracker/issues/322? Thanks

Edit: I mis-read. I see the error in the last message.

travier commented 2 years ago

Might be an rpm-ostree bug or a bug in the nss-tools package.

djasa commented 2 years ago

Can you double check that you're not impacted by https://github.com/fedora-silverblue/issue-tracker/issues/322?

Thanks. I found out about #322 only after filing this one. I'll try the workarounds mentioned there and either close this or (hopefully) add more diagnostic data (to that end, hitting this on every active rpm-ostree operation, so that all of them end with a non-zero exit code, isn't exactly helpful either).

djasa commented 2 years ago

@travier so I managed to update to 37.20220830.n.0 through rpm-ostree reset and a subsequent attempt to reinstall what I had layered before (plus a little fiddling with the sssd config so that my primary account, which is an enterprise account, works). All of that worked, so if today's Silverblue deployment is free of #322, this is a different issue.

Also, the ipa-client & nss-tools pair isn't the only one affected, see:

# rpm-ostree install libguestfs 
Checking out tree 067743c... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-25T20:47:06Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-08-29T09:52:53Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-08-30T06:55:42Z solvables: 2099
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-08-29T09:59:58Z solvables: 67321
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout iproute-tc-5.18.0-2.fc37.x86_64: Is a directory

EDIT: Ouch, this one hurts. :/ VMs are an important part of my daily workflow; let's see how the usability of containerized Boxes compares to plain libvirt for me...

# rpm-ostree install libvirt-daemon
Checking out tree 067743c... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-25T20:47:06Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-08-29T09:52:53Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-08-30T06:55:42Z solvables: 2099
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-08-29T09:59:58Z solvables: 67321
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout iproute-tc-5.18.0-2.fc37.x86_64: Is a directory

travier commented 2 years ago

I have all of that installed on F36 right now so this might be an F37 only issue.

travier commented 2 years ago

Can you try with the latest rpm-ostree from https://koji.fedoraproject.org/koji/search?terms=rpm-ostree-2022.13-1.fc37&type=build&match=glob ?

travier commented 2 years ago

If this does not fix things, can you file an rpm-ostree bug? Thanks!

djasa commented 2 years ago

I tried in a fresh rawhide VM rebased to 37 and couldn't reproduce. Then I realized I had used the default settings (btrfs), while the system where the issue shows up is on xfs, so I'm installing a new VM with xfs. Anyway, I'm starting to suspect some sort of error somewhere in the storage stack.

(just wondering if there is some easy way to save the descriptions of the mutations + custom /etc content somewhere, reinstall the immutable base and reapply my saved stuff...)
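
One cheap, read-only way to test the storage-stack suspicion is to look for repository objects that are named like regular-file objects but exist on disk as directories, which would be consistent with the "Is a directory" errors above. This is a minimal sketch, not an official ostree diagnostic. It assumes the system repository lives at /ostree/repo and stores content objects as objects/XX/<rest>.file; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# List objects whose name ends in ".file" but which are directories on disk.
# $1: repository path (assumption: /ostree/repo is the system repository).
# The function only reads; it never modifies the repository.
find_dir_shaped_objects() {
    local repo="${1:-/ostree/repo}"
    find "$repo/objects" -mindepth 2 -maxdepth 2 -name '*.file' -type d
}
```

Running find_dir_shaped_objects as root with no argument checks the assumed default path. Any path it prints is worth comparing against the object names in the error messages; an empty result does not prove the filesystem is healthy.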

djasa commented 2 years ago

So the primary issue was indeed filesystem (xfs) corruption. Getting the FS to a clean state, including manually deleting one duplicated entry (I had to flip a coin on which one...), changes the rpm-ostree install error to:

# rpm-ostree install ipa-client libvirt-daemon 
Checking out tree e01b11d... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-26T07:01:43Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-09-04T10:24:06Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-09-05T07:11:47Z solvables: 4238
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-09-04T10:30:53Z solvables: 67287
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Checking out packages... done
error: Checkout nss-tools-3.81.0-1.fc37.x86_64: Invalid NULL filename

It seems that for these cases it'd be great if ostree could redeploy the immutable parts of /sysroot. After xfs_repair followed by deleting one of the duplicates, ostree prune no longer complains and ostree fsck reports something useful:

error: In commits 886a2cbdf8ffca6a687b5a52f1c1d4b63ef4e1f15adcd5326a1b575ac25542ce, 28d8c9ee8aed6ed93f594e96e484eebf9916541d5d2cbb0e0892405ff8417762, 3aabf2f092e3363bf6b7e0f4a4bf0bfd7e22a3fcb64060de2680766ce53a5295: fsck content object 274b41f592f10b398f57993a23caadbed7d782bd3a80cdb2bbcd021f0eb00066: Corrupted file object; checksum expected='274b41f592f10b398f57993a23caadbed7d782bd3a80cdb2bbcd021f0eb00066' actual='6a8d75dc00d3394e786d08e4acad28c6f62a72f0a11d70190c3a8dac756f8556'
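
The fsck error above packs the affected commits and the checksum mismatch into one long line. The two helpers below are a sketch for pulling those fields apart, for example to paste into a bug report. They assume the message keeps exactly the wording shown above, which is not a documented stable format, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the commit IDs named in an "ostree fsck" error line read from stdin,
# one per line. Assumes the "error: In commits A, B, C: fsck ..." wording.
fsck_commits() {
    sed -n -E 's/^error: In commits ([0-9a-f, ]+): fsck .*/\1/p' \
        | tr -d ' ' | tr ',' '\n'
}

# Print the expected and actual checksums as "expected=..." and "actual=..."
# lines, taken from the same error line on stdin.
fsck_checksums() {
    grep -oE "(expected|actual)='[0-9a-f]+'" | tr -d "'"
}
```

Each helper reads the error text on stdin, so the output of ostree fsck has to be saved to a file first (the error is likely printed on stderr) and then fed to each function separately.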

Trying workaround with rebase, cleanup and rebase back...

djasa commented 2 years ago

In the rebased system, after rpm-ostree cleanup --rollback:

# ostree fsck
Validating refs...
Validating refs in collections...
Enumerating commits...
Verifying content integrity of 122 commit objects...
fsck objects (218966/218966) [=============] 100%
object fsck of 122 commits completed successfully - no errors found.

Rebasing back to 37

djasa commented 2 years ago

Here we go:

# rpm-ostree install ipa-client libvirt-daemon libguestfs libguestfs-xfs
Checking out tree e01b11d... done
Enabled rpm-md repositories: fedora-cisco-openh264 fedora-modular updates-modular updates-testing-modular updates-testing updates fedora updates-archive
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2022-08-26T07:01:43Z solvables: 4
rpm-md repo 'fedora-modular' (cached); generated: 2022-09-04T10:24:06Z solvables: 1456
rpm-md repo 'updates-modular' (cached); generated: 2022-08-09T18:08:16Z solvables: 0
rpm-md repo 'updates-testing-modular' (cached); generated: 2022-08-09T18:08:18Z solvables: 0
rpm-md repo 'updates-testing' (cached); generated: 2022-09-05T07:11:47Z solvables: 4238
rpm-md repo 'updates' (cached); generated: 2022-08-09T18:08:15Z solvables: 0
rpm-md repo 'fedora' (cached); generated: 2022-09-04T10:30:53Z solvables: 67287
rpm-md repo 'updates-archive' (cached); generated: 2022-02-11T15:19:10Z solvables: 0
Resolving dependencies... done
Will download: 131 packages (98.4 MB)
Downloading from 'updates-testing'... done
Downloading from 'fedora'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Upgraded:
  <upgraded pkgs>
Added:
  <added pkgs>
Changes queued for next boot. Run "systemctl reboot" to start a reboot

So the bug is indeed rpm-ostree not handling FS corruption as well as it could. :/ (It gives no clue as to which of the duplicated inodes can be discarded, and there is no easy option to download the missing parts of the "fs image".) :/

Before I had the idea of running xfs_repair and trying a rebase, I feared I'd have to reinstall the box entirely. :/
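
For anyone landing here with the same symptoms, the sequence that worked in this thread can be summarized as a printed checklist. This is a sketch of what was done on one machine, not a verified general procedure. The function deliberately prints the commands instead of running them, its name is made up, and the device path in the usage note is a placeholder. The manual removal of the duplicated entry is not covered, because it cannot be scripted safely.

```shell
#!/usr/bin/env bash
# Print (never execute) the recovery steps described in this thread.
# $1: block device holding /sysroot (placeholder, check your own layout)
# $2: ref to rebase to temporarily
# $3: ref to return to afterwards
print_recovery_plan() {
    local dev="$1" other_ref="$2" wanted_ref="$3"
    cat <<EOF
# 1. From rescue or live media, with the filesystem unmounted:
xfs_repair $dev
# 2. Back on the installed system, check the repository:
ostree fsck
# 3. Rebase to the other release and boot into it:
rpm-ostree rebase $other_ref
systemctl reboot
# 4. Drop the previous deployment, then check again:
rpm-ostree cleanup --rollback
ostree fsck
# 5. Return to the wanted release:
rpm-ostree rebase $wanted_ref
systemctl reboot
EOF
}
```

With the refs from the status output earlier in the thread, a call would look like print_recovery_plan /dev/sdX2 fedora:fedora/36/x86_64/silverblue fedora:fedora/37/x86_64/silverblue. Review each printed line before running anything.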

travier commented 2 years ago

Thanks for the investigation. Can you file an rpm-ostree bug upstream?

djasa commented 2 years ago

Can you file an rpm-ostree bug upstream?

I'm thinking about several RFEs. Could you give me feedback on which of them are actually feasible within the ostree and rpm-ostree ecosystem, or any other relevant thoughts?

  1. error detection and automatic handling. Missing files or unexpected content such as in rhbz 2072897:

    error: syscore cleanup: pruning: Pruning system repository: Deleting object 75dbfb354d07bc8a3420edbf9a8ec0c27575a8c176cc9726f8643a592c48c4e2.file: unlinkat(75/dbfb354d07bc8a3420edbf9a8ec0c27575a8c176cc9726f8643a592c48c4e2.file): Is a directory

    should IMO, instead of just generating a cryptic error, trigger some of the following:

    • mark the /sysroot FS for fsck during next boot
    • run ostree prune (and use its output also to detect broken blocks or something)
    • redeploy (part of?) image
      • silently in background, or:
      • via some subcommand of rpm-ostree which the user could run after rpm-ostree status tells them that there are some errors around

    There are checksums around IIUC, when they don't match, rpm-ostree should report it or do something with it right away...

  2. Boot entries:

    • IIUC package layering doesn't alter the base image in any way, does it? If it indeed doesn't, it should be fairly easy to have a grub entry that boots into the image with no layers applied. Something like what rpm-ostree reset does, but transient, just for that boot. This would have helped me work around the problem of misbehaving layers blocking the upgrade of the base system
    • is the rpm-ostree stack actually missing some component needed to spawn a live system out of the base layer, unmounting the rpm-ostree filesystem at some point during boot? Would it be feasible to implement? If so, this would allow the system to act as its own built-in rescue system to perform xfs_repair on the unmounted /sysroot fs (and allow me to save several GB and tens of minutes of downloading installation & live media)
  3. back up & restore mutations. Given that most mutations should be pretty easy and resource-light to describe (boot changes, overlay of /etc, list of layered packages are all rather small data), it should be possible to have a tool that gathers these into an archive, backs it up (automatically, regularly) and, during or after a clean install, offers an option to load these changes from the backup, shouldn't it?

  4. think about some FS-independent way of having error detection and self-healing for the mutable parts of /sysroot - /etc and friends. So that some number of FS errors doesn't cause a catastrophic failure or even the loss of things such as an ssh key.

    (I'm assuming here that not all FS corruption comes from faulty disks, and that among faulty disks the vast majority are SSDs, where a few faulty cells are unlikely to be used again after the redeploy-triggered copying and erasing of the blocks takes place)
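
The back-up idea in point 3 can be sketched with tools already used in this thread. The first helper extracts the LayeredPackages line from rpm-ostree status -b text; the second writes that list, the kernel arguments and a tarball of /etc into a directory. Treating all of /etc as the "custom content" is an assumption made for brevity, both function names are hypothetical, and restoring is left out.

```shell
#!/usr/bin/env bash
# Extract the LayeredPackages list from "rpm-ostree status -b" text on stdin,
# one package per line. Relies on the text layout shown in this thread.
layered_packages() {
    sed -n -E 's/^[[:space:]]*LayeredPackages:[[:space:]]*//p' \
        | head -n 1 | tr -s ' ' '\n'
}

# Save the package list, kernel arguments and /etc into directory $1.
# Assumption: run as root on the system being backed up.
backup_mutations() {
    local dest="$1"
    mkdir -p "$dest"
    rpm-ostree status -b | layered_packages > "$dest/layered-packages.txt"
    rpm-ostree kargs > "$dest/kargs.txt"
    tar -C / -czf "$dest/etc.tar.gz" etc
}
```

After a clean install, the saved list could be fed back with something like rpm-ostree install $(cat layered-packages.txt). The /etc tarball contains secrets such as host keys, so it needs the same care as any other backup of them.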

travier commented 2 years ago

1 is better discussed upstream in ostree/rpm-ostree.

2.1 is https://github.com/fedora-silverblue/issue-tracker/issues/309, which is not easy to do: which settings (/etc content) do you pick when booting from this entry? If it's the latest then they might include stuff that is not available in this image. Moreover it would need to be updated for each update. It would be kind of a shadow deployment. Unfortunately right now it's safer and simpler to keep a working pinned deployment.

2.2 is weird. The partitions should be fscked by systemd on mount. Maybe there's something else there. There might be kernel command line options read by systemd to force fsck & repair.

3 Feel free to make a project that backs up all of that :).
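
On the kernel command line options mentioned for 2.2: systemd-fsck reads fsck.mode= and fsck.repair= from the kernel command line, and rpm-ostree kargs can append arguments to a deployment. The helper below is a sketch that only prints the resulting command; its name is made up. Whether this leads to a real repair on XFS depends on what the fsck.xfs helper does on the system, so running xfs_repair on the unmounted filesystem, as was done in this thread, remains the direct route.

```shell
#!/usr/bin/env bash
# Print the rpm-ostree command that would add the systemd-fsck options
# to the kernel command line. Nothing is executed or changed.
fsck_kargs_command() {
    local opt
    printf 'rpm-ostree kargs'
    for opt in fsck.mode=force fsck.repair=yes; do
        printf ' --append=%s' "$opt"
    done
    printf '\n'
}
```

The arguments stay in place for later boots until they are removed again, for example with rpm-ostree kargs --delete=fsck.mode=force --delete=fsck.repair=yes.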

djasa commented 2 years ago

2.2 is weird. The partitions should be fscked by systemd on mount. Maybe there's something else there. There might be kernel command line options read by systemd to force fsck & repair.

Well, it's inspired by my need to manually do something beyond what xfs_repair could do; doing that from a full system, with all the tools and the option to browse the web before actually taking action, is a tremendous help. For this case, use of the default /etc is IMO OK. (For sure with full disk encryption. For an unencrypted root, it felt insecure to me until I realized that such systems can be hijacked by a single unauthorized boot even now.)

3 Feel free to make a project that backs up all of that :).

I'm pretty sure I wouldn't be able to pull it off, I'm glad I actually managed to diagnose the primary issue here. :) The idea is inspired e.g. by oVirt, who back up their critical ovirt-engine machines this way: they create tarballs with the critical parts of /etc and a DB dump. In case of catastrophic failure, this is enough to turn a freshly installed machine into an effective clone of the crashed one within a short while.

travier commented 2 years ago

I'm pretty sure I wouldn't be able to pull it off, I'm glad I actually managed to diagnose the primary issue here. :)

Given the level of investigation you did here, I think you should give it a try! :)

djasa commented 2 years ago

1. reported as rpm-ostree #3994