Closed Fooze666 closed 1 year ago
From the rdsosreport file, this looks concerning:
[ 130.564030] fedora dracut-initqueue[478]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 130.566165] fedora dracut-initqueue[478]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f39b869c0-a50f-422b-a804-3ba892c413db.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 130.566165] fedora dracut-initqueue[478]: [ -e "/dev/disk/by-uuid/39b869c0-a50f-422b-a804-3ba892c413db" ]
[ 130.566165] fedora dracut-initqueue[478]: fi"
Looks like the disk at /dev/disk/by-uuid/39b869c0-a50f-422b-a804-3ba892c413db is not starting up successfully.
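The hook in the log above is waiting on a plain `[ -e ... ]` existence test for the by-uuid symlink. A minimal sketch of the same check, which could be run from the dracut emergency shell, is below. The function name `uuid_visible` and its optional directory argument are hypothetical, added only so the check can be exercised against a test directory.

```shell
# Hypothetical helper mirroring the [ -e ... ] test in the dracut hook.
# $1: filesystem UUID to look for
# $2: directory holding the by-uuid symlinks (defaults to /dev/disk/by-uuid)
uuid_visible() {
    [ -e "${2:-/dev/disk/by-uuid}/$1" ]
}
```

For example, `uuid_visible 39b869c0-a50f-422b-a804-3ba892c413db && echo present || echo missing` would report whether udev has created the symlink yet.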
That's what's strange. The disk obviously works; I'm using the system now to write this. This behavior only happens if I try to use a newer deployment than the one I am currently running (listed above).
You might have to step through the commits in the ostree repo to find out where in the history things started breaking.
For example:
$ ostree refs | grep fedora
fedora:fedora/37/x86_64/silverblue
$ sudo ostree pull --commit-metadata-only --depth=10 fedora:fedora/37/x86_64/silverblue
GPG: Verification enabled, found 1 signature:
Signature made Wed 01 Mar 2023 09:22:05 PM EST using RSA key ID F55AD3FB5323552A
Good signature from "Fedora <fedora-37-primary@fedoraproject.org>"
Receiving metadata objects: 1/(estimating) 592 bytes/s 592 bytes
GPG: Verification enabled, found 1 signature:
Signature made Tue 28 Feb 2023 07:45:18 PM EST using RSA key ID F55AD3FB5323552A
Good signature from "Fedora <fedora-37-primary@fedoraproject.org>"
...
$ ostree log fedora:fedora/37/x86_64/silverblue
commit 1e8a1ad2c612d98ee1180cb34c58b7f916e5cd99fe2cec453b55828764ea56bd
Parent: 5edb0e8ec183ac2b2e3d7463a8383f38b62992ebdb7cbb4cdf15621ce36eec60
ContentChecksum: c488c4fe3501a124c6e415d5a3e048aeaeec78aca89890eb152c54b8624cdef5
Date: 2023-03-02 02:21:58 +0000
Version: 37.20230302.0
(no subject)
commit 5edb0e8ec183ac2b2e3d7463a8383f38b62992ebdb7cbb4cdf15621ce36eec60
Parent: 458a0650076eb79d6b34a34587638eec2437f1349bcfd8c59a1575c2c5bad87a
ContentChecksum: 7a6ebdcce0642e5aa2a6d5f10985f2732d7420df823a6dfc51d15b1b770ef3bf
Date: 2023-03-01 00:45:13 +0000
Version: 37.20230301.0
(no subject)
..
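With many commits in the history, the log is easier to scan as one line per commit. The following is a rough sketch that assumes the `commit` / `Date:` / `Version:` layout shown in the output above; `summarize_ostree_log` is a hypothetical helper name, not an ostree command.

```shell
# Hypothetical helper: condense `ostree log` output (read from stdin)
# to one "<version> <date> <commit>" line per commit.
# Commits without a Version: field are skipped.
summarize_ostree_log() {
    awk '
        $1 == "commit"   { commit = $2 }
        $1 == "Date:"    { date = $2 }
        $1 == "Version:" { print $2, date, commit }
    '
}
```

It would be used as `ostree log fedora:fedora/37/x86_64/silverblue | summarize_ostree_log`.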
From there you can inspect the differences between the commits. Comparing your working commit with the one right after it:
$ rpm-ostree db diff 521691fcb10306daf330ad64fa04ad23097a05b5adaebe2f1ef4c170c679b9e8 1b159658aefe936a0b2f4415810fdf0e7c5cf79815d458e680fe1959379e995c
ostree diff commit from: 521691fcb10306daf330ad64fa04ad23097a05b5adaebe2f1ef4c170c679b9e8
ostree diff commit to: 1b159658aefe936a0b2f4415810fdf0e7c5cf79815d458e680fe1959379e995c
Upgraded:
conmon 2:2.1.5-1.fc37 -> 2:2.1.6-3.fc37
distribution-gpg-keys 1.82-1.fc37 -> 1.84-1.fc37
ibus-m17n 1.4.18-1.fc37 -> 1.4.19-1.fc37
podman 5:4.4.1-1.fc37 -> 5:4.4.1-3.fc37
podman-gvproxy 5:4.4.1-1.fc37 -> 5:4.4.1-3.fc37
rav1e-libs 0.5.1-6.fc37 -> 0.5.1-9.fc37
sane-backends 1.1.1-10.fc37 -> 1.2.1-1.fc37
sane-backends-drivers-cameras 1.1.1-10.fc37 -> 1.2.1-1.fc37
sane-backends-drivers-scanners 1.1.1-10.fc37 -> 1.2.1-1.fc37
sane-backends-libs 1.1.1-10.fc37 -> 1.2.1-1.fc37
Nothing stands out as a package that would cause a problem.
You could keep diffing the commits until you find a likely culprit, or start deploying each one to find the breakage:
$ sudo rpm-ostree deploy 1b159658aefe936a0b2f4415810fdf0e7c5cf79815d458e680fe1959379e995c
Validating checksum '1b159658aefe936a0b2f4415810fdf0e7c5cf79815d458e680fe1959379e995c'
1 metadata, 0 content objects fetched; 592 B transferred in 1 seconds; 0 bytes content written
⠁ Receiving objects; 98% (5417/5514) 11.4 MB/s 261.3 MB 1251 metadata, 4263 content objects fetched; 266145 KiB transferred in 24 seconds; 436.8 MB content written
Receiving objects; 98% (5417/5514) 11.4 MB/s 261.3 MB... done
Checking out tree 1b15965... done
...
Could you also share journalctl -b from your system when it successfully boots?
I'm attaching the output of journalctl -b on a successful boot. I will try to step through the commits during the day today and see if I can find exactly where the problem begins. successfulboot.txt
Thanks for the successful boot logs. Comparing the boot process in the failed and successful scenarios, I found this difference:
Successful:
Mar 03 05:31:35 fedora kernel: ata2: SATA link down (SStatus 0 SControl 300)
Mar 03 05:31:35 fedora kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 03 05:31:35 fedora kernel: ata1.00: ATA-10: WDC WD10SPZX-22Z10T1, 04.01A04, max UDMA/133
Mar 03 05:31:35 fedora kernel: ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
Mar 03 05:31:35 fedora kernel: ata1.00: Features: NCQ-prio
Mar 03 05:31:35 fedora kernel: ata1.00: configured for UDMA/133
Mar 03 05:31:35 fedora kernel: scsi 0:0:0:0: Direct-Access ATA WDC WD10SPZX-22Z 1A04 PQ: 0 ANSI: 5
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 03 05:31:35 fedora kernel: sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
Mar 03 05:31:36 fedora kernel: sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8
Mar 03 05:31:36 fedora kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Failed:
[ 2.119564] fedora kernel: ata2: SATA link down (SStatus 4 SControl 300)
[ 2.120201] fedora kernel: ata1: SATA link down (SStatus 4 SControl 300)
This further confirms that a change in the ostree content caused your disk to stop being discovered, though it appears that the link to the disk is not being activated properly in the kernel. So my gut says a kernel that landed sometime after 2023-02-21 may contain changes that broke your environment.
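The same comparison can be pulled out of both logs mechanically. This is a minimal sketch that assumes the kernel message format seen in the excerpts above; `sata_link_states` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<port> <up|down>" for every SATA link
# message found on stdin. Works with both journalctl-style and
# rdsosreport-style timestamps because the prefix is ignored.
sata_link_states() {
    sed -nE 's/.*kernel: (ata[0-9]+): SATA link (up|down).*/\1 \2/p'
}
```

Running `sata_link_states < successfulboot.txt` and `sata_link_states < rdsosreport.txt` and comparing the two outputs should show which port changes state between the working and failing boots.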
After moving through the commits one by one, I get the following results:
The following 3 commits will boot successfully:
commit 3b45c7b29b62b40e2f4dfca4a4fabaab17edb321f569f7d40e060d72653dc91d
Parent: 1b159658aefe936a0b2f4415810fdf0e7c5cf79815d458e680fe1959379e995c
ContentChecksum: d4b9bbc40203bb0b6c5875f3a1e3051bee047a7114c4cad3cd78a7968c993687
Date: 2023-02-23 01:27:17 +0000
Version: 37.20230223.0
(no subject)
commit 1b159658aefe936a0b2f4415810fdf0e7c5cf79815d458e680fe1959379e995c
Parent: 521691fcb10306daf330ad64fa04ad23097a05b5adaebe2f1ef4c170c679b9e8
ContentChecksum: 051ddc39e6fffebc2c3cb86ed4b295d6b76ab9ad7f1e030e502ff29472c4f2c1
Date: 2023-02-22 09:29:08 +0000
Version: 37.20230222.0
(no subject)
commit 521691fcb10306daf330ad64fa04ad23097a05b5adaebe2f1ef4c170c679b9e8
Parent: 43b8be80d5b3a367d2d4ea8ee2f3e4eff0bbf76c616c3af5c52246f104cb3562
ContentChecksum: 9007b1d64745219321d365f50408bc9cdb8af7cc183e732e689385a0242488b9
Date: 2023-02-21 00:44:37 +0000
Version: 37.20230221.0
(no subject)
When I try anything after the Feb 23 commit, I start getting the boot failures. I tried the following two commits and both fail to boot in the way I described in the original issue.
commit 025246791ade1e8e8b2975a48646295bbcb2e3d42744500de0d53f38fb213570
Parent: 06f3b434db34845102453d4b556f84ba664e982d5c933f164f2ca31985f352a9
ContentChecksum: cc08b4551540c3bf5a16386884ed02f95cc964e5f0fd1dc3334a187ae78e42cb
Date: 2023-02-25 02:57:48 +0000
Version: 37.20230225.0
(no subject)
Failure
commit 06f3b434db34845102453d4b556f84ba664e982d5c933f164f2ca31985f352a9
Parent: 3b45c7b29b62b40e2f4dfca4a4fabaab17edb321f569f7d40e060d72653dc91d
ContentChecksum: 5089939d782d0681eab29e894975d3a9a4c871f2cb4c5fa11dc93417bf584337
Date: 2023-02-24 03:01:58 +0000
Version: 37.20230224.0
(no subject)
Failure
I'm not sure what other information I can offer, but I'll gladly try to collect anything that might help with the troubleshooting.
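When stepping through more commits than this, it can help to record each result and let a script find the boundary. The sketch below assumes a hand-written list of `<version> <ok|fail>` lines and versions of the form 37.YYYYMMDD.N, which sort correctly as plain text; `boot_boundary` is a hypothetical helper name.

```shell
# Hypothetical helper: read "<version> <ok|fail>" lines on stdin and print
# the newest version that booted before the first failure, plus the
# oldest version that failed.
boot_boundary() {
    sort -k1,1 | awk '
        $2 == "fail" && !found { bad = $1; found = 1 }
        $2 == "ok"   && !found { good = $1 }
        END { print "last good: " good; print "first bad: " bad }
    '
}
```

Fed the five results above, it would report 37.20230223.0 as the last good version and 37.20230224.0 as the first bad one, which are the two commits worth diffing.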
$ rpm-ostree db diff 3b45c7b29b62b40e2f4dfca4a4fabaab17edb321f569f7d40e060d72653dc91d 06f3b434db34845102453d4b556f84ba664e982d5c933f164f2ca31985f352a9
ostree diff commit from: 3b45c7b29b62b40e2f4dfca4a4fabaab17edb321f569f7d40e060d72653dc91d
ostree diff commit to: 06f3b434db34845102453d4b556f84ba664e982d5c933f164f2ca31985f352a9
Upgraded:
kernel 6.1.12-200.fc37 -> 6.1.13-200.fc37
kernel-core 6.1.12-200.fc37 -> 6.1.13-200.fc37
kernel-modules 6.1.12-200.fc37 -> 6.1.13-200.fc37
kernel-modules-extra 6.1.12-200.fc37 -> 6.1.13-200.fc37
python-unversioned-command 3.11.1-3.fc37 -> 3.11.2-1.fc37
python3 3.11.1-3.fc37 -> 3.11.2-1.fc37
python3-libs 3.11.1-3.fc37 -> 3.11.2-1.fc37
That's the diff between the Feb 23 commit and the Feb 24 commit. The kernel is in there, which lends more support to my theory that the kernel was involved.
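For longer diffs, the kernel lines can be filtered out directly. This is a rough sketch that assumes the `name old -> new` line layout shown in the db diff output above; `kernel_changes` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only kernel package lines from
# `rpm-ostree db diff` output read on stdin.
# Matches "kernel" and "kernel-*" packages, nothing else.
kernel_changes() {
    awk '$1 ~ /^kernel(-|$)/ && $3 == "->" { print $1 ": " $2 " -> " $4 }'
}
```

It would be used as `rpm-ostree db diff 3b45c7b29b62b40e2f4dfca4a4fabaab17edb321f569f7d40e060d72653dc91d 06f3b434db34845102453d4b556f84ba664e982d5c933f164f2ca31985f352a9 | kernel_changes`.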
You could look at the commits for the 6.1.13 kernel that Fedora shipped here - https://lore.kernel.org/all/20230220133600.368809650@linuxfoundation.org/
...cross reference them here - https://gitlab.com/cki-project/kernel-ark/-/commits/linux-6.1.y
...which might help identify where it broke.
But I think the best thing to do now is file a BZ against the Fedora kernel with your evidence - https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=kernel
Thank you so much for your help @miabbott
I will try to organize everything and take this over to Bugzilla.
For the sake of anyone who might be having similar trouble and is following this thread, the issue has been filed through Bugzilla at this link: https://bugzilla.redhat.com/show_bug.cgi?id=2175529
Closing this one as things will happen in Bugzilla and there is not much for us to do here anymore.
Describe the bug
Any update applied to my Fedora 37 Silverblue system since Feb 21 2023 fails to boot and drops to the dracut emergency shell. I have tried rebasing to the F38 beta to see if it might help, but I experience the same behavior.
To Reproduce
Please describe the steps needed to reproduce the bug:
Expected behavior
Successful system boot after updating to a new deployment.
OS version:
Additional context
The version listed above in the rpm-ostree output is the only deployment that will boot for me. Every update I've tried after that point fails to boot. I am attaching a copy of the rdsosreport.txt file generated by dracut. rdsosreport.txt