Closed bileslav closed 1 year ago
Thanks for the report.
The partition type looks good, assuming your hardware is x86_64.
Please enable booster debug logs with kernel parameter booster.log=debug,console
and post the output here.
Could you please also post the content of the efi variable you mentioned?
@anatol,
x86_64
yep
content of the efi variable
$ cat /sys/firmware/efi/efivars/LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f
0cf78ccc-c5ac-4683-9594-513f63c75220
booster debug logs
The variable looks fine, as well as the gpt uuid.
But it looks like booster is unable to find the EFI partition. I just pushed a change to add more debug logs. Hopefully it will give us an idea what is going on.
Please pull wip
branch, build init
and replace the /usr/lib/booster/init
.
@anatol,
Sure. By the way, I have one more EFI partition on /dev/sdd1 (Windows), and if I'm not mistaken it's enumerated before /dev/nvme0n1p1. Maybe this is the culprit?
Thanks for the debug log. I would expect to see handling gpt block device /dev/nvme0n1
along the following lines
[ 3.216961] booster: found a new device /dev/nvme0n1
[ 3.217635] booster: loading module aesni_intel
[ 3.218228] booster: udev event {Action:add KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2 Env:map[ACTION:add DEVNAME:nvme0n1p2 DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2 DEVTYPE:partition DISKSEQ:5 MAJOR:259 MINOR:2 PARTN:2 PARTNAME:Linux x86-64 root (/) SEQNUM:1779 SUBSYSTEM:block]}
[ 3.218951] booster: blkinfo for /dev/nvme0n1: type=gpt UUID=2788efc2-2f41-4e55-bd90-cd679952c330 LABEL=
[ 3.220830] booster: udev event {Action:add KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/ng0n1 Env:map[ACTION:add DEVNAME:ng0n1 DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/ng0n1 MAJOR:238 MINOR:0 SEQNUM:1780 SUBSYSTEM:nvme-generic]}
[ 3.221883] booster: symlink /dev/nvme0n1 /dev/disk/by-id/nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 3.221891] booster: found a new device /dev/nvme0n1p1
[ 3.221897] booster: found a new device /dev/nvme0n1p2
[ 3.222901] AVX2 version of gcm_enc/dec engaged.
[ 3.222937] AES CTR mode by8 optimization enabled
[ 3.223408] booster: udev event {Action:bind KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0 Env:map[ACTION:bind DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0 DRIVER:nvme MODALIAS:pci:v00002646d00005013sv00002646sd00005013bc01sc08i02 PCI_CLASS:10802 PCI_ID:2646:5013 PCI_SLOT_NAME:0000:01:00.0 PCI_SUBSYS_ID:2646:5013 SEQNUM:1781 SUBSYSTEM:pci]}
[ 3.224668] booster: blkinfo for /dev/nvme0n1p1: type=fat UUID=63599b60 LABEL=NO NAME
[ 3.225248] booster: blkinfo for /dev/nvme0n1p2: type=ext4 UUID=c1785551-9ae6-4a43-bab7-cdc42cf98971 LABEL=Arch Linux
but for some reason it is not there. It is weird, as if the block device is not really detected as a GPT. Or the function returns before this line https://github.com/anatol/booster/blob/5c41122d0297abbc0568a71a2a69850e37e12031/init/main.go#L210
Anyway I just added one more debug statement there to see what the code is doing there. The code is pushed to wip
branch. Please give it a try.
@anatol,
Thank you for your help. To me it seems like something is still missing :no_mouth:
Thank you for the logs. There is found a new device /dev/nvme0n1
but no corresponding block device info
message. Which is strange. It looks like the control flow leaves addBlockDevice()
early for some reason. But I would expect a warning in this case.
Anyway, lets add more debug statements to see what is the exact control flow. Please pull wip
branch, recompile it and then rebuild the image. Please provide the debug logs.
@anatol,
Apologies for the delay, here's a new log.
I also found that the auto-discovery works with mkinitcpio (using systemd hook).
Thank you for the logs @bileslav. Here is the most relevant part of the logs:
[ 3.127406] booster: udev event {Action:add KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 Env:map[ACTION:add DEVNAME:nvme0n1p1 DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 DEVTYPE:partition DISKSEQ:5 MAJOR:259 MINOR:1 PARTN:1 SEQNUM:1751 SUBSYSTEM:block]}
[ 3.127420] booster: found a new device /dev/nvme0n1
[ 3.128737] booster: issue https://github.com/anatol/booster/issues/222: 1
[ 3.130139] booster: udev event {Action:add KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2 Env:map[ACTION:add DEVNAME:nvme0n1p2 DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2 DEVTYPE:partition DISKSEQ:5 MAJOR:259 MINOR:2 PARTN:2 PARTNAME:arch SEQNUM:1752 SUBSYSTEM:block]}
[ 3.130967] booster: blkinfo for /dev/nvme0n1: type=gpt UUID=0a298f81-a328-4992-977e-44ea94b84274 LABEL=
[ 3.131538] booster: block device info &{path:/dev/sdc2 symlinks:[] format:mbr isFs:false uuid:[0 0 0 0] label: hwPath: wwid:[] data:}
[ 3.132757] booster: issue https://github.com/anatol/booster/issues/222: 1
[ 3.134074] booster: udev event {Action:add KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/ng0n1 Env:map[ACTION:add DEVNAME:ng0n1 DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/ng0n1 MAJOR:238 MINOR:0 SEQNUM:1753 SUBSYSTEM:nvme-generic]}
[ 3.134749] booster: issue https://github.com/anatol/booster/issues/222: 2
[ 3.135602] booster: issue https://github.com/anatol/booster/issues/222: 3
[ 3.136413] booster: udev event {Action:bind KObj:/devices/pci0000:00/0000:00:01.1/0000:01:00.0 Env:map[ACTION:bind DEVPATH:/devices/pci0000:00/0000:00:01.1/0000:01:00.0 DRIVER:nvme MODALIAS:pci:v00002646d00005013sv00002646sd00005013bc01sc08i02 PCI_CLASS:10802 PCI_ID:2646:5013 PCI_SLOT_NAME:0000:01:00.0 PCI_SUBSYS_ID:2646:5013 SEQNUM:1754 SUBSYSTEM:pci]}
[ 3.136950] booster: issue https://github.com/anatol/booster/issues/222: 4: nvme-eui.0026b7282b60d145
[ 3.137482] booster: issue https://github.com/anatol/booster/issues/222: 5
[ 3.137999] booster: issue https://github.com/anatol/booster/issues/222: 6
[ 3.138490] booster: issue https://github.com/anatol/booster/issues/222: 7: {0 [193 42 115 40 248 31 17 210 186 75 0 160 201 62 201 59] [254 11 152 159 214 29 77 23 175 243 162 77 92 245 71 68] 0}
[ 3.138998] booster: issue https://github.com/anatol/booster/issues/222: 7: {1 [79 104 188 227 232 205 77 177 150 231 251 202 249 132 183 9] [94 26 54 18 33 233 79 44 173 224 231 76 147 0 182 7] arch 0}
[ 3.140169] booster: udev event {Action:bind KObj:/devices/pci0000:00/0000:00:01.2/0000:02:00.0/usb1/1-0:1.0 Env:map[ACTION:bind DEVPATH:/devices/pci0000:00/0000:00:01.2/0000:02:00.0/usb1/1-0:1.0 DEVTYPE:usb_interface DRIVER:hub INTERFACE:9/0/0 MODALIAS:usb:v1D6Bp0002d0604dc09dsc00dp01ic09isc00ip00in00 PRODUCT:1d6b/2/604 SEQNUM:1755 SUBSYSTEM:usb TYPE:9/0/1]}
[ 3.140788] booster: issue https://github.com/anatol/booster/issues/222: 4: nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 3.141763] booster: udev event {Action:bind KObj:/devices/pci0000:00/0000:00:01.2/0000:02:00.0/usb1 Env:map[ACTION:bind BUSNUM:001 DEVNAME:bus/usb/001/001 DEVNUM:001 DEVPATH:/devices/pci0000:00/0000:00:01.2/0000:02:00.0/usb1 DEVTYPE:usb_device DRIVER:usb MAJOR:189 MINOR:0 PRODUCT:1d6b/2/604 SEQNUM:1756 SUBSYSTEM:usb TYPE:9/0/1]}
[ 3.142271] booster: symlink /dev/nvme0n1 /dev/disk/by-id/nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 3.142276] booster: found a new device /dev/nvme0n1p1
[ 3.142280] booster: found a new device /dev/nvme0n1p2
For some reason addBlockDevice()
stops executing after if err := diskSymlink("id", devpath, wwid); err != nil {
as if there was some error. But in such case I expect the error message appear in the debug log. There is none. Or symlink
syscall stuck in kernel. Both situations are weird though.
Okay, adding more logs to that function. Hopefully it will provide more info about the errror.
@anatol,
Glad we're getting closer to the root. Kind of PHP echo-debugging though %)
So here is the relevant error message:
[ 3.362275] booster: issue https://github.com/anatol/booster/issues/222 /dev/nvme0n1: 4: nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 3.362718] booster: diskSymlink(/dev/nvme0n1->nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 3.363174] booster: issue https://github.com/anatol/booster/issues/222 /dev/nvme0n1: err: symlink /dev/nvme0n1 /dev/disk/by-id/nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 3.363641] booster: symlink /dev/nvme0n1 /dev/disk/by-id/nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
It looks like symlink
fails. The weird part is that the error message must include the OS error message, see the LinkError code:
func (e *LinkError) Error() string {
return e.Op + " " + e.Old + " " + e.New + ": " + e.Err.Error()
}
The message you shared does not correspond the golang sources :/
Other symlink syscalls also fail. Is there any specific about your initramfs configuration? Do you have any restrictions on the filesystem operations there?
@anatol,
The weird part is that the error message must include the OS error message. The message you shared does not correspond the golang sources?
I'm not familiar with golang. To me the message looks truncated as I don't see ) err: %v
anywhere. Apparently this is what you're talking about. It's really weird. I use go
for compilation.
By the way, I cleanly (re)installed Arch several times and once EndeavourOS since the first post, each time writing new GPT to /dev/nvme0n1
using gdisk
.
Other symlink syscalls also fail. Is there any specific about your initramfs configuration? Do you have any restrictions on the filesystem operations there?
The only difference between my booster.yaml
and the one in the first post is that ext4
is additionally forced, because this is the only way I found to save dmesg
output (mount
doesn't work otherwise).
I also tried without nvidia
modules.
@bileslav there is still not clear for me what is going on here.
Are you able to run booster's integration tests TestGptRootAutodiscovery*
, https://github.com/anatol/booster/blob/7e20561cd54da02ca0f03ca430ccdb78bb0475f7/tests/gpt_test.go#L142 are they all passed?
@anatol,
Are you able to run booster's integration tests...
I'm afraid I need some assistance on this. If the following commands is what I should run, then there's really A LOT of output and I'm not sure how to correctly share it with you + it takes quite a while.
cd /tmp
git clone https://github.com/anatol/booster.git
cd booster/tests
go test -v
You are close. But instead of running all integration tests, just run the autodiscovery tests, e.g. this one go test -v -run TestGptRootAutodiscoveryExt4
Thank you @anatol. Here's the output:
2023/07/08 23:18:20 Call 'go build' for init
2023/07/08 23:18:21 Call 'go build' for generator
=== RUN TestGptRootAutodiscoveryExt4
2023/07/08 23:18:21 Create booster.img with /tmp/3082948906/generator build --force --init-binary /tmp/3082948906/init --kernel-version 6.4.2-arch1-1 --config /tmp/TestGptRootAutodiscoveryExt42383010327/001/config.yaml /tmp/TestGptRootAutodiscoveryExt42383010327/001/booster.img
/tmp/TestGptRootAutodiscoveryExt42383010327/001/booster.img: 127878872 bytes
GPT fdisk (gdisk) version 1.0.9.1
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present
Creating new GPT entries in memory.
Command (? for help): This option deletes all partitions and creates a new protective MBR.
Proceed? (Y/N):
Command (? for help): Partition number (1-128, default 1): First sector (34-1023966, default = 2048) or {+-}size{KMGTP}: Last sector (2048-1023966, default = 1021951) or {+-}size{KMGTP}: Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): Changed type of partition to 'EFI system partition'
Command (? for help): Partition number (2-128, default 2): First sector (34-1023966, default = 411648) or {+-}size{KMGTP}: Last sector (411648-1023966, default = 1021951) or {+-}size{KMGTP}: Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): Changed type of partition to 'Linux x86-64 root (/)'
Command (? for help):
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): OK; writing new GUID partition table (GPT) to /dev/loop0.
The operation has completed successfully.
sudo: mkfs.fat: command not found
umount: : no mount point specified.
umount: : no mount point specified.
Device booster_auto_root is not active.
mount: /tmp/tmp.G51k9XyKJn: special device /dev/loop0p1 does not exist.
dmesg(1) may have more information after failed mount system call.
umount: /tmp/tmp.G51k9XyKJn: not mounted.
umount: : no mount point specified.
Device booster_auto_root is not active.
losetup: /dev/loop0: detach failed: No such device or address
default booster
timeout 0
title Booster
linux /vmlinuz-linux
initrd /booster-linux.img
options booster.log=debug printk.devkmsg=on console=ttyS0,115200 ignore_loglevel
mke2fs 1.47.0 (5-Feb-2023)
The file /dev/loop0p2 does not exist and no size was specified.
umount: /tmp/tmp.G51k9XyKJn: not mounted.
umount: : no mount point specified.
Device booster_auto_root is not active.
losetup: /dev/loop0: detach failed: No such device or address
mount: /tmp/tmp.OVlYfXcc6x: special device /dev/loop0p2 does not exist.
dmesg(1) may have more information after failed mount system call.
umount: /tmp/tmp.G51k9XyKJn: not mounted.
umount: /tmp/tmp.OVlYfXcc6x: not mounted.
Device booster_auto_root is not active.
losetup: /dev/loop0: detach failed: No such device or address
umount: /tmp/tmp.G51k9XyKJn: not mounted.
umount: /tmp/tmp.OVlYfXcc6x: not mounted.
Device booster_auto_root is not active.
losetup: /dev/loop0: detach failed: No such device or address
umount: /tmp/tmp.G51k9XyKJn: not mounted.
umount: /tmp/tmp.OVlYfXcc6x: not mounted.
Device booster_auto_root is not active.
losetup: /dev/loop0: detach failed: No such device or address
2023/07/08 23:18:23 QEMU command line: qemu-system-x86_64 -monitor unix:/tmp/vmtest1328454251/monitor.socket -serial unix:/tmp/vmtest1328454251/console.socket -no-reboot -nographic -display none -m 8G -smp 24 -enable-kvm -cpu host -bios /usr/share/ovmf/x64/OVMF.fd -device virtio-scsi-pci,id=scsi -drive format=raw,if=none,id=hd0,file=/tmp/TestGptRootAutodiscoveryExt42383010327/001/espdisk.raw -device scsi-hd,drive=hd0
BdsDxe: failed to load Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x4,0x0)/Scsi(0x0,0x0): Not Found
>>Start PXE over IPv4. gpt_test.go:150:
Error Trace: /tmp/bt/booster/tests/gpt_test.go:150
Error: Received unexpected error:
EOF
Test: TestGptRootAutodiscoveryExt4
2023/07/08 23:19:03 Got error while waiting for Qemu process completion: signal: killed
--- FAIL: TestGptRootAutodiscoveryExt4 (42.15s)
FAIL
exit status 1
FAIL github.com/anatol/booster/tests 42.645s
sudo: mkfs.fat: command not found
please make sure mkfs.fat is installed at your host.
@anatol, oh, I'm sorry. It seems good now:
So here are the results:
--- PASS: TestGptRootAutodiscoveryExt4 (4.73s)
--- PASS: TestGptRootAutodiscoveryLUKS (18.93s)
--- PASS: TestGptRootAutodiscoveryNoAuto (6.38s)
--- PASS: TestGptRootAutodiscoveryReadOnly (5.86s)
So the GPT autodiscovery in QEMU works fine. It is your hardware that for some reason fails. It is not clear why though.
@anatol, but how can it be that this message is cut off even before the closing brace? I just checked it again.
[ 3.362718] booster: diskSymlink(/dev/nvme0n1->nvme-KINGSTON_SKC3000S1024G_50026B7282B60D14
[ 5.341731] booster: diskSymlink(/dev/sdf->usb-_USB_DISK_2.0_007100B24090
On a related note, the symlink creation used for detecting root by WWID and it is a rarely used feature. There is no need to stop the boot process because of the link creation failed.
I made this error non-critical and pushed the update to wip
branch. Please give it a try.
@anatol, the auto-discovery works for me on the current wip
branch :+1:
The auto-discovery issue is fixed by https://github.com/anatol/booster/commit/1615f191131a45e0cc58826660b5c38f74b93544
Hi there,
Thank you very much for this project (oh, and for the Arch Linux!).
I can't figure out why the auto-discovery isn't working for me. No error, just timeout.
Using busybox I can see that LoaderDevicePartUUID and other variables are set.