Closed prestist closed 3 months ago
Some new context from comparing the logs against the expected checks. The line marked with ** is the assert that is reporting incorrectly inside check_dest():
[2024-05-28T17:57:22.885Z] 314: check_dest
[2024-05-28T17:57:22.885Z] 182: assert @applied-live-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-live-ign@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-live-ign@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-live-ign@' not found in log
[2024-05-28T17:57:22.885Z] 183: assert @applied-live-2-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-live-2-ign@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-live-2-ign@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-live-2-ign@' not found in log
[2024-05-28T17:57:22.885Z] 184: assert @applied-dest-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-dest-ign@ log
**[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-dest-ign@'\'' not found in log'**
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-dest-ign@' not found in log
[2024-05-28T17:57:22.885Z] 185: assert @applied-dest-2-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-dest-2-ign@ log
[2024-05-28T17:57:22.885Z] 149: echo 'Assertion passed: '\''@applied-dest-2-ign@'\'' found in log'
[2024-05-28T17:57:22.885Z] Assertion passed: '@applied-dest-2-ign@' found in log
[2024-05-28T17:57:22.885Z] 186: assert @preinst-1@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @preinst-1@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@preinst-1@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@preinst-1@' not found in log
[2024-05-28T17:57:22.885Z] 187: assert @preinst-2@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @preinst-2@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@preinst-2@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@preinst-2@' not found in log
[2024-05-28T17:57:22.885Z] 188: assert @postinst-1@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @postinst-1@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@postinst-1@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@postinst-1@' not found in log
[2024-05-28T17:57:22.885Z] 189: assert @postinst-2@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @postinst-2@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@postinst-2@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@postinst-2@' not found in log
[2024-05-28T17:57:22.885Z] 190: assert 'Adding "coreos-installer test certificate" to list of CAs'
[2024-05-28T17:57:22.885Z] 148: grep -Fq 'Adding "coreos-installer test certificate" to list of CAs' log
[2024-05-28T17:57:22.885Z] 149: echo 'Assertion passed: '\''Adding "coreos-installer test certificate" to list of CAs'\'' found in log'
[2024-05-28T17:57:22.885Z] Assertion passed: 'Adding "coreos-installer test certificate" to list of CAs' found in log
check_dest() {
! assert @applied-live-ign@
! assert @applied-live-2-ign@
assert @applied-dest-ign@
assert @applied-dest-2-ign@
! assert @preinst-1@
! assert @preinst-2@
! assert @postinst-1@
! assert @postinst-2@
assert 'Adding "coreos-installer test certificate" to list of CAs'
}
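For reading the trace above, it helps to see roughly what the helper does. The following is a minimal reconstruction of assert, inferred only from the traced lines (a grep -Fq against a file named log, followed by one of two echo messages); the real helper in the test scripts may differ. It also shows why a negated check such as ! assert @preinst-1@ prints "Assertion failed" even when the test is behaving as intended.

```shell
# Hypothetical reconstruction of the helper, based on the trace only.
assert() {
    # -F: treat the marker as a fixed string; -q: no output, exit status only.
    if grep -Fq "$1" log; then
        echo "Assertion passed: '$1' found in log"
    else
        echo "Assertion failed: '$1' not found in log"
        return 1
    fi
}

# Work in a scratch directory with a made-up log that contains one marker.
cd "$(mktemp -d)"
printf '%s\n' 'boot ok' '@applied-dest-2-ign@' > log

# Marker present: the helper returns 0.
assert @applied-dest-2-ign@

# Marker absent: the message says "failed", but the negated check succeeds.
! assert @preinst-1@
```

So in the log, only the "Assertion failed" lines for the non-negated checks (@applied-dest-ign@ here) indicate a real problem.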
Okay, looking more closely: the dest.ign/dest.bu in the fixtures has some conditionals which are currently failing.
You can see that in the logs from #1474:
[2024-05-23T21:15:30.721Z] [ 3.332472] NetworkManager[631]: <warn> [1716498930.6791] keyfile: load: "/run/NetworkManager/system-connections/nmstate-json-eth1.nmconnection": failed to load connection: invalid connection: connection.interface-name: 'nmstate-json-eth1': interface name is longer than 15 characters
This needs to exist, but it does not, so we fail.
CI failed in check_dest, but there is no hint why dest-ignition-applied.service fails.
[2024-05-29T21:10:25.669Z] [FAILED] Failed to start dest-ignition-applied.service - Dest Ignition Applied.
[2024-05-29T21:10:25.669Z] See 'systemctl status dest-ignition-applied.service' for details.
...
[2024-05-29T21:10:26.686Z] Checking assertion: @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 229: assert @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 148: grep -Fq @applied-dest-ign@ log
[2024-05-29T21:10:26.686Z] 152: echo 'Assertion failed: '\''@applied-dest-ign@'\'' not found in log'
[2024-05-29T21:10:26.686Z] Assertion failed: '@applied-dest-ign@' not found in log
[2024-05-29T21:10:26.686Z] 153: return 1
[2024-05-29T21:10:26.686Z] 229: echo 'Failed assertion: @applied-dest-ign@'
[2024-05-29T21:10:26.686Z] Failed assertion: @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 229: return 1
@HuijingHei Is there a way to repro this locally? I am not super familiar with these image tests.
Looks like it's running this script: https://github.com/coreos/coreos-installer/blob/main/.cci.jenkinsfile#L47
grep failed with ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg in dest-ignition-applied.service (dest.bu). There is no serial line in grub.cfg; if this line is removed, CI passes.
To get the grub.cfg content, I added ExecStart=/bin/cat /boot/grub2/grub.cfg before the grep, which confirmed that it does not have the serial content.
The open question is whether we still need the serial grep.
# This file is copied from https://github.com/coreos/coreos-assembler/blob/main/src/grub.cfg
# Changes:
# - Dropped Ignition glue, that can be injected into platform.cfg
# petitboot doesn't support -e and doesn't support an empty path part
if [ -d (md/md-boot)/grub2 ]; then
# fcct currently creates /boot RAID with superblock 1.0, which allows
# component partitions to be read directly as filesystems. This is
# necessary because transposefs doesn't yet rerun grub2-install on BIOS,
# so GRUB still expects /boot to be a partition on the first disk.
#
# There are two consequences:
# 1. On BIOS and UEFI, the search command might pick an individual RAID
# component, but we want it to use the full RAID in case there are bad
# sectors etc. The undocumented --hint option is supposed to support
# this sort of override, but it doesn't seem to work, so we set $boot
# directly.
# 2. On BIOS, the "normal" module has already been loaded from an
# individual RAID component, and $prefix still points there. We want
# future module loads to come from the RAID, so we reset $prefix.
# (On UEFI, the stub grub.cfg has already set $prefix properly.)
set boot=md/md-boot
set prefix=($boot)/grub2
else
if [ -f ${config_directory}/bootuuid.cfg ]; then
source ${config_directory}/bootuuid.cfg
fi
if [ -n "${BOOT_UUID}" ]; then
search --fs-uuid "${BOOT_UUID}" --set boot --no-floppy
else
search --label boot --set boot --no-floppy
fi
fi
set root=$boot
if [ -f ${config_directory}/grubenv ]; then
load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
load_env
fi
if [ -f $prefix/console.cfg ]; then
# Source in any GRUB console settings if provided by the user/platform
source $prefix/console.cfg
fi
if [ x"${feature_menuentry_id}" = xy ]; then
menuentry_id_option="--id"
else
menuentry_id_option=""
fi
function load_video {
if [ x$feature_all_video_module = xy ]; then
insmod all_video
else
insmod efi_gop
insmod efi_uga
insmod ieee1275_fb
insmod vbe
insmod vga
insmod video_bochs
insmod video_cirrus
fi
}
# Other package code will be injected from here
source $prefix/40_coreos-ignition.cfg
source $prefix/70_coreos-user.cfg
if [ x$feature_timeout_style = xy ] ; then
set timeout_style=menu
set timeout=1
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
set timeout=1
fi
# Import user defined configuration
# tracker: https://github.com/coreos/fedora-coreos-tracker/issues/805
if [ -f $prefix/user.cfg ]; then
source $prefix/user.cfg
fi
blscfg
@HuijingHei Is there a way to repro this locally?
See Timothée's comment, what I did for testing:
# build coreos-installer first
$ cosa shell
$ sudo rm /usr/bin/coreos-installer
$ sudo rsync -rlv coreos-installer/install/usr/ /usr/
mkdir fcos; cd fcos
cosa init https://github.com/coreos/fedora-coreos-config.git
cosa fetch
# copy built coreos-installer to overrides/rootfs
cosa build
cosa buildextend-metal && cosa buildextend-metal4k && cosa buildextend-live --fast
cosa compress --fast --artifact=metal4k
# comment other tests and keep customize.sh in images.sh
COREOS_INSTALLER_TEST_INSTALLED_BINARY=1 tests/images.sh /srv/fcos/builds/latest/x86_64 customize.sh
@travier Thank you! @HuijingHei wow, good job! Okay, thank you; I will repro it locally and look into the serial bit. I really want to know what the sudden change is; it does not look like this has been touched in about 2 years.
I really want to know what the sudden change is; It does not look like this has been touched in 2 years ~
Good question. I am not sure whether it is because of the change in https://github.com/coreos/fedora-coreos-tracker/issues/1671
So, debugging locally: I have been able to repro it. I ended up doing a cosa run qemu, modifying the image, and creating my own unit with the same ExecStart. It failed, of course. I think we have two actionable steps:
1. update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial
2. remove the serial bit from the unit, as @HuijingHei tested.
I still cannot tell why it suddenly broke, as all the changes seem to be from months ago or more, but it is what it is.
So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.
- update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no terminal_input serial terminal_output serial
I do not think this will work.
We are now building images using osbuild by default (see https://github.com/coreos/coreos-assembler/pull/3800), with bootupd --with-static-configs writing the file /boot/grub2/grub.cfg, which includes source console.cfg (see https://github.com/coreos/bootupd/pull/619). console.cfg is created by https://github.com/osbuild/osbuild/pull/1589, based on https://github.com/coreos/fedora-coreos-config/blob/testing-devel/platforms.yaml#L149, but that has not changed for 2 years, so I am not sure when the change happened.
cat console.cfg
# Any non-default console settings will be inserted here.
# CONSOLE-SETTINGS-START
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END
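To look at just the generated settings, the block between the two marker comments can be printed on its own. This is a small sketch: the function name console_settings is made up for this example, and it assumes the marker lines appear exactly as in the output above.

```shell
# Print the lines between the CONSOLE-SETTINGS markers, markers excluded.
# console_settings is a hypothetical helper name used only for this example.
console_settings() {
    sed -n '/^# CONSOLE-SETTINGS-START$/,/^# CONSOLE-SETTINGS-END$/{
        /^# CONSOLE-SETTINGS-/d
        p
    }' "$1"
}

# Recreate the file shown above in a scratch directory and extract the settings.
tmpdir=$(mktemp -d)
cat > "$tmpdir/console.cfg" <<'EOF'
# Any non-default console settings will be inserted here.
# CONSOLE-SETTINGS-START
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END
EOF
console_settings "$tmpdir/console.cfg"
```

On a booted system the same function could be pointed at /boot/grub2/console.cfg, assuming the file exists there.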
Wow, I had no idea about this. I need to read up on OSBuild. The error is consistent with serial not being in the grub config, which is what led me there. I need to research how console.cfg gets layered into grub.cfg.
@HuijingHei so this makes sense; something must be responsible for running grub2-mkconfig to create the new grub.cfg, right?
Running FCOS locally, cat grub.cfg | grep serial returns nothing.
From my understanding, the grub.cfg should be updated with the composed information from the link you provided?
Not quite sure about grub2-mkconfig, do we use it during installation?
Inspired by @mike-nguyen, I built with legacy cosa using COSA_USE_OSBUILD=0 cosa build-xxx and CI passed. The serial lines are present in grub.cfg (checked with ExecStart=/bin/cat /boot/grub2/grub.cfg before the grep), so there might be something different with ISO with install?
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial
When I check on qcow2.disk using cosa run, the serial line in /boot/grub2/grub.cfg is
serial --speed=115200
terminal_input serial console
terminal_output serial console
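To see which of the two snippets would satisfy the unit's check, the ExecStart pattern from dest.bu can be replayed against both in a scratch directory. This is only a sketch: it assumes GNU grep, where -z makes the whole file a single record so that each . in the pattern can match a newline, and the closing comment line in both sample files is an assumption added here so that the trailing .# in the pattern has something to match.

```shell
pattern='serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#'

tmpdir=$(mktemp -d)

# Snippet reported from the passing run with the legacy build.
# The marker line at the end is an assumption (see the note above).
cat > "$tmpdir/legacy.cfg" <<'EOF'
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial
# CONSOLE-SETTINGS-END
EOF

# Snippet seen on the qcow2 disk via cosa run.
cat > "$tmpdir/qcow2.cfg" <<'EOF'
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END
EOF

for f in legacy qcow2; do
    if grep -qz "$pattern" "$tmpdir/$f.cfg"; then
        echo "$f: pattern matches"
    else
        echo "$f: pattern does not match"
    fi
done
```

Only the first snippet carries the exact --unit/--word/--parity arguments without the extra console terminal, so only it can match the pattern the unit greps for.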
Not quite sure about grub2-mkconfig, do we use it during installation?
It would have to be before we are booted.
[core@cosa-devsh grub2]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
/usr/sbin/grub2-mkconfig: line 268: /boot/grub2/grub.cfg.new: Read-only file system
Inspired by @mike-nguyen, build with legacy cosa using COSA_USE_OSBUILD=0 cosa build-xxx, the CI passed, get serial line in grub.cfg (with ExecStart=/bin/cat /boot/grub2/grub.cfg before grep), seems there might be something different with ISO with install?
Yeah, I think so. I suspect that we have an issue going on there. We could revert the change and go back to doing both until we know why? It would get CI passing here.
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial
when I check on qcow2.disk using cosa run, the serial line in /boot/grub2/grub.cfg is
serial --speed=115200
terminal_input serial console
terminal_output serial console
That's from the old build method, right? Well, we know what changed now: the builder is not running grub2-mkconfig.
@jlebon wdyt we should do? Does osbuild have the ability to mount and run grub2-mkconfig?
The context here is that when building with OSBuild, how GRUB configs are set up is different. You can find more details in https://github.com/coreos/fedora-coreos-tracker/issues/1671 and the links there, but the TL;DR is that (1) a stub grub.cfg is written by bootupd, which in turn sources console.cfg, and (2) osbuild creates console.cfg in the org.osbuild.coreos.platform stage. coreos-installer was accordingly taught to write to console.cfg if found when using the --console option: https://github.com/coreos/coreos-installer/pull/1360, https://github.com/coreos/coreos-installer/pull/1423.
So... I think all we need here is
diff --git a/fixtures/customize/dest.bu b/fixtures/customize/dest.bu
index 1e17035..219422a 100644
--- a/fixtures/customize/dest.bu
+++ b/fixtures/customize/dest.bu
@@ -23,7 +23,7 @@ systemd:
[Service]
Type=oneshot
RemainAfterExit=true
- ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg
+ ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg /boot/grub2/console.cfg
ExecStart=/bin/echo @applied-dest-ign@
StandardOutput=tty
(and regenerate dest.ign as well).
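The reason listing both files is enough: grep -q exits with status 0 as soon as any of the named files contains a match, so the check passes whether the serial lines end up in grub.cfg or in console.cfg. Here is a minimal sketch using scratch files in place of the real /boot/grub2 paths. The file contents are trimmed-down stand-ins (in particular, what console.cfg looks like after the installer has written the console settings is assumed here, not copied from a real system), and GNU grep is assumed for -z.

```shell
pattern='serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#'

boot=$(mktemp -d)   # stand-in for /boot/grub2

# Stub grub.cfg that only sources console.cfg (simplified stand-in).
printf '%s\n' 'source $prefix/console.cfg' 'blscfg' > "$boot/grub.cfg"

# Assumed console.cfg content after the console settings were written.
cat > "$boot/console.cfg" <<'EOF'
# CONSOLE-SETTINGS-START
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial
# CONSOLE-SETTINGS-END
EOF

# Old check: only grub.cfg is searched, so nothing matches in this layout.
grep -qz "$pattern" "$boot/grub.cfg" || echo "grub.cfg alone: no match"

# New check: either file may carry the settings.
grep -qz "$pattern" "$boot/grub.cfg" "$boot/console.cfg" && echo "grub.cfg + console.cfg: match"
```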
I'll let one of you verify that this fixes the tests and open a PR for it since you did most of the work to investigate this (nice work!). :)
@jlebon Yep, it works locally. I am going to make a commit and push. Thank you!
Seeing a failure regarding greps; need to find which one is failing; changed assert to see what is happening.
https://github.com/coreos/coreos-installer/pull/1474#issuecomment-2133485861