coreos / coreos-installer

Installer for CoreOS disk images
https://coreos.github.io/coreos-installer/
Apache License 2.0
218 stars 92 forks source link

Update customize image test to fix ci #1478

Closed prestist closed 3 months ago

prestist commented 4 months ago

Seeing a failure regarding greps; need to find which one is failing; changed assert to see what is happening.

https://github.com/coreos/coreos-installer/pull/1474#issuecomment-2133485861

prestist commented 4 months ago

Cool new context; Comparing notes with the logs and the expected checks. The * are the asserts which are reporting incorrectly inside the check_dest()

[2024-05-28T17:57:22.885Z] 314: check_dest
[2024-05-28T17:57:22.885Z] 182: assert @applied-live-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-live-ign@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-live-ign@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-live-ign@' not found in log
[2024-05-28T17:57:22.885Z] 183: assert @applied-live-2-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-live-2-ign@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-live-2-ign@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-live-2-ign@' not found in log
[2024-05-28T17:57:22.885Z] 184: assert @applied-dest-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-dest-ign@ log
**[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-dest-ign@'\'' not found in log'**
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-dest-ign@' not found in log
[2024-05-28T17:57:22.885Z] 185: assert @applied-dest-2-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-dest-2-ign@ log
[2024-05-28T17:57:22.885Z] 149: echo 'Assertion passed: '\''@applied-dest-2-ign@'\'' found in log'
[2024-05-28T17:57:22.885Z] Assertion passed: '@applied-dest-2-ign@' found in log
[2024-05-28T17:57:22.885Z] 186: assert @preinst-1@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @preinst-1@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@preinst-1@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@preinst-1@' not found in log
[2024-05-28T17:57:22.885Z] 187: assert @preinst-2@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @preinst-2@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@preinst-2@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@preinst-2@' not found in log
[2024-05-28T17:57:22.885Z] 188: assert @postinst-1@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @postinst-1@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@postinst-1@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@postinst-1@' not found in log
[2024-05-28T17:57:22.885Z] 189: assert @postinst-2@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @postinst-2@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@postinst-2@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@postinst-2@' not found in log
[2024-05-28T17:57:22.885Z] 190: assert 'Adding "coreos-installer test certificate" to list of CAs'
[2024-05-28T17:57:22.885Z] 148: grep -Fq 'Adding "coreos-installer test certificate" to list of CAs' log
[2024-05-28T17:57:22.885Z] 149: echo 'Assertion passed: '\''Adding "coreos-installer test certificate" to list of CAs'\'' found in log'
[2024-05-28T17:57:22.885Z] Assertion passed: 'Adding "coreos-installer test certificate" to list of CAs' found in log

Expected Checks

check_dest() {
    ! assert @applied-live-ign@
    ! assert @applied-live-2-ign@
    assert @applied-dest-ign@
    assert @applied-dest-2-ign@
    ! assert @preinst-1@
    ! assert @preinst-2@
    ! assert @postinst-1@
    ! assert @postinst-2@
    assert 'Adding "coreos-installer test certificate" to list of CAs'
}
prestist commented 4 months ago

Okay, looking more closely

The dest.ign/dest.bu in the fixtures has some conditionals which are currently failing.

You can see that in the logs from #1474

2024-05-23T21:15:30.721Z] [ 3.332472] NetworkManager[631]: <warn> [1716498930.6791] keyfile: load: "/run/NetworkManager/system-connections/nmstate-json-eth1.nmconnection": failed to load connection: invalid connection: connection.interface-name: 'nmstate-json-eth1': interface name is longer than 15 characters

This needs to exist.. but does not so consequently we fail.

prestist commented 4 months ago

https://github.com/coreos/coreos-installer/issues/1479

HuijingHei commented 4 months ago

Get CI failed in check_dest, but no hint why dest-ignition-applied.service fails.

[2024-05-29T21:10:25.669Z] [FAILED] Failed to start dest-ignition-applied.service - Dest Ignition Applied.

[2024-05-29T21:10:25.669Z] See 'systemctl status dest-ignition-applied.service' for details.

...
[2024-05-29T21:10:26.686Z] Checking assertion: @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 229: assert @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 148: grep -Fq @applied-dest-ign@ log
[2024-05-29T21:10:26.686Z] 152: echo 'Assertion failed: '\''@applied-dest-ign@'\'' not found in log'
[2024-05-29T21:10:26.686Z] Assertion failed: '@applied-dest-ign@' not found in log
[2024-05-29T21:10:26.686Z] 153: return 1
[2024-05-29T21:10:26.686Z] 229: echo 'Failed assertion: @applied-dest-ign@'
[2024-05-29T21:10:26.686Z] Failed assertion: @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 229: return 1
prestist commented 4 months ago

@HuijingHei Is there a way to repo this locally? I am not super familiar with these image tests.

travier commented 4 months ago

Looks like it's running this script: https://github.com/coreos/coreos-installer/blob/main/.cci.jenkinsfile#L47

HuijingHei commented 4 months ago

grep failed with ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg in dest-ignition-applied.service dest.bu, no serial line in grub.cfg, if remove this line, CI passed.

Get grub.cfg content with add ExecStart=/bin/cat /boot/grub2/grub.cfg before grep, confirm that does not have serial content.

The problem is need to confirm if we still need the serial grep.

# This file is copied from https://github.com/coreos/coreos-assembler/blob/main/src/grub.cfg
# Changes:
#   - Dropped Ignition glue, that can be injected into platform.cfg
# petitboot doesn't support -e and doesn't support an empty path part
if [ -d (md/md-boot)/grub2 ]; then
  # fcct currently creates /boot RAID with superblock 1.0, which allows
  # component partitions to be read directly as filesystems.  This is
  # necessary because transposefs doesn't yet rerun grub2-install on BIOS,
  # so GRUB still expects /boot to be a partition on the first disk.
  #
  # There are two consequences:
  # 1. On BIOS and UEFI, the search command might pick an individual RAID
  #    component, but we want it to use the full RAID in case there are bad
  #    sectors etc.  The undocumented --hint option is supposed to support
  #    this sort of override, but it doesn't seem to work, so we set $boot
  #    directly.
  # 2. On BIOS, the "normal" module has already been loaded from an
  #    individual RAID component, and $prefix still points there.  We want
  #    future module loads to come from the RAID, so we reset $prefix.
  #    (On UEFI, the stub grub.cfg has already set $prefix properly.)
  set boot=md/md-boot
  set prefix=($boot)/grub2
else
  if [ -f ${config_directory}/bootuuid.cfg ]; then
    source ${config_directory}/bootuuid.cfg
  fi
  if [ -n "${BOOT_UUID}" ]; then
    search --fs-uuid "${BOOT_UUID}" --set boot --no-floppy
  else
    search --label boot --set boot --no-floppy
  fi
fi
set root=$boot

if [ -f ${config_directory}/grubenv ]; then
  load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi

if [ -f $prefix/console.cfg ]; then
  # Source in any GRUB console settings if provided by the user/platform
  source $prefix/console.cfg
fi

if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi

function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}

# Other package code will be injected from here
source $prefix/40_coreos-ignition.cfg
source $prefix/70_coreos-user.cfg
if [ x$feature_timeout_style = xy ] ; then
  set timeout_style=menu
  set timeout=1
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
  set timeout=1
fi

# Import user defined configuration
# tracker: https://github.com/coreos/fedora-coreos-tracker/issues/805
if [ -f $prefix/user.cfg ]; then
  source $prefix/user.cfg
fi

blscfg
HuijingHei commented 4 months ago

@HuijingHei Is there a way to repo this locally?

See Timothée's comment, what I did for testing:

# build coreos-installer first

$ cosa shell
$ sudo rm /usr/bin/coreos-installer
$ sudo rsync -rlv coreos-installer/install/usr/ /usr/

mkdir fcos; cd fcos
cosa init https://github.com/coreos/fedora-coreos-config.git
cosa fetch
# copy built coreos-installer to overrides/rootfs
cosa build
cosa buildextend-metal && cosa buildextend-metal4k && cosa buildextend-live --fast

cosa compress --fast --artifact=metal4k

# comment other tests and keep customize.sh in images.sh
COREOS_INSTALLER_TEST_INSTALLED_BINARY=1 tests/images.sh /srv/fcos/builds/latest/x86_64 customize.sh

Edit

prestist commented 4 months ago

@travier Thank you! @HuijingHei wow good job! okay thank you; I will repro it locally. and I will see about the serial bit. I really want to know what the sudden change is; It does not look like this has been touched in 2 years ~

HuijingHei commented 4 months ago

I really want to know what the sudden change is; It does not look like this has been touched in 2 years ~

Good question, not sure it is because the change of https://github.com/coreos/fedora-coreos-tracker/issues/1671

prestist commented 4 months ago

So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.

  1. update the grub.cfg here with
    
    serial --unit=0 --speed=115200 --word=8 --parity=no
    terminal_input serial
    terminal_output serial

2. remove the serial bit from the unit as @HuijingHei tested. 

I still cannot tell why it has become suddenly broken; as all changes seem to be from months or more ago but it is what it is. 
HuijingHei commented 4 months ago

So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.

  1. update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

I do not think this will work. We are building image using osbuild by default (see https://github.com/coreos/coreos-assembler/pull/3800) now, with bootupd --with-static-configs to write file /boot/grub2/grub.cfg, and it includes source console.cfg (see https://github.com/coreos/bootupd/pull/619), console.cfg was created by https://github.com/osbuild/osbuild/pull/1589, which was from https://github.com/coreos/fedora-coreos-config/blob/testing-devel/platforms.yaml#L149, but it was not changed for 2 years, I am not sure when the change is.

cat console.cfg

# Any non-default console settings will be inserted here.
# CONSOLE-SETTINGS-START
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END
prestist commented 4 months ago

So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.

  1. update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

I do not think this will work. We are building image using osbuild by default (see coreos/coreos-assembler#3800) now, with bootupd --with-static-configs to write file /boot/grub2/grub.cfg, and it includes source console.cfg (see coreos/bootupd#619), console.cfg was created by osbuild/osbuild#1589, which was from https://github.com/coreos/fedora-coreos-config/blob/testing-devel/platforms.yaml#L149, but it was not changed for 2 years, I am not sure when the change is.

cat console.cfg

# Any non-default console settings will be inserted here.
# CONSOLE-SETTINGS-START
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END

Wow, I had no idea about this. I need to read up on OSBuild. The error is consistent with serial not being in the grub config which lead me there, I need to research how the console.cfg gets layered into the grub.cfg

prestist commented 3 months ago

@HuijingHei so this makes sense, somthing must be responsable for running the grub2-mkconfig to create the new grub.cfg right?

Locally running fcos and cat grub.cfg | grep serial results in nil

From my understanding the grub.cfg should be updated with the composed information from the link you provided?

HuijingHei commented 3 months ago

Not quite sure about grub2-mkconfig, do we use it during installation?

Inspired by @mike-nguyen, build with legacy cosa using COSA_USE_OSBUILD=0 cosa build-xxx, the CI passed, get serial line in grub.cfg (with ExecStart=/bin/cat /boot/grub2/grub.cfg before grep), seems there might be something different with ISO with install ?

serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

when I check on qcow2.disk using cosa run, the serial line in /boot/grub2/grub.cfg is

serial --speed=115200
terminal_input serial console
terminal_output serial console
prestist commented 3 months ago

Not quite sure about grub2-mkconfig, do we use it during installation?

It would have to be before we are booted.

[core@cosa-devsh grub2]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
/usr/sbin/grub2-mkconfig: line 268: /boot/grub2/grub.cfg.new: Read-only file system

Inspired by @mike-nguyen, build with legacy cosa using COSA_USE_OSBUILD=0 cosa build-xxx, the CI passed, get serial line in grub.cfg (with ExecStart=/bin/cat /boot/grub2/grub.cfg before grep), seems there might be something different with ISO with install ?

Yeah I think so. I suspect that we have an issue going on there. : we could revert the change and go back to doing both until we know why ? It would get us passing without failing ci here.

serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

when I check on qcow2.disk using cosa run, the serial line in /boot/grub2/grub.cfg is

serial --speed=115200
terminal_input serial console
terminal_output serial console

Thats from the old build method right? well we know the change now. The builder is not doing grub2-mkconfig.

@jlebon wdyt we should do? does osbuilder have the ability to mount and run grub2-mkconfig ?

jlebon commented 3 months ago

The context here is that when building with OSBuild, how GRUB configs are set up is different. You can find more details in https://github.com/coreos/fedora-coreos-tracker/issues/1671 and the links there, but the TL;DR is that (1) a stub grub.cfg is written by bootupd which in turn sources console.cfg and (2) osbuild creates console.cfg in the org.osbuild.coreos.platform stage. coreos-installer was accordingly taught to write to console.cfg if found when using the --console option: https://github.com/coreos/coreos-installer/pull/1360, https://github.com/coreos/coreos-installer/pull/1423.

So... I think all we need here is

diff --git a/fixtures/customize/dest.bu b/fixtures/customize/dest.bu
index 1e17035..219422a 100644
--- a/fixtures/customize/dest.bu
+++ b/fixtures/customize/dest.bu
@@ -23,7 +23,7 @@ systemd:
         [Service]
         Type=oneshot
         RemainAfterExit=true
-        ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg
+        ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg /boot/grub2/console.cfg
         ExecStart=/bin/echo @applied-dest-ign@
         StandardOutput=tty

(and regenerate dest.ign as well).

I'll let one of you verify that this fixes the tests and open a PR for it since you did most of the work to investigate this (nice work!). :)

prestist commented 3 months ago

@jlebon Yep it works locally going to make a commit and push. Thank you!