coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

Kola Custom Test #3786

Closed sergeiwaigant closed 2 months ago

sergeiwaigant commented 2 months ago

Feature Request

I am currently tasked with implementing a pipeline for building Fedora CoreOS with K3s and SELinux. One subtask is to implement some custom tests. I came across Kola, but I cannot find good documentation on how to do that.

Basically I want to be able to check for specific files within the build artifact and the version of the binaries within. Maybe someone can add some details of how to achieve this.
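
The kind of check described above could be sketched as two small shell helpers that a test script calls inside the booted machine. This is only a sketch: the helper names `check_file` and `check_version` are made up for illustration, and the path, version string, and `--version` flag in the usage comments are assumptions, not values confirmed anywhere in this thread.

```shell
set -eu

# Hypothetical helper: succeed only if the given path exists.
check_file() {
    [ -e "$1" ] || { echo "missing file: $1" >&2; return 1; }
}

# Hypothetical helper: run a command and succeed only if its output
# contains the expected string.
# Usage: check_version EXPECTED COMMAND [ARGS...]
check_version() {
    expected="$1"
    shift
    out="$("$@" 2>&1)" || { echo "command failed: $*" >&2; return 1; }
    case "$out" in
        *"$expected"*) ;;
        *) echo "unexpected output from '$*': $out" >&2; return 1 ;;
    esac
}

# Example calls (path, version, and flag are assumptions):
# check_file /usr/local/bin/k3s
# check_version "v1.29" k3s --version
```

Because the script runs under `set -eu`, any failing check makes it exit with a non-zero status, which a test harness can treat as a failed test.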

Desired Feature

Document how to implement custom tests in a pipeline - e.g. Jenkins or GitLab

Example Usage

Other Information

jbtrystram commented 2 months ago

kola external tests may be suitable for you: https://coreos.github.io/coreos-assembler/kola/external-tests/
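
As a rough sketch of the layout that page describes, an external test is an executable file under `tests/kola/` in your repository, and an exit status of 0 counts as a pass. The snippet below builds that layout in a scratch directory (an assumption made so it can be run anywhere; in practice this would be your repository root) using the `basic/noop` path that appears later in this thread.

```shell
set -eu

# Scratch directory standing in for the repository root (assumption).
repo_root="$(mktemp -d)"

# Layout used in this thread: tests/kola/basic/noop
mkdir -p "$repo_root/tests/kola/basic"

# A trivial test body: it only needs to exit 0 to pass.
cat > "$repo_root/tests/kola/basic/noop" <<'EOF'
#!/bin/sh
set -eu
echo "noop test ran"
EOF

# The test file must be executable.
chmod +x "$repo_root/tests/kola/basic/noop"

# Local sanity check that the script itself runs.
"$repo_root/tests/kola/basic/noop"
```

When such a repository is mounted at `/corp` and passed via `--exttest /corp`, the test shows up as `ext.corp.basic.noop`, as seen in the output further down.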

sergeiwaigant commented 2 months ago

Thanks for the hint @jbtrystram

I tried this as in the noop example, but it's running into a timeout. I've created the noop file in tests/kola/basic and it's executable.

Any idea what's wrong?

 podman run --rm -ti --security-opt=label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap=1001:1001:64536 -v=/root/fcos/fcos:/srv/ --device=/dev/kvm --device=/dev/fuse --tmpfs=/tmp -v=/var/tmp:/var/tmp --name=cosa -e COSA_NO_KVM=1 -v=/root/fcos:/corp:ro quay.io/coreos-assembler/coreos-assembler:fcos-39.20240225.3.0 kola run -p qemu --exttest /corp 'ext.corp.*'
kola run -p qemu --exttest /corp ext.corp.* --output-dir tmp/kola
⏭️  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⏭️  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
=== RUN   ext.corp.basic.noop
2024-05-01T12:28:22Z kola: Test timed out. Adding as candidate for rerun success: ext.corp.basic.noop
--- FAIL: ext.corp.basic.noop (601.66s)
        harness.go:106: TIMEOUT[10m0s]: SSH unsuccessful within allotted timeframe for 6b402447-e55c-4608-800d-749e44a1d05d.
FAIL, output in tmp/kola
Error: harness: test suite failed
2024-05-01T12:28:22Z cli: harness: test suite failed
failed to execute cmd-kola: exit status 1

jbtrystram commented 2 months ago

Looking at the volume mount, are you sure the test file is at the correct location? Also, look at tmp/kola for some more logs.
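
One way to triage those logs quickly is to search every `console.txt` under the output directory for messages that usually indicate a boot failure. This is a sketch only: the function name `scan_kola_logs` is made up, and the two search patterns are examples of common systemd/dracut messages, not an exhaustive list.

```shell
set -eu

# Hypothetical helper: print every console.txt line under the given
# directory that matches a known boot-failure message, prefixed with
# file name and line number. Returns non-zero if nothing matched.
scan_kola_logs() {
    matches="$(find "$1" -type f -name console.txt -exec \
        grep -H -n -E 'Timed out waiting for device|Entering emergency mode' {} + || true)"
    [ -n "$matches" ] || return 1
    printf '%s\n' "$matches"
}

# Example call against the kola output directory:
# scan_kola_logs tmp/kola || echo "no known boot-failure messages found"
```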

sergeiwaigant commented 2 months ago

Yes, the path is correct like this. I am passing COREOS_ASSEMBLER_CONTAINER_RUNTIME_ARGS="-v=${_repoRoot}:/corp:ro" and it's able to find the test.

Thank you for the hint with the tmp/kola logs... Could have read the message myself ❌

Please find below the console.txt ... console.txt

Looks like it's able to start the image, but it's hanging at the stage below...

[   99.311728] systemd[1]: Startup finished in 25.008s (kernel) + 0 (initrd) + 1min 14.326s (userspace) = 1min 39.335s.
[   99.374326] systemd[1]: systemd-sysusers.service: Consumed 1.112s CPU time.

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

Press Enter for maintenance
(or press Control-D to continue): 

dustymabe commented 2 months ago

look further back in the logs:

[ TIME ] Timed out waiting for device dev-d…t.device - /dev/disk/by-label/root.

jlebon commented 2 months ago

Hi @sergeiwaigant,

It sounds like you want to roll your own builds. Have you had a look at the layering work? That should be a shorter path to success that doesn't require you to re-implement a lot of the work done in FCOS. See https://fedoraproject.org/wiki/Changes/OstreeNativeContainerStable and e.g. https://containers.github.io/bootc/intro.html. bootc is not currently in the production streams, so you'll have to e.g. RUN rpm-ostree install -y bootc.

sergeiwaigant commented 2 months ago

> look further back in the logs:
>
> [ TIME ] Timed out waiting for device dev-d…t.device - /dev/disk/by-label/root.

Thanks, I will check this... currently trying to build on FCOS 40 to check if the problem persists there as well.

> Hi @sergeiwaigant,
>
> It sounds like you want to roll your own builds. Have you had a look at the layering work? That should be a shorter path to success that doesn't require you to re-implement a lot of the work done in FCOS. See https://fedoraproject.org/wiki/Changes/OstreeNativeContainerStable and e.g. https://containers.github.io/bootc/intro.html. bootc is not currently in the production streams, so you'll have to e.g. RUN rpm-ostree install -y bootc.

I am not sure I understand your reply... I want to be able to run some custom tests inside the qcow2 image that came out of the cosa build... Basically just running some commands like k3s version to check that the binary is executable and reports the correct version.

How is bootc or rpm-ostree related to that?

dustymabe commented 2 months ago

> How is bootc or rpm-ostree related to that?

I think what he is saying is that, instead of having to build your own pipeline, you can just do a container build and then pivot an existing vanilla Fedora CoreOS system to that container:

sudo rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/organization/customfcos:latest

See https://github.com/coreos/layering-examples/ for some examples.

sergeiwaigant commented 2 months ago

Thank you for the hint, but I still don't think that's suitable for us. As I understand it, this wouldn't really use the custom-built qcow2 image for the tests?

The goal is to build a customized FCOS qcow2 image for deployment in Proxmox. After the build (in GitLab CI), we need to "test" the image before pushing it to the private registry. The server that we create from that image shall run K3s with SELinux enabled, bootstrapped by Ignition files.

sergeiwaigant commented 2 months ago

Guys... thank you for the support, but now it's working. Looks like it was just not working on my test machine, but in GitLab CI it's running fine now... the test is passing.

Closing the issue!