coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

osbuild: conditionally use the built payload as the buildroot #3808

Closed dustymabe closed 1 month ago

dustymabe commented 1 month ago

With this we now use a buildroot that is derived from the OCI container that was built by the pipeline. This allows us to use the exact same versions of software from the payload we built when we construct the images that we will ship, which will be better for us over time.

The benefits of this are immediately apparent in this commit as we are able to drop configuration that tries to set feature flags for our ext4 filesystems based on what we think are the current defaults in RHEL.
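
To illustrate why hand-maintained feature flags tend to drift from what the payload's own tools would produce, here is a minimal sketch (not code from this PR) that parses the `Filesystem features:` line of `dumpe2fs -h` output and compares it against a hard-coded set. The helper names `parse_features` and `feature_drift` are hypothetical, and the sample header and flag set are made up for illustration; they are not the actual RHEL or Fedora defaults.

```python
def parse_features(dumpe2fs_header):
    """Return the feature names from the 'Filesystem features:' line."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


def feature_drift(configured, actual):
    """Report flags that the hard-coded config and the real filesystem disagree on."""
    return {
        "missing": configured - actual,     # configured, but not on the filesystem
        "unexpected": actual - configured,  # on the filesystem, but not configured
    }


# Illustrative header text only (assumption), not real distribution defaults.
sample_header = """Filesystem volume name:   boot
Filesystem features:      has_journal ext_attr dir_index filetype extent 64bit
"""

# Hypothetical hand-maintained guess at "the current defaults".
hand_configured = {"has_journal", "ext_attr", "dir_index", "filetype", "extent", "orphan_file"}

drift = feature_drift(hand_configured, parse_features(sample_header))
print(drift)
```

When the buildroot comes from the payload itself, the `mkfs` defaults and the shipped userspace agree by construction, so a comparison like this should never be needed.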

For now we aren't able to do this with FCOS because FCOS doesn't have python in it. This should be OK for now because COSA is almost always based on the latest version of Fedora. Though one benefit we would have if we did switch to doing this for FCOS is that we would test newer versions of "build tools" from rawhide alongside the rawhide pipeline builds that we do.

dustymabe commented 1 month ago

We think implementing this will make issues like https://github.com/openshift/os/issues/1504 go away.

dustymabe commented 1 month ago

This will fix https://github.com/coreos/coreos-assembler/issues/3801

dustymabe commented 1 month ago

so coreos.unique.boot.ignition.failure is failing here in ci/prow/rhcos. I can also reproduce this locally.

It looks like the Ensure Unique 'boot' Filesystem Label unit in the console log runs before Ignition even runs, but the coreos.unique.boot.ignition.failure test adds a boot-labeled filesystem using Ignition.

Is it really required for this unit to run after Ignition is complete? I guess so since that's a test case we want to cover, but we'll probably have to strengthen the unit dependencies.

dustymabe commented 1 month ago

ahh. interestingly enough we have two units that check if boot is unique, but they both have very similar descriptions so the log messages are hard to distinguish to the untrained eye.

so it looks like maybe the coreos-ignition-unique-boot.service somehow isn't doing its job here.

dustymabe commented 1 month ago

ok I think I found the real error earlier up in the log:

[  OK  ] Finished Generate New UUID For Boot Disk GPT.
[    4.126289] systemd[1]: Finished Generate New UUID For Boot Disk GPT.
[    4.137089] ignition-ostree-transposefs[908]: Moving bootfs to RAM...
         Starting Ignition OSTree: Save Partitions...[    4.137822] systemd[1]: Starting Ignition OSTree: Save Partitions...
[    4.140833] ignition-ostree-transposefs[908]: Mounting /dev/disk/by-label/boot ro (/dev/vdb3) to /var/tmp/mnt
         Starting Ignition OSTree: …rate Filesystem UUID (boot)...[    4.144472] systemd[1]: Starting Ignition OSTree: Regenerate Filesystem UUID (boot)...
[    4.155819] ignition-ostree-firstboot-uuid[918]: e2fsck 1.46.5 (30-Dec-2021)
[    4.155947] ignition-ostree-firstboot-uuid[918]: /dev/disk/by-label/boot is in use.
[    4.155973] ignition-ostree-firstboot-uuid[918]: e2fsck: Cannot continue, aborting.
[    4.158304] EXT4-fs (vdb3): mounted filesystem 96d15588-3596-4b3c-adca-a2ff7279ea63 ro with ordered data mode. Quota mode: none.
[FAILED] Failed to start Ignition O…nerate Filesystem UUID (boot).

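
Since unit descriptions in the console can look alike, one way to dig through a log like the one above is to filter on the logging process name instead. The following is a rough sketch, not part of this PR or of any CoreOS tooling: it strips terminal escape sequences and carriage-return markers, then keeps only the messages tagged with a given identifier. The function names and regular expressions are assumptions made for this illustration.

```python
import re

# Real ESC-based ANSI sequences, plus the caret-notation form ("^[[0m") seen in pasted logs.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
CARET_ANSI_RE = re.compile(r"\^\[\[[0-9;]*[A-Za-z]")
# Matches "[    4.155947] ident[pid]: message" anywhere in a line.
TAGGED_RE = re.compile(r"\[\s*(\d+\.\d+)\] ([\w.-]+)\[(\d+)\]: (.*)")


def clean(line):
    """Remove escape sequences, carriage-return markers, and trailing padding."""
    line = ANSI_RE.sub("", line)
    line = CARET_ANSI_RE.sub("", line)
    return line.replace("^M", "").replace("\r", "").rstrip()


def messages_from(log, ident):
    """Return the messages logged by the process named `ident`, in order."""
    found = []
    for raw in log.splitlines():
        match = TAGGED_RE.search(clean(raw))
        if match and match.group(2) == ident:
            found.append(match.group(4))
    return found


# Three lines copied from the console log above, escape debris included.
sample_log = (
    "[    4.155947] ignition-ostree-firstboot-uuid[918]: /dev/disk/by-label/boot is in use.^M   \n"
    "[    4.155973] ignition-ostree-firstboot-uuid[918]: e2fsck: Cannot continue, aborting.^M   \n"
    "[    4.158304] EXT4-fs (vdb3): mounted filesystem 96d15588-3596-4b3c-adca-a2ff7279ea63 ro with ordered data mode. Quota mode: none.^M\n"
)

for message in messages_from(sample_log, "ignition-ostree-firstboot-uuid"):
    print(message)
```

Filtering this way surfaces the two `e2fsck` lines without the surrounding systemd status output. The kernel `EXT4-fs` line is skipped because it carries no `ident[pid]` tag.
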
jlebon commented 1 month ago

I think the CI issue is roughly:

But also, this would make us enter a completely different path in ignition-ostree-firstboot-uuid.

I think it'd be safer to still enable the feature? If we just hard enable it, then we can simplify ignition-ostree-firstboot-uuid.

dustymabe commented 1 month ago

CI is fixed and all comments should be addressed now.