coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

osbuild: fix virtiofs data loss; switch compression to off #3729

Closed dustymabe closed 6 months ago

dustymabe commented 6 months ago

Also picked up a commit from walters in https://github.com/coreos/coreos-assembler/pull/3712 to allow running a VM from raw image.

See individual commit messages.

dustymabe commented 6 months ago

OK, I ran this through our staging pipeline. x86_64 progressed without issue (this is where we were getting stuck before), but aarch64 appears to have hit the same problem, where GRUB complained that it could not find the kernel.

I logged into the builder (the exact container where the build was happening) and was able to run the qemu qcow2 and see the same result (failed GRUB). I recreated the qcow2 (cosa buildextend-qemu --force) and got the same result. I then blew away the cache2.qcow2, ran cosa buildextend-qemu --force again, and this time it worked.

Something definitely is going on here that is a bit odd.

dustymabe commented 6 months ago

Either way this is an improvement. Will merge this and continue investigation as issues come up.

jlebon commented 6 months ago

If we want to be extra thorough, we can also do a freeze/thaw cycle before fully unmounting the cache. Some filesystems may not fully flush their journal before unmounting. See also https://github.com/coreos/coreos-assembler/pull/1482 and https://github.com/coreos/coreos-assembler/pull/3040.
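A minimal sketch of the freeze/thaw idea, not the code from either linked PR: it assumes the util-linux `fsfreeze` tool is available in the VM and that the cache is mounted at a path the caller supplies. The function names and the injectable `run` parameter are hypothetical, added only so the sequence can be exercised without root.

```python
import subprocess
from typing import Callable, List


def freeze_thaw_commands(mountpoint: str) -> List[List[str]]:
    """Return the command sequence: freeze, thaw, then unmount.

    Freezing asks the filesystem to flush dirty data and its journal and
    to block new writes; thawing immediately afterwards releases it so
    that the unmount can proceed.
    """
    return [
        ["fsfreeze", "--freeze", mountpoint],
        ["fsfreeze", "--unfreeze", mountpoint],
        ["umount", mountpoint],
    ]


def flush_and_unmount(mountpoint: str,
                      run: Callable[..., object] = subprocess.run) -> None:
    """Run the freeze/thaw cycle, stopping at the first failing command."""
    for argv in freeze_thaw_commands(mountpoint):
        # check=True makes subprocess.run raise on a non-zero exit status.
        run(argv, check=True)
```

Calling `flush_and_unmount("/mnt/cache")` (a placeholder path) requires root and a real mount; passing a stub as `run` gives a dry run.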

dustymabe commented 6 months ago

> If we want to be extra thorough, we can also do a freeze/thaw cycle before fully unmounting the cache. Some filesystems may not fully flush their journal before unmounting. See also #1482 and #3040.

I think the problem is actually more that the file written out over virtiofs isn't fully copied out, but I could be wrong. Is there a virtiofs_freeze :) ?

Here is what is happening inside the supermin VM:

OSBuild builds and stores things in the XFS filesystem on cache2.qcow2. At the end it copies the qemu.qcow2 out into the supermin VM's ext2 root filesystem (I didn't want to keep it in the cache because nothing later needs to read it from there). Then we copy that file over virtiofs to its final location, where our COSA processes can continue processing it.
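One way to guard the final copy step could look like the sketch below. It is not the actual COSA code, and the names `sha256_of` and `copy_out_verified` are hypothetical. It copies the file, calls fsync on the destination file and its directory, then compares checksums. It assumes fsync on the destination is propagated to the host, and a read-back inside the same VM may be served from the guest page cache, so a comparison done on the host side would be the stronger check.

```python
import hashlib
import os
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_out_verified(src: str, dst: str) -> str:
    """Copy src to dst, flush it to stable storage, and verify the result."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file out

    # Also sync the containing directory so the new entry is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    src_sum = sha256_of(src)
    if sha256_of(dst) != src_sum:
        raise OSError(f"checksum mismatch after copying {src} to {dst}")
    return src_sum
```

The function returns the checksum, so a caller could log it and compare it again from outside the VM.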

dustymabe commented 6 months ago

OK, I'm still investigating this. I'm going to close this PR while I iterate on the code and test things on various architectures.