flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

[RFC] arm64 SDK #319

Open vbatts opened 3 years ago

vbatts commented 3 years ago

Current situation: The Flatcar SDK only runs on an amd64 host. All arm64 images and binaries are cross-compiled.

Impact: Powerful arm64 machines are now readily available, but they cannot currently be used for builds.

Ideal future situation: A Flatcar arm64 image can be built from an arm64 host.

TODO

pothos commented 3 years ago

A first step would be to align the USE flags and related package setup so that they are either identical on both architectures or independent of the architecture. The SDK bootstrap needs a seed tarball with a compiler; I guess the arm64 developer container can be used for that.

jepio commented 3 years ago

So I managed to get it working with the arm64 developer container as a base. However, I would not recommend trying that a second time: the developer container has the wrong profile and CHOST, making the process trickier than necessary to begin with. After that it's all about keywords and fixing x86-specific assumptions.

The built SDK can be fetched from ~https://jepio.blob.core.windows.net/flatcar-arm64/stage4-arm64-2920.0.0+2021-07-13-1007.tar.bz2~ https://jepio.azureedge.net/flatcar-arm64/2942.0.0/flatcar-sdk-arm64-2942.0.0+2021-07-27-0724.tar.bz2. The binary packages, intermediate stages and my hacked up developer container can be found in the same bucket (https://jepio.blob.core.windows.net/flatcar-arm64?restype=container&comp=list).

I believe most of the changes necessary to get the bootstrap working (starting from a reasonable seed) can be merged, I'll submit the PRs in the next weeks.

vbatts commented 3 years ago

@jepio that's awesome! that was the path I began, but abandoned when I was trying from my pinebook. Looking forward to it being an option

jepio commented 3 years ago

Here are some more instructions for how to use this, @dongsupark:

git clone https://github.com/kinvolk/mantle
pushd mantle
./build cork
# move to somewhere on your PATH
popd

# pull my key
gpg --keyserver hkps://keys.openpgp.org --recv-keys 3717D1B5C719A9BD

mkdir -p flatcar-sdk/.cache/sdks/
pushd flatcar-sdk
latest=$(curl https://jepio.azureedge.net/flatcar-arm64/latest.txt | awk '{ print $1 }')
wget -O .cache/sdks/flatcar-sdk-arm64-2942.0.0.tar.bz2 "https://jepio.azureedge.net/flatcar-arm64/${latest}"
wget -O .cache/sdks/flatcar-sdk-arm64-2942.0.0.tar.bz2.sig "https://jepio.azureedge.net/flatcar-arm64/${latest}.sig"
gpg --armor --export jpiotrowski@microsoft.com >key.gpg
cork create --sdk-version 2942.0.0 --verify-key key.gpg
cork enter
git -C ~/trunk/src/scripts/ checkout jepio/arm64-sdk-support
git -C ~/trunk/src/third_party/coreos-overlay/ checkout jepio/arm64-sdk-support
./update_chroot
./bootstrap_sdk --seed_tarball /mnt/host/source/.cache/sdks/flatcar-sdk-arm64-2942.0.0.tar.bz2

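
The steps above fetch whatever `latest.txt` points to but hardcode `2942.0.0` in the local file names and in `--sdk-version`. A minimal sketch of deriving the version from the fetched path instead, assuming `latest.txt` contains a path shaped like the SDK URL posted earlier in this thread (`<version>/flatcar-sdk-arm64-<version>+<date>.tar.bz2`). The helper name `sdk_version_from_path` is hypothetical and not part of any Flatcar script:

```shell
# Hypothetical helper: extract the SDK version from a path such as
# "2942.0.0/flatcar-sdk-arm64-2942.0.0+2021-07-27-0724.tar.bz2".
# The body runs in a subshell "( ... )" so its variables do not leak.
sdk_version_from_path() (
  name="${1##*/}"                    # strip any directory prefix
  name="${name#flatcar-sdk-arm64-}"  # strip the fixed file name prefix
  name="${name%.tar.bz2}"            # strip the archive extension
  printf '%s\n' "${name%%+*}"        # drop the "+<build date>" suffix, if any
)

# Possible use with the commands above (not executed here):
#   version=$(sdk_version_from_path "$latest")
#   wget -O ".cache/sdks/flatcar-sdk-arm64-${version}.tar.bz2" "https://jepio.azureedge.net/flatcar-arm64/${latest}"
#   cork create --sdk-version "$version" --verify-key key.gpg
```

If the actual format of `latest.txt` differs from this assumption, the prefix and suffix patterns would need adjusting.
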
pothos commented 3 years ago

Currently the SDK sets up QEMU process emulation for arm64. Now you would need to set up the reverse direction, because during compilation of some packages the build system has to run some amd64 binaries.

The key action is setting QEMU_LD_PREFIX. The rest can actually be removed, because the normal Debian/Fedora packaging of qemu-user loads the qemu binary into RAM on boot, so we don't need to set it up from the SDK with dynamic loading, which does not cover all cases.
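
As a rough illustration of the QEMU_LD_PREFIX point: qemu-user reads this variable as the prefix under which it looks for the foreign ELF interpreter, so a dynamically linked foreign binary can find its loader and libraries in a sysroot. The sketch below assumes the host's qemu-user package has already registered binfmt_misc handlers, as described above. The wrapper name `with_qemu_prefix` and the example sysroot path are hypothetical, not taken from the SDK scripts:

```shell
# Hypothetical wrapper: run a command with QEMU_LD_PREFIX pointing at a
# foreign-architecture sysroot. The subshell body "( ... )" keeps the
# variable from leaking into the calling shell.
with_qemu_prefix() (
  export QEMU_LD_PREFIX="$1"  # sysroot holding the foreign loader and libraries
  shift
  exec "$@"                   # foreign binaries started from here go through qemu-user
)

# Possible use (the sysroot path and program are assumptions):
#   with_qemu_prefix /build/amd64-usr ./some-amd64-binary --version
```
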

jepio commented 3 years ago

I agree.

It doesn't look like it's going to be as "easy" as cross-compiling arm64 from amd64 though, because some programs that are called from the SDK for image building (syslinux, x86-specific parts of grub) can't be built for arm64 natively.

My first objective is to get arm64 -> arm64 working, and get us into a state where we can support the SDK infrastructure on our servers. Without that, it's a pain to use.

dongsupark commented 3 years ago

In general it looks good. Following @jepio's instructions, I was able to create an arm64 SDK on an arm64 host (actually an arm64 VM on a Mac M1). From that tarball, I created another flatcar-sdk environment. Inside that, I was able to successfully build an arm64 qemu image, with fantastic speed.

However, some hacks or tweaks are needed. I am listing all of them; some are probably just failures in my own testing.

pothos commented 3 years ago

For the last point: the script already checks for an arm64 host but it assumes that KVM is available, maybe you can enable nested virtualization for the host VM?

jepio commented 3 years ago

Nested virt is not available on M1 right now.

@dongsupark are you using qemu for virtualization? You could try to adapt the script to get it working when run from the host. Probably some of the helper tools are not available, and instead of accel=kvm you need accel=hvf.
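
A minimal sketch of how a launcher script could choose the accelerator rather than assuming KVM. `kvm`, `hvf` and `tcg` are real QEMU accelerator names, but the function `pick_accel` and its arguments are hypothetical and not part of the Flatcar script:

```shell
# Hypothetical helper: print the QEMU accelerator to use.
# $1: OS name (defaults to `uname -s`), $2: KVM device node (defaults to /dev/kvm).
pick_accel() (
  os="${1:-$(uname -s)}"
  kvm_dev="${2:-/dev/kvm}"
  if [ "$os" = "Linux" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
    echo kvm   # hardware virtualization on a Linux host
  elif [ "$os" = "Darwin" ]; then
    echo hvf   # macOS Hypervisor.framework
  else
    echo tcg   # software emulation: slow, but works everywhere
  fi
)

# Possible use: pass "accel=$(pick_accel)" in the QEMU machine options.
```

On a Linux guest without nested virtualization there is no usable `/dev/kvm`, so this sketch falls back to `tcg`.
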

dongsupark commented 3 years ago

> are you using qemu for virtualization? You could try to adapt the script to get it working when ran from the host. Probably some of the helper tools are not available, and instead of accel=kvm you need accel=hvf.

I am using UTM, basically a wrapper around qemu. UTM is already passing accel=hvf as expected. Still inside the guest Linux VM, the Flatcar script fails. Anyway don't worry, that's not super critical, I could find out other options for testing qemu images. ;-)

dongsupark commented 3 years ago

As for the multi-arch issue in generate_au_zip, I ended up writing this PR: https://github.com/kinvolk/flatcar-scripts/pull/141

jepio commented 2 years ago

I think this is what was needed to create a "seed" from a development container https://gist.github.com/jepio/7ee539b768f7a33953d137d0ff7c6abe.

chewi commented 4 months ago

I've had a fresh go at achieving this.

Flatcar has been using Catalyst 3, but Gentoo have masked this now in favour of 4.0-rc1. One benefit of the new version is that it can leverage qemu-user to build for other architectures. Updating Flatcar's scripts for Catalyst 4 was quite tricky, as a lot has changed, but we were going to have to do it sooner or later.

I then kicked off a build using a vanilla Gentoo arm64 stage3 as a seed. It turned out that the seed needed git installed because of cros-workon.eclass, so I added that, although a fallback for when git is not installed yet might be a good idea.
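
Before starting a long bootstrap, the seed could be checked for git up front. A small sketch, assuming the seed is a tarball whose entries look like `./usr/bin/git`. The function name `seed_has_git` is hypothetical:

```shell
# Hypothetical check: read a tar listing on stdin and succeed if it
# contains usr/bin/git (with or without a leading "./").
seed_has_git() {
  grep -Eq '(^|/)usr/bin/git$'
}

# Possible use (the file name is an assumption):
#   tar -tf stage3-arm64.tar.xz | seed_has_git || echo "seed lacks git" >&2
```
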

I made it past stage1 before hitting a bug in Catalyst. I've now fixed this and am facing some USE conflicts involving curl, openssl, and rust. I'm not sure why, as it even happens for a native amd64 build using a Flatcar seed, but seemingly only with Catalyst 4. I'm still investigating.

@ader1990 also expressed interest in a riscv build, which I would like to see. It should be possible to use the same approach once a riscv Portage profile has been created, although a lot of the scripts have code like "if amd64, do thing natively, else if arm64, do thing via QEMU". These will need improving for the SDK to actually be usable.
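
One way such checks could be generalized is to compare the host and target architectures instead of naming specific ones. This is only a sketch under that assumption. The function names are hypothetical and do not exist in the Flatcar scripts:

```shell
# Hypothetical helper: map `uname -m` style names to Portage arch names.
portage_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    riscv64)       echo riscv ;;
    *)             echo "$1" ;;  # pass through names that are already Portage arches
  esac
}

# Hypothetical helper: print "native" when the target matches the host,
# otherwise "qemu". $1: target arch, $2: host machine (defaults to `uname -m`).
exec_mode() (
  target="$(portage_arch "$1")"
  host="$(portage_arch "${2:-$(uname -m)}")"
  if [ "$host" = "$target" ]; then
    echo native
  else
    echo qemu
  fi
)
```

With a comparison like this, adding a new architecture would only need a new mapping entry rather than another branch in each script.
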

chewi commented 4 months ago

Figured it out. It's not a new problem. I'd been doing bootstrap_sdk stage2 because I'd already built stage1, but that causes it to start from the latest SDK, not the stage1 you built earlier. Simply doing bootstrap_sdk would have skipped rebuilding stage1 anyway unless I'd added --rebuild.

chewi commented 4 months ago

Damn, I've been caught out by the vanilla Gentoo seed and our package tree not quite aligning. Specifically, Perl cannot find libcrypt.so.2, part of libxcrypt. Gentoo migrated to libxcrypt 2½ years ago, so we're quite behind there.

dongsupark commented 4 months ago

@chewi Looks like this is a good chance to resurrect the open PR https://github.com/flatcar/scripts/pull/1732.

ader1990 commented 3 months ago

> I think this is what was needed to create a "seed" from a development container https://gist.github.com/jepio/7ee539b768f7a33953d137d0ff7c6abe.

I tried to run the workflow accordingly and I got this error when running this command:

./run_sdk_container -x "ci-cleanup.sh" -C flatcar-sdk-import:${VERSION} sudo -E ./bootstrap_sdk --seed_tarball flatcar-sdk-arm64-${VERSION}.tar.bz2

>>> Failed to emerge app-misc/ca-certificates-3.82 for /tmp/stage1root/, Log file:
>>>  '/mnt/host/source/src/build/catalyst/log/app-misc:ca-certificates-3.82:20240528-112105.log'
>>> Installing (122 of 127) virtual/tmpfiles-0-r1::portage-stable to /tmp/stage1root/
--2024-05-28 11:21:06--  http://mirror.release.flatcar-linux.net/portage-stable/distfiles/layout.conf
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:06 ERROR 404: Not Found.

!!! Couldn't download '.layout.conf.mirror.release.flatcar-linux.net'. Aborting.
--2024-05-28 11:21:06--  http://mirror.release.flatcar-linux.net/portage-stable/distfiles/nss-3.82.tar.gz
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:07 ERROR 404: Not Found.

--2024-05-28 11:21:07--  http://mirror.release.flatcar-linux.net/coreos/distfiles/layout.conf
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:07 ERROR 404: Not Found.

!!! Couldn't download '.layout.conf.mirror.release.flatcar-linux.net'. Aborting.
--2024-05-28 11:21:07--  http://mirror.release.flatcar-linux.net/coreos/distfiles/nss-3.82.tar.gz
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:08 ERROR 404: Not Found.

--2024-05-28 11:21:08--  http://distfiles.gentoo.org/distfiles/d7/nss-3.82.tar.gz
Resolving distfiles.gentoo.org... 195.181.175.40, 212.102.56.179, 156.146.33.138, ...
Connecting to distfiles.gentoo.org|195.181.175.40|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:08 ERROR 404: Not Found.

--2024-05-28 11:21:08--  ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_82_RTM/src/nss-3.82.tar.gz
           => '/var/gentoo/distfiles/nss-3.82.tar.gz.__download__'
Resolving ftp.mozilla.org... 34.117.35.28
Connecting to ftp.mozilla.org|34.117.35.28|:21... failed: Connection timed out.
Retrying.

--2024-05-28 11:22:10--  ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_82_RTM/src/nss-3.82.tar.gz
  (try: 2) => '/var/gentoo/distfiles/nss-3.82.tar.gz.__download__'

I tried updating the branch to use an nss source URI without the ftp://, but still did not get a successful build. It seems the branch code is not used; instead, the build uses the actual image, which had to be built already: https://alpha.release.flatcar-linux.net/arm64-usr/3510.0.0/flatcar_developer_container.bin.bz2.

jepio commented 3 months ago

The old sources are likely gone from the mirrors. Let me see if this is something that can be revived.

chewi commented 3 months ago

I made it past stage3, but I figured building Rust and such was going to take forever under QEMU. I had another idea to run this on arm64 hardware, by emulating the SDK (i.e. Catalyst) and doing the actual building natively. It's already racing ahead of where the other build had got to. Shouldn't be long now.

jepio commented 3 months ago

@chewi: if we're switching to Catalyst 4, then it would be a good idea to switch the amd64 SDK first, validate that everything is still correct, and then go for arm64. I can also get you access to a shiny Azure Cobalt instance for building.

jepio commented 3 months ago

I've pushed two sdk container images:
ghcr.io/jepio/flatcar-sdk-arm64/flatcar-sdk-arm64:3941.0.0-2024-05-29-1223
ghcr.io/jepio/flatcar-sdk-arm64/flatcar-sdk-tarball:3941.0.0-2024-05-29-1223 <- catalyst output

I hit an issue cross-building a native toolchain for amd64: cet support in the toolchain (enabled by amd64 hardened profile) has a build dependency on binutils[cet]. cet is only unmasked for amd64 profiles. I would ignore cross-compiling from amd64->arm64, there is no usecase for it and it's not something anyone has thought of doing in the oss world.

chewi commented 3 months ago

Yeah, switching to 4 first might be best before any official arm64 SDK release, but the result of this should be good enough to kick off an entirely native build once we've done that.

I believe CET refers to some possibly amd64-specific CPU feature. From a Gentoo perspective, I'd like cross-compiling with CET to work, so I'll look into avoiding the mask when using crossdev.

chewi commented 3 months ago

Bah, it failed quite late on with sys-block/thin-provisioning-tools. Seems like some broken CoreOS Rust/cargo cross-compiling logic.

chewi commented 2 months ago

I got it working. Closing in favour of flatcar/scripts#2093.

jepio commented 2 months ago

This is the tracking issue, we keep it open until the PR lands.