I mentioned this in matrix, but I'll say it again here: I'm surprised it's been working this whole time (I certainly never test on a machine that small).
Did you happen to find which version in the `testing-devel` stream was the first to stop working?
FWIW I just tried booting a 512M qemu qcow image (`fedora-coreos-38.20230722.3.0-qemu.x86_64.qcow2`) and it booted fine:
```
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Thu 2023-08-10 21:16:25 UTC)
Deployments:
● fedora:fedora/x86_64/coreos/stable
        Version: 38.20230722.3.0 (2023-08-07T18:56:37Z)
         Commit: bf28f852e934b0c0b9eee232a58970e96adb3e691299b02376f8719530e03fb3
   GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
[core@cosa-devsh ~]$
[core@cosa-devsh ~]$ free -m
               total        used        free      shared  buff/cache   available
Mem:             441         152         114           2         173         275
Swap:              0           0           0
```
> Did you happen to find which version in the `testing-devel` stream was the first to stop working?

Yeah, it's `38.20230722.3.0 x86_64` exactly. The previous version works; I did the back and forth. EDIT: OK, I'll check the exact `testing-devel` version.
> FWIW I just tried booting a 512M qemu qcow image (`fedora-coreos-38.20230722.3.0-qemu.x86_64.qcow2`) and it booted fine:
Interesting. Maybe assuming it’s a RAM issue was the wrong idea?
For sure: I can consistently make the boot fail with a `t3a.nano`, while consistently making it work with a `t3a.micro` (notably, the only difference listed by AWS is the RAM amount). I'll run more tests.
> I mentioned this in matrix, but I'll say it again here: I'm surprised it's been working this whole time (I certainly never test on a machine that small).
Yes! I just want to spark a broader discussion, because it's a usage we have that does not work anymore, as well as the other "things to consider" I mentioned.
> Did you happen to find which version in the `testing-devel` stream was the first to stop working?

Yeah, it's `38.20230722.3.0 x86_64` exactly. The previous version works; I did the back and forth.
The `testing-devel` stream is our development stream where we have many builds a week. The artifacts (or in this case the reference to the AMI ID) can be picked up from the unofficial builds browser. The `testing-devel` build numbers look like `XX.YYYYYYYY.20.Z`. When you say `38.20230722.3.0`, it appears you tested a `stable` stream build and not a `testing-devel` stream build.
Here are my findings:

- fedora-coreos-38.20230712.20.0-x86_64 (ami-04b897868a2b0c657): OK
- fedora-coreos-38.20230712.20.1-x86_64 (ami-0e4d81169b2565fcc): FAIL

Just to confirm, it continued to fail for another testing version along the way:

- fedora-coreos-38.20230714.20.0-x86_64 (ami-01424d4ce6c912ceb): FAIL
The difference there was: `ignition 2.15.0-3.fc38.x86_64 → 2.16.2-1.fc38.x86_64`
So maybe the size increased a lot for Ignition. You can investigate further by grabbing the RPMs from koji.
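Following that suggestion, here is a rough sketch of what the comparison could look like with the `koji` CLI, using the two NVRs listed above. This is only an illustration (it degrades to a note where `koji` isn't installed):

```shell
# Fetch both Ignition builds from Koji and list the payload size of the
# shipped binary without installing anything.
if command -v koji >/dev/null 2>&1; then
  koji download-build --arch=x86_64 ignition-2.15.0-3.fc38
  koji download-build --arch=x86_64 ignition-2.16.2-1.fc38
  for rpm in ignition-*.x86_64.rpm; do
    echo "== $rpm =="
    # cpio -tv prints a long listing, including file sizes
    rpm2cpio "$rpm" | cpio -tv 2>/dev/null | grep 'ignition$'
  done
else
  echo "koji CLI not installed; skipping" > koji-note.txt
  cat koji-note.txt
fi
```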
~~What's the disk size for that instance? Could it be https://github.com/coreos/fedora-coreos-tracker/issues/1535?~~
Apparently that's another issue.
We discussed this topic in the community meeting today.
It was pointed out that Fedora does have some documentation on minimum system requirements here. That guidance currently recommends 2G+.
While it would be nice if 512M continued to work, I don't think it's worth us spending time on it. You could use the Fedora Cloud image, but I don't even think that would work, as `dnf` chews through a decent amount of memory when downloading repo metadata.
Even with all of that said, if someone were to find the root cause of the change in behavior and propose a patch it would be considered.
Too bad I couldn’t join the meeting this morning. :(
> It was pointed out that Fedora does have some documentation on minimum system requirements here. That guidance currently recommends 2G+.
I'd like to point out this is misguided. Recommending 2G+ on a user system is a low bar nowadays; that documentation even emphasizes that GUI desktops and services tend to consume a lot.

On the other side, requiring 2GB+ in any cloud environment is unreasonable. In the pursuit of very high availability, engineers in the field tend to scale horizontally rather than vertically, meaning they actually seek low-spec machines but prefer having many of them. I mean it: a big portion of the work is actually making sure that a service can run on the smallest spec possible: very small, very low-footprint containers.

If Fedora CoreOS chooses a 2GB+ RAM minimum, I believe it consequently becomes a bad choice for cloud computing. Imagine if the smallest machine in any horizontally scaled system had to be 2GB minimum: that would be a waste of energy and money.

I'm aware container orchestrators add another layer to circumvent that issue, but still: orchestrators themselves need services outside of them to work properly: key-value stores, secret vaults, VPNs, service meshes…

This is especially a big problem for me because of the lack of guarantee. While I understand FCOS might run fine on a 1GB machine (or lower, if the original "problem" of this ticket is ever "fixed"), deciding on a 2GB minimum spec means I would receive no help or support if I ever hit a problem related to FCOS RAM consumption on a machine under 2GB. (Just to clarify: I do not expect anyone to solve the problem, but there is a difference between recognizing there is a problem at all versus "no problem here".)
> Even with all of that said, if someone were to find the root cause of the change in behavior and propose a patch it would be considered.
Sadly I cannot offer much in terms of debugging besides testing on AWS.
> Too bad I couldn’t join the meeting this morning. :(
Come join us same time next week.
Like Dusty did, I tried running a QEMU image with 512MB memory and it booted:
```
$ cosa run --qemu-image fedora-coreos-38.20230806.1.0-qemu.x86_64.qcow2 --memory 512
[core@cosa-devsh ~]$ free -m
               total        used        free      shared  buff/cache   available
Mem:             442         174          90           2         177         254
Swap:              0           0           0
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Thu 2023-08-17 10:31:03 UTC)
Deployments:
● fedora:fedora/x86_64/coreos/next
        Version: 38.20230806.1.0 (2023-08-07T18:56:40Z)
         Commit: ec10f2df99e1bfd4621022f5d11950cea5395c867ce3e9a4eb2e1f5aee4cf0e5
   GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
```
Anything under 500MB of memory failed to boot for me, likely because the initrd does not have enough space in RAM to be extracted, leading to files missing from the initramfs and the boot process failing. If kernels running on AWS / Xen instances reserve just slightly more memory for themselves during boot, then we end up with AWS systems not booting with 512MB of RAM.
I suspect that with the size of the initrd growing, low memory systems will be less and less supported as time goes on. Related discussions in https://github.com/coreos/fedora-coreos-tracker/issues/1465 & https://github.com/coreos/fedora-coreos-tracker/issues/1247.
Fixing this would require a significant amount of effort, but is not out of reach.
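To find where the threshold sits in a given environment, one could step `--memory` down with `cosa run` (the flag used in the session above). This is only a sketch: each `cosa run` drops into the serial console, so every iteration needs a manual `poweroff`, and the guard makes it a no-op where `cosa` or the image is missing:

```shell
# Hypothetical bisection loop around `cosa run --memory`.
IMAGE=fedora-coreos-38.20230806.1.0-qemu.x86_64.qcow2
if command -v cosa >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
  for mem in 512 500 490 480; do
    echo "=== trying ${mem}M ==="
    # Each run is interactive: log in, check `free -m`, then `sudo poweroff`.
    cosa run --qemu-image "$IMAGE" --memory "$mem" || echo "${mem}M: failed"
  done
else
  echo "cosa or image not available; dry run only" > cosa-note.txt
  cat cosa-note.txt
fi
```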
So while we very much want to support as many configurations and platforms as possible, we have to be honest upfront with our users that systems below a minimum bar might encounter issues at some point. Everyone is free to ignore those recommendations.

There is obviously no "good value" for this, as everyone has a different use case. The "best we can do" values are the ones we run our tests with, because we have fairly good confidence that this configuration will work. If I'm not mistaken, the current default is 1GB.
Hey travier, thanks for the answer!
I'm worried about my second point. From my perspective, this caused some downtime and cost significant manpower on our end: it used to work, and it does not anymore. Hence the discussion about what minimal amount of RAM is supported. I wish you could tell me 512M is the "officially" supported minimum, but I understand the effort required is high, so it's more of an "if it works, it works; if it does not, it does not" stance.

So now I'm left wondering: if I have an issue with 1GB RAM machines in the coming months, is it going to be considered a bug or not? (Maybe now, because it's your test machine size, but that's subject to change.) The answer directly impacts my ability to offer stability in the systems I create and maintain, as well as to provide the correct tool for our end goals.

Of course, I do not expect you to jump in and solve bugs unrelated to Fedora CoreOS itself, but what if it happens? Let's say the afterburn unit leaks a lot of memory but it's hard to figure out why: what would your stance be if the machine has 512M, 1GB, 2GB?

For example, MicroOS is clear: they support 1GB, with some caveats. I didn't test that, but if I had an issue with MicroOS not booting on a 1GB machine, I would assume they will fix it.

The difference here is that I can tell the money holders: "we use a system that officially runs on this type of machine, with these specs" (and therefore, if it stops working, everyone would expect the problem to be fixed).

So, the discussion is: can CoreOS take an official stance that 1GB is the minimum supported, for the time being?
MicroOS docs say:
I think I read that to say: you need 1G for MicroOS and you add whatever memory you need (in addition to the 1G) for your application. I think we typically fit fine within those constraints.

The problem that you are running into right now is that the initramfs won't unpack into 512M on that instance type. However, once the system is booted (gets past the initramfs), it runs fine with no apps in less than 512M of memory. If you don't layer any packages, then 512M of memory would probably continue to update fine.
I think what I'm trying to say is:
> So, the discussion is: can coreOS take an official stance that 1GB is the minimum supported for the time being?
I don't think we are going to make an official stance on this beyond the docs that were already linked. As @travier mentioned, we already run most of our tests in VMs with 1G of memory, for `x86_64` at least (see code). So we'll know if we start to breach that threshold.
Is the `initramfs` file always copied verbatim by `grub`, whatever happens, before launching the kernel? If so, wouldn't using `compress=cat` in `dracut` basically solve the problem? The current compressed `initramfs` takes ~90MB and the uncompressed image ~160MB, which would lower the failing point to ~410MB.
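For reference, the dracut knob in question is a one-line drop-in; per dracut.conf(5), `compress=` accepts `cat` to disable compression entirely (the file path below is just an example):

```
# /etc/dracut.conf.d/99-no-compress.conf  (example path)
compress="cat"
```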
I have also had a look at what takes space: `/usr/bin/ignition` takes a whopping 30MB all by itself, `/usr/bin/afterburn` 7MB, then followed by `/sbin/NetworkManager` and the `/usr/lib64/systemd/*` libs.
Ignition looks bad as I am struggling to believe that 30MB is not something that can be reduced for a program that does little technically (i.e. reading JSON and spawning external programs to do the "hard work"). Same thing, to a lesser extent, for afterburn.
NetworkManager looks pretty bad too: it's 10MB of binaries redundant with the included systemd libraries; adding the `systemd-networkd` binary and removing NM would net an 8.3MB reduction in size.
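For anyone wanting to reproduce this kind of audit, a small sketch. The stand-in files below only simulate an unpacked initramfs tree; on a real system you would unpack the image first, e.g. with `lsinitrd --unpack` or `zstd -dc initramfs.img | cpio -idm`:

```shell
# Simulate an unpacked initramfs tree (stand-in files, fake sizes),
# then list the largest regular files first.
mkdir -p initrd-root/usr/bin initrd-root/usr/lib64/systemd
head -c 300000 /dev/zero > initrd-root/usr/bin/ignition                      # stand-in
head -c 70000  /dev/zero > initrd-root/usr/bin/afterburn                     # stand-in
head -c 50000  /dev/zero > initrd-root/usr/lib64/systemd/libsystemd-core.so  # stand-in
# Biggest files first (GNU du):
find initrd-root -type f -exec du -b {} + | sort -rn | head -n 10
```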
Another way of looking at the problem is at installation time. `coreos-installer` could provide a switch to set up an install-only swap space (either in-file or on a partition), for instance, if the kernel allows at least the compressed `initramfs` to be swapped out. If swapping out works as intended, the fix for AWS would just be adding a 512MB swap file/partition by default in the cloud images.

And/or it could provide a flag to uncompress the `initramfs` on the fly while putting the binary in the EFI partition, and drop a dracut config to set `compress=cat`.
> This is especially a big problem for me because of the lack of guarantee.

> So now I'm left wondering: what if I have an issue with 1GB RAM machines in the next months, is it gonna be considered a bug or not? (Maybe now because it’s your test machine size, but that’s subject to changes.) Because the answer directly impacts my ability to offer stability in the system I create and maintain as well as providing the correct tool for our end goals.

> The difference here is that I can say to money holders that: "we use a system that officially runs on this type of machines, with these specs." (and therefore: if it does not anymore, everyone would expect the problem to be fixed)

> So, the discussion is: can coreOS take an official stance that 1GB is the minimum supported for the time being?
Fedora CoreOS is an open source project. It does not come with any guarantee of support. We try to fix as many issues as we can, but there is no guarantee that any specific issue will be fixed. We're not special here; every open source project is like that, and it's written in the license.
I'm not saying that this will never be fixed or that we won't accept a PR to fix it. As I wrote in https://github.com/coreos/fedora-coreos-tracker/issues/1540#issuecomment-1682064290, fixing this is not easy (otherwise we would likely be doing it).
Instead, we're suggesting workarounds. One of those (lost to chat) is:
As Ignition is the largest binary, we could consider stripping it and removing debug info, as we don't really expect users to debug Ignition in the initramfs: https://gophercoding.com/reduce-go-binary-size/
> As Ignition is the largest binary, we could consider stripping it and removing debug info as we don't really expect user to debug Ignition in the initramfs: https://gophercoding.com/reduce-go-binary-size/
IIUC our binary as delivered by the RPM is already stripped and without debug_info:

```
$ file /usr/lib/dracut/modules.d/30ignition/ignition
/usr/lib/dracut/modules.d/30ignition/ignition: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c9346043c36089a4161d63c33b96b4d80ee642eb, for GNU/Linux 3.2.0, stripped
```
> is already stripped and without debug_info
Indeed, I have just checked.
The swap "trick" does not work (who would have thunk `initramfs` wasn't swappable?). Neither does uncompressing the initramfs, as the kernel creates a tmpfs and copies the content of the initcpio anyway (so it makes things worse).
Would you consider using `upx` on `ignition` (and a few other binaries) before inclusion in the `initramfs`? This reduces the executable size from 28MB (on main) to 7.5MB.
More generally, compressing ignition, afterburn, nmcli, bash and NetworkManager that way reduces the uncompressed size from 156MB to 131MB, keeping the same features!
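For the record, a sketch of what that experiment could look like, assuming `upx` is installed and using the Ignition path quoted earlier in this thread (the block degrades to a note where either is missing):

```shell
# Compress a copy of the Ignition binary with upx and compare sizes.
BIN=/usr/lib/dracut/modules.d/30ignition/ignition   # path from the `file` output above
if command -v upx >/dev/null 2>&1 && [ -f "$BIN" ]; then
  cp "$BIN" ./ignition
  du -h ./ignition                 # size before (~28M reported above)
  upx --best --lzma ./ignition     # compress in place
  du -h ./ignition                 # size after (~7.5M reported above)
else
  echo "upx or ignition binary not available; nothing to do" > upx-note.txt
  cat upx-note.txt
fi
```

Note that upx-packed binaries decompress themselves into memory at exec time, so the on-disk saving does not necessarily translate into a lower peak-RAM footprint.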
Is this where compression is set for `dracut`? I am asking because this is not consistent with the output of `binwalk`, which does not see any `zstd` in there.
For what it's worth, recompressing with `xz -e9` reduces the (original) initramfs from 85.5MB to 69.0MB (not strictly equivalent, because tar instead of cpio was used for recompression). `zstd -19` still gives 72.7MB.

With the few individually compressed binaries, both the `zstd -19` and `xz -e9` initramfs get 2-3MB bigger (74.8MB and 72.4MB).

That's a compound saving of 35MB at worst.
Tangentially, this is a strong case against Go on constrained systems (and I would argue the `initramfs` is one). Just for reference, a statically linked `python` is 30MB; stripped of its debug info, it is 5.5MB. Once `upx`ed, it is 1.9MB, and it is usable... to read JSON... and launch subprocesses...

There is no clear path to binary reduction in Go. There is no `-Os` or equivalent option, and apparently no interest upstream in controlling binary bloat a bit more. A few more tools like Ignition and the minimum requirement will become much higher than that of any fancy desktop operating system, which will then be a problem.
NOTE: I wrote this response last Friday, but realize just today I never clicked to make the comment (it was in an open tab). I'm submitting it now, but some of the info may be outdated or the conversation could have moved on.
> Is the `initramfs` file always copied verbatim by `grub` whatever happens before launching the kernel? If so, wouldn't using `compress=cat` in `dracut` basically solve the problem? The current compressed `initramfs` takes ~90MB and the uncompressed image ~160MB that would lower the failing point to ~410MB.
Yes. Using `compress=cat` would solve the problem (the problem being the decompression of the compressed initramfs running out of memory). But it can/will lead to other problems, because our `/boot/` filesystem isn't large. See https://github.com/coreos/fedora-coreos-tracker/issues/1247 and https://github.com/coreos/fedora-coreos-tracker/issues/1465
If you were to make the `compress=cat` change locally, I imagine you'd hit some trouble eventually. Though you could experiment with one of the other compression algorithms, which may be less memory intensive during decompression.
> I have also had a look at what takes space: `/usr/bin/ignition` takes a whopping 30MB all by itself and `/usr/bin/afterburn` 7MB then followed by `/sbin/NetworkManager` and `/usr/lib64/systemd/*` libs.
>
> Ignition looks bad as I am struggling to believe that 30MB is not something that can be reduced for a program that does little technically (i.e. reading JSON and spawning external programs to do the "hard work"). Same thing, to a lesser extent, for afterburn.
This is part of the downsides of the Go and Rust programming languages. I would love to make those binaries smaller, but don't have any ideas other than a rewrite of the software, which would represent significant investment.
> Network Manager looks pretty bad too: it's 10MB of binaries redundant with included systemd libraries: adding the `systemd-networkd` binary and removing nm would net a 8.3MB reduction in size.
We chose NM for the networking stack a long time ago. The media that we ship will continue to do so unless something significant changes.
> Another way of looking at the problem is at installation time. `coreos-installer` could provide a switch to setup an install-only swap space (either in-file or on partition) for instance (if the kernel allows at least the compressed `initramfs` to be swapped out). If swapping out works as intended, the fix for AWS would just be adding a 512MB swap file/partition by default in the cloud images.
Honestly this stuff is happening so early in boot I doubt a swap file would matter at all.
> And/or it could provide a flag to uncompress `initramfs` on-the-fly while putting the binary in the EFI partition and drop a dracut config to set `compress=cat`.
> Would you consider using `upx` on `ignition` (and a few other binaries) before inclusion to `initramfs`? This reduces the executable size from 28MB (on main) to 7.5MB.
Interesting. TIL about upx. Honestly I'm not really sure of the drawbacks but I feel like the reward/risk ratio might be pretty low here.
Has anyone else following this thread used it?
> More generally, compressing ignition, afterburn, nmcli, bash and NetworkManager that way reduces the uncompressed size from 156MB to 131MB, keeping the same features!
> Is this where compression is set for `dracut`? I am asking because this is not consistent with the output of `binwalk` that does not see any `zstd` in there.
Yes that should be the place it's controlled. See https://github.com/coreos/fedora-coreos-config/pull/1844 and https://github.com/coreos/fedora-coreos-tracker/issues/1247#issuecomment-1183925321. It reduced the size and reduced the amount of time to decompress.
> For what it's worth, recompressing with `xz -e9` reduces (original) initramfs from 85.5MB to 69.0MB (not strictly equivalent because tar instead of cpio for recompression). `zstd -19` still gives 72.7MB.
>
> With the few individually compressed binaries, both `zstd -19` and `xz -e9` initramfs get 2-3MB bigger (74.8MB and 72.4MB). That's a compound save of 35MB at worse.
I'm not sure exactly what you're advocating for here. The problem we are running into is running out of memory when decompressing and extracting the initramfs. So what we need to do is make sure that the decompression and extraction (both happening in memory) don't step over 512M. It's a combination of things, not just the compressed initramfs size, that dictates whether we fail here.
For example, maybe the `xz` option makes the compressed initrd smaller, but `xz` is memory intensive on decompression, so it doesn't matter and we still run out of memory.
> Yes. Using compress=cat would solve the problem

Apparently, though, the kernel will make a copy whatever happens, so the possibly "cat-compressed" archive will be put in memory first by the bootloader, and the kernel will copy it over to the `tmpfs`-based rootfs.
> but don't have any ideas other than a rewrite of the software, which would represent significant investment.
That's the spirit of my last ("tangent") comment: I know this won't be rewritten and I know there is no trivial nor not-so-trivial way to reduce go binary size. I've had a look: we are in the same boat.
All I am saying is: when a new feature is discussed for implementation in Fedora, if that thing must make its way into the initramfs, I would greatly appreciate it if the issue of binary size were raised with implementers so that they consider the language thoroughly. `fcos` runs under 200MB once fully booted; add a few more Ignition-like binaries to the initramfs, and before you know it you'll demand 2GB+, then 4GB+.
> Yes that should be the place it's controlled.
Thanks!
> maybe the xz option makes the compressed initrd smaller, but xz is memory intensive on the decompress so it doesn't matter and we still run out of memory.
Excellent point: I was just thinking in terms of what is in memory at any given time, which would be {bootloader + compressed kernel/initramfs}, then {kernel + compressed initramfs + tmpfs with uncompressed initramfs}, then {kernel + tmpfs}. At the moment stage 2 seems to be the blocking one (hence my little calculation), and I had assumed decompression occurred on (very) small chunks of memory, but that was a baseless assumption on my part!
I need to run some tests. I would like to see how the kernel handles a multi-layered initramfs (i.e. with multiple cpio archives, like we currently have for the microcode), especially with respect to memory allocation.
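As a starting point for those tests: the kernel already accepts several concatenated cpio archives as a single initrd (that is exactly how early microcode is shipped, an uncompressed cpio followed by the compressed main image). A minimal sketch that builds such a two-layer image, which can then be handed to `qemu -initrd` to observe the memory behavior:

```shell
# Build a two-layer initrd: uncompressed first member, gzipped second
# member, concatenated into one file (degrades to a note without cpio).
if command -v cpio >/dev/null 2>&1; then
  mkdir -p layer1/early layer2/main
  echo microcode-like > layer1/early/blob
  echo main-content   > layer2/main/file
  ( cd layer1 && find . | cpio -o -H newc ) > layer1.cpio 2>/dev/null
  ( cd layer2 && find . | cpio -o -H newc | gzip ) > layer2.cpio.gz 2>/dev/null
  cat layer1.cpio layer2.cpio.gz > combined-initrd.img
  ls -l combined-initrd.img   # pass this to `qemu ... -initrd combined-initrd.img`
else
  echo "cpio not installed" > cpio-note.txt
fi
```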
Alternatively, there could be a way to extract all the first-boot bits into their own image that is brought up by a thinner initramfs on first boot, and potentially removed once first boot succeeds. E.g. ignition and afterburn could literally be dropped into the ESP by `coreos-installer` and deleted on success?
> This is part of the downsides of the Go and Rust programming languages. I would love to make those binaries smaller, but don't have any ideas other than a rewrite of the software, which would represent significant investment.
I wouldn't conflate Go and Rust in this respect. It very much depends, and rewriting (in what?) isn't necessarily going to make things smaller!

One concrete drawback of Go specifically is called out in https://github.com/u-root/u-root/issues/1477#issue-533334548, and Ignition is a heavy user of `reflect`.
> Yes. Using compress=cat would solve the problem
> Apparently though the kernel will make a copy whatever happens so the possibly "cat-compressed" archive will be put in memory first by the bootloader and the kernel will copy them over to the `tmpfs`-based rootfs.
I did do some tests with `compress=cat` last Friday (I was stuck at a car dealership and was bored) and it did seem to help for me. Though, as mentioned in https://github.com/coreos/fedora-coreos-tracker/issues/1540#issuecomment-1687298513, this approach can/will lead to other problems because our `/boot/` filesystem isn't large. See https://github.com/coreos/fedora-coreos-tracker/issues/1247 and https://github.com/coreos/fedora-coreos-tracker/issues/1465
Describe the bug

Hi 👋

Since the last stable version, CoreOS does not boot anymore on the AWS `nano` instance type. These machines have `512M` of RAM.

Reproduction steps

Boot Fedora CoreOS stable on a `t3a.nano`.

Expected behavior

Either fix the problem by lowering the footprint of the first FCOS initialization, or direct me to ways to not shadow things in RAM during initialization, or be clear about the expected specs for CoreOS?

Things to consider:

Actual behavior

Relevant errors in log:

Bigger-spec machines boot with the same configuration.

System details

AWS t3a.nano, Fedora CoreOS stable 38.20230722.3.0 x86_64
Butane or Ignition config
No response
Additional information
No response