Open coreosbot opened 6 years ago
Comment by @bgilbert
The pause and reboot is symptomatic of an Ignition failure. When Ignition fails, it logs the error to the journal and system console, then periodically prompts you on console to press Enter for an emergency shell. If you don't press Enter within five minutes, the system reboots. If you're not seeing any of that, your system's primary console may be directed to the wrong place (VGA console instead of serial or vice versa). Ensure that your primary console is last in the list of console= kernel command line options. If you do see the prompt, you can press Enter and then run journalctl --no-pager -t ignition to dump the Ignition logs.
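As a quick check of the console ordering described above, the following sketch prints the last console= value found in a kernel command line string. The function name primary_console is hypothetical, and the parsing is a simple whitespace split that assumes the command line contains no quoted arguments with spaces.

```shell
# Print the value of the last console= option in a kernel command line.
# Hypothetical helper for illustration; it is not part of Ignition.
primary_console() {
    last=""
    # Intentionally unquoted so the string is split on whitespace.
    for arg in $1; do
        case $arg in
            console=*) last=${arg#console=} ;;
        esac
    done
    printf '%s\n' "$last"
}
```

On a running machine it could be called as primary_console "$(cat /proc/cmdline)". If the value printed is not the console you are watching, the Ignition failure prompt is probably going somewhere else.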
When I tried your config under qemu, I got:
Aug 11 04:20:17 localhost ignition[256]: disks: createPartitions: op(3): op(4): [failed] creating 2 partitions on "/dev_aliases/dev/sda": exit status 4: Cmd: "/sbin/sgdisk" "--new=1:0:+20971520" "--change-name=1:ETCD" "--new=2:0:+0" "--change-name=2:DOCKER" "/dev_aliases/dev/sda" Stdout: "Setting name!\npartNum is 0\nREALLY setting name!\nSetting name!\npartNum is 1\nREALLY setting name!\n" Stderr: "Could not create partition 1 from 34 to 20971553\nCould not create partition 2 from 34 to 2047\nError encountered; not saving changes.\n"
Ignition (and thus CT) currently doesn't have syntax for "create these partitions only if they don't exist". We should probably add support for this. As of the 2.1 Ignition spec, Ignition does have such syntax for filesystems ("wipeFilesystem": false), for similar reasons: supporting multiple Ignition runs on PXE-booted systems (https://github.com/coreos/ignition/pull/351#issuecomment-292155836). Unfortunately that feature is not yet supported by CT.
Until then, I can think of two workarounds:
1. PXE-boot the machines once with a config that uses wipe_table and lists both filesystems. On subsequent boots, don't list the partition table or the etcd filesystem. This requires your DHCP server or Matchbox to know about the state of your machines.
2. Ignore Ignition's support for partitioning and creating filesystems. Instead, install a systemd service that does these things directly, including the appropriate conditional logic. Order it Before the corresponding device and mount units. This is feasible in your case because you're not booting from the affected filesystems or repartitioning the boot disk.
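The conditional logic for the second workaround (a systemd service that partitions the disk itself) might look roughly like the sketch below. It is only an illustration under several assumptions: the function name ensure_partition and the DISK, BY_LABEL and SGDISK variables are made up for this example, the partition name and size are taken from the sgdisk command in the log above, and it assumes udev has already populated /dev/disk/by-partlabel when the script runs.

```shell
# Create a GPT partition only if no partition with that name exists yet.
# Hypothetical helper; the variables can be overridden for dry runs.
DISK=${DISK:-/dev/sda}
BY_LABEL=${BY_LABEL:-/dev/disk/by-partlabel}
SGDISK=${SGDISK:-sgdisk}

ensure_partition() {
    # $1 = partition number, $2 = partition name, $3 = end/size for --new
    if [ -e "$BY_LABEL/$2" ]; then
        echo "partition $2 already exists, leaving it alone"
        return 0
    fi
    "$SGDISK" "--new=$1:0:$3" "--change-name=$1:$2" "$DISK"
}
```

A service script could then call ensure_partition 1 ETCD +20971520 before formatting anything. Setting SGDISK=echo first prints the command that would be run, which is a cheap way to check the arguments before pointing it at a real disk.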
Thanks for reporting, and I'm sorry I don't have a better answer for you at the moment.
Comment by @travisgroth
Thanks. Commentary:
I could have been clearer. I just never personally saw the failure screen. It would always reload the moment I stepped away.
Ignition/CT supposedly does support this according to the spec (wipe/force are optional). That it requires certain options under circumstances where it ran before is a bit of a surprise. I understand how this might not be supported, but the documentation and syntax make that really unclear. What is the point of wipe and force if they aren’t set to true? It seems that they are required if you don’t have virgin disks.
CT does in fact support the translation to ignition wipeFilesystem in syntax 2.0.0 (it is translated correctly at least). My ignition file has this field correct but it still hangs if not set to true.
Really, the underlying question here - there’s a lot of examples of running etcd for k8s masters in the ignition/clc/coreos docs but I’m not seeing anything that clearly addresses where the storage goes or how to deal with it safely. Is there some recommended way to handle your kubernetes etcd servers on coreos so that FS persistence isn’t a concern? While I see how one could argue that tmpfs on many systems is good enough, it is problematic in a power loss scenario. My approach seemed obvious but maybe there’s something else I should be doing.
I’m already working toward your recommendation #2, but testing if the filesystem is already formatted from a unit file is seemingly non-trivial and mkfs is very aggressive without a terminal and keeps wiping my disk.
Thanks for the quick response and analysis.
Travis
Comment by @bgilbert
What is the point of wipe and force if they aren’t set to true?
The idea is to prevent data loss. If wipeTable and force are false, Ignition fails if it would need to overwrite data. If true, the partition table or filesystem is always overwritten.
CT does in fact support the translation to ignition wipeFilesystem in syntax 2.0.0 (it is translated correctly at least). My ignition file has this field correct but it still hangs if not set to true.
CT supports the old force flag, but not the new wipeFilesystem flag in the 2.1 spec. The latter has the semantics you're expecting.
Is there some recommended way to handle your kubernetes etcd servers on coreos so that FS persistence isn’t a concern?
That's a good question that I'll leave to others.
testing if the filesystem is already formatted from a unit file is seemingly non-trivial and mkfs is very aggressive without a terminal and keeps wiping my disk.
Something like this might work:
ExecStart=/bin/bash -c '[[ -n "$(blkid /dev/sda1 -s TYPE)" ]] || mkfs.ext4 /dev/sda1'
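The one-liner above could be expanded into a small script that is easier to test before it touches a real disk. This is a sketch, not a vetted implementation: the function name ensure_filesystem and the BLKID and MKFS variables are hypothetical, and it assumes that an empty TYPE from blkid means the device is safe to format, which may not hold for every kind of on-disk content.

```shell
# Format a device only if blkid does not report an existing filesystem.
# Hypothetical helper; BLKID and MKFS can be overridden for dry runs.
BLKID=${BLKID:-blkid}
MKFS=${MKFS:-mkfs.ext4}

ensure_filesystem() {
    # $1 = block device to check
    # "-o value -s TYPE" prints only the filesystem type, or nothing.
    fstype=$("$BLKID" -o value -s TYPE "$1" 2>/dev/null) || true
    if [ -n "$fstype" ]; then
        echo "found existing $fstype filesystem on $1, not formatting"
        return 0
    fi
    echo "no filesystem detected on $1, formatting"
    "$MKFS" "$1"
}
```

Calling ensure_filesystem /dev/sda1 from the unit's script keeps the decision in one place, and running it with MKFS=echo shows what would happen without formatting anything.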
Comment by @travisgroth
The idea is to prevent data loss. If wipeTable and force are false, Ignition fails if it would need to overwrite data. If true, the partition table or filesystem is always overwritten.
Feature idea - make that clear on the console when it finds a partition table unexpectedly?
CT supports the old force flag, but not the new wipeFilesystem flag in the 2.1 spec. The latter has the semantics you're expecting.
Ah. I read that too quickly. What I was referring to was the wipe_table partition option which gets translated to wipeTable.
I assume matchbox has the same limitation as CT here? I see you re-scoped this issue to specifically wipeTable. Should I open a new issue about CT (and matchbox, if that's the case) being behind the ignition spec or is that elsewhere already?
Thanks for the suggestion on the ExecStart. I'll muck with it more today. At this point I'm considering your first idea of a 'diskprep' boot option so the actual master config doesn't come even close to wiping a disk. While more manual, it seems much safer.
Comment by @bgilbert
Feature idea - make that clear on the console when it finds a partition table unexpectedly?
The create-if-missing functionality will require Ignition to understand the prior state of the partition table, which it doesn't do currently, so I expect we'll gain better logging once that is added.
I assume matchbox has the same limitation as CT here? I see you re-scoped this issue to specifically wipeTable. Should I open a new issue about CT (and matchbox, if that's the case) being behind the ignition spec or is that elsewhere already?
Matchbox uses the CT code internally, so it has the same limitation. We plan to update CT to the Ignition 2.1 spec after 2.1 reaches a Container Linux stable release. It seems we don't yet have a tracking issue for that; feel free to open one.
Comment by @bgilbert
@dgonyeo pointed out that RAID support will also need create-if-missing semantics.
Comment by @bgilbert
Assuming no RAID volumes will be auto-assembled in the initramfs [1], create-if-missing for RAID has the following implications:
If Ignition is configured to create a RAID volume if missing, and the array already exists, Ignition must still start the array.
A config will need to declare a RAID volume if the config wants to reuse the volume but create a filesystem on it, or create a file in an existing filesystem on the volume. Doing those things is currently impossible, so this is strictly an improvement, and it's consistent with the existing requirement to declare existing filesystems before creating files on them. We'll need a way to declare a volume that should never be created but must exist and must be started by Ignition.
[1] https://github.com/coreos/bootengine/pull/130 changes this for RAID volumes needed by the root FS, but at the moment it appears we should exempt boots which run Ignition.
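Until Ignition has create-if-missing semantics for RAID, the same behaviour can be approximated outside Ignition. The sketch below assembles an array when the first member device already carries RAID metadata and creates it otherwise. The function name ensure_array and the MDADM variable are hypothetical, and checking only the first member is a simplification that a real script would probably want to tighten.

```shell
# Assemble an existing md array, or create it if no metadata is found.
# Hypothetical helper; MDADM can be overridden for dry runs.
MDADM=${MDADM:-mdadm}

ensure_array() {
    # $1 = array device, $2 = RAID level, remaining args = member devices
    array=$1
    level=$2
    shift 2
    # --examine succeeds when the device holds an md superblock.
    if "$MDADM" --examine "$1" >/dev/null 2>&1; then
        echo "existing RAID metadata on $1, assembling $array"
        "$MDADM" --assemble "$array" "$@"
    else
        echo "no RAID metadata on $1, creating $array"
        "$MDADM" --create "$array" --level="$level" --raid-devices="$#" "$@"
    fi
}
```

For example, ensure_array /dev/md0 1 /dev/sdb1 /dev/sdc1 would start an existing mirror rather than recreate it. The device names here are placeholders.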
Is there currently any workaround for this issue, other than maintaining multiple Ignition configs?
I believe the above mentioned workaround of using a systemd unit is the only way.
Has there been any movement around this recently? I'm trying to use FCOS to drive a NAS, but the filesystem on top of the RAID data partition gets recreated every time. This appears to be because of the unconditional mdadm --create step, leading to complete data loss on the partition, though it does not appear to be fully deterministic: sometimes the blkid step after mdadm --create is able to identify the original filesystem on my /var partition.
Honestly, it's a little disappointing to see persistent data partitions being treated as second-class citizens. I understand that Ignition is first and foremost made for a cloud environment, and I also understand developer bandwidth is necessarily limited, but this is a use case I would expect to be fairly important in the cloud as well.
Related discussions in https://github.com/coreos/ignition/pull/1826.
Issue by @travisgroth
Issue Report
Bug
Container Linux Version
Environment
VMware ESXi / PXE Boot
Expected Behavior
When using a storage layout which does not wipe partition tables or filesystems at boot time, the system functions normally. Example CT snippet:
Actual Behavior
If booted with the provided snippet, the system will hang for a long period of time and eventually reboot. I have yet to catch the screen at reboot time. As soon as I change the 'force' and 'wipe_table' to 'true', the system boots as expected, but it obviously wipes things I don't want to be wiped. A single one of those two settings will trigger this behavior. I've tested with only wipe_table as false and only a single filesystem as create=false and I get the same issue. Leaving out the options has a similar effect.
As one might imagine, I'm trying to carve off persistent space for etcd to live on a master while doing a fresh format on the docker volume to keep things clean.
Reproduction Steps
Other Information