coreos / ignition

First boot installer and configuration tool
https://coreos.github.io/ignition/
Apache License 2.0

Add Ignition options to create partition or RAID volume only if missing #579

Open coreosbot opened 6 years ago

coreosbot commented 6 years ago

Issue by @travisgroth


Issue Report

Bug

Container Linux Version

# cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1409.7.0
VERSION_ID=1409.7.0
BUILD_ID=2017-07-19-0005
PRETTY_NAME="Container Linux by CoreOS 1409.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

VMware ESXi / PXE Boot

Expected Behavior

When using a storage layout which does not wipe partition tables or filesystems at boot time, the system functions normally. Example CT snippet:

storage:
  disks:
    - device: /dev/sda
      wipe_table: false
      partitions:
        - label: ETCD
          number: 1
          size: 10GiB
        - label: DOCKER
          number: 2
          size: 0

  filesystems:
    - name: etcd
      mount:
        device: /dev/disk/by-partlabel/ETCD
        format: ext4
        create:
          force: false

    - name: docker
      mount:
        device: /dev/disk/by-partlabel/DOCKER
        format: ext4
        create:
          force: true

Actual Behavior

If booted with the provided snippet, the system hangs for a long time and eventually reboots; I have yet to catch what is on the screen at reboot time. As soon as I change 'force' and 'wipe_table' to 'true', the system boots as expected, but it obviously wipes things I don't want wiped. Either setting on its own triggers the hang: I've tested with only wipe_table set to false, and with only a single filesystem's force set to false, and I get the same issue. Leaving the options out has a similar effect.

As one might imagine, I'm trying to carve off persistent space for etcd to live on a master while doing a fresh format on the docker volume to keep things clean.

Reproduction Steps

  1. Create a CLC config that specifies a device without wipe_table set to true, or a filesystem without 'force' set to true
  2. Transpile it into an Ignition config (tested with 0.4.2, which transpiles into Ignition 2.0.0)
  3. Boot the system. It hangs shortly after the RNG initializes

Other Information

coreosbot commented 6 years ago

Comment by @bgilbert


The pause and reboot is symptomatic of an Ignition failure. When Ignition fails, it logs the error to the journal and system console, then periodically prompts you on console to press Enter for an emergency shell. If you don't press Enter within five minutes, the system reboots. If you're not seeing any of that, your system's primary console may be directed to the wrong place (VGA console instead of serial or vice versa). Ensure that your primary console is last in the list of console= kernel command line options. If you do see the prompt, you can press Enter and then run journalctl --no-pager -t ignition to dump the Ignition logs.
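
To check which console the kernel treats as primary, it can help to pull the last console= argument out of the kernel command line. The following is a minimal sketch; the function name last_console is made up for illustration and is not part of Ignition or Container Linux.

```shell
# Print the value of the last console= argument in a kernel command line.
# The kernel uses the last console= entry as its primary console.
last_console() {
  local word last=""
  for word in $1; do            # intentional word splitting on whitespace
    case "$word" in
      console=*) last="${word#console=}" ;;
    esac
  done
  printf '%s\n' "$last"
}

# Example with a literal command line:
last_console "root=LABEL=ROOT console=tty0 console=ttyS0,115200n8"
```

On a running system the same function could be fed the live command line with last_console "$(cat /proc/cmdline)".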

When I tried your config under qemu, I got:

Aug 11 04:20:17 localhost ignition[256]: disks: createPartitions: op(3): op(4): [failed]   creating 2 partitions on "/dev_aliases/dev/sda": exit status 4: Cmd: "/sbin/sgdisk" "--new=1:0:+20971520" "--change-name=1:ETCD" "--new=2:0:+0" "--change-name=2:DOCKER" "/dev_aliases/dev/sda" Stdout: "Setting name!\npartNum is 0\nREALLY setting name!\nSetting name!\npartNum is 1\nREALLY setting name!\n" Stderr: "Could not create partition 1 from 34 to 20971553\nCould not create partition 2 from 34 to 2047\nError encountered; not saving changes.\n"
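
The numbers in this log line can be checked by hand. Assuming 512-byte logical sectors (an assumption; the log does not state the sector size), the 10GiB requested for ETCD corresponds to the +20971520 passed to sgdisk, and a partition of that length starting at sector 34 would end at sector 20971553, which matches the stderr output. A small sketch of the arithmetic, with hypothetical helper names:

```shell
# Convert a size in GiB to a sector count (default: 512-byte sectors).
gib_to_sectors() {
  echo $(( $1 * 1024 * 1024 * 1024 / ${2:-512} ))
}

# Last sector of a partition that starts at sector $1 and spans $2 sectors.
last_sector() {
  echo $(( $1 + $2 - 1 ))
}

sectors=$(gib_to_sectors 10)
echo "$sectors"               # 20971520
last_sector 34 "$sectors"     # 20971553
```

The second message ("from 34 to 2047") suggests that sgdisk saw only a small free gap at the start of the disk, which would be consistent with partitions already being present, although the log alone does not confirm that.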

Ignition (and thus CT) currently doesn't have syntax for "create these partitions only if they don't exist". We should probably add support for this. As of the 2.1 Ignition spec, Ignition does have such syntax for filesystems ("wipeFilesystem": false), for similar reasons: supporting multiple Ignition runs on PXE-booted systems (https://github.com/coreos/ignition/pull/351#issuecomment-292155836). Unfortunately that feature is not yet supported by CT.

Until then, I can think of two workarounds:

  1. PXE-boot the machines once with a config that uses wipe_table and lists both filesystems. On subsequent boots, don't list the partition table or the etcd filesystem. This requires your DHCP server or Matchbox to know about the state of your machines.

  2. Ignore Ignition's support for partitioning and creating filesystems. Instead, install a systemd service that does these things directly, including the appropriate conditional logic. Order it Before the corresponding device and mount units. This is feasible in your case because you're not booting from the affected filesystems or repartitioning the boot disk.
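
The partitioning half of the second workaround could be sketched roughly as follows. This is an illustration, not something posted in the thread: the function name ensure_partition and the PARTLABEL_DIR and SGDISK variables are invented here (the two variables exist only so the logic can be dry-run), and the sgdisk options are the same --new and --change-name options that appear in the Ignition log above.

```shell
# Create a GPT partition only if no partition with this label exists yet.
ensure_partition() {
  local disk="$1" number="$2" label="$3" size="$4"
  local dir="${PARTLABEL_DIR:-/dev/disk/by-partlabel}"

  if [ -e "${dir}/${label}" ]; then
    echo "partition ${label} exists, leaving it alone"
    return 0
  fi
  # size is an sgdisk end specifier such as +10G, or 0 for "rest of the disk"
  ${SGDISK:-sgdisk} "--new=${number}:0:${size}" \
                    "--change-name=${number}:${label}" "${disk}"
}

# Dry run: print the sgdisk invocations instead of executing them.
SGDISK=echo ensure_partition /dev/sda 1 ETCD +10G
SGDISK=echo ensure_partition /dev/sda 2 DOCKER 0
```

A script like this would be called from a oneshot systemd service ordered before the corresponding device and mount units. The by-partlabel symlinks are created by udev, so a real service may need to wait for udev to settle before relying on them.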

Thanks for reporting, and I'm sorry I don't have a better answer for you at the moment.

coreosbot commented 6 years ago

Comment by @travisgroth


Thanks. Commentary:

Thanks for the quick response and analysis.

Travis

coreosbot commented 6 years ago

Comment by @bgilbert


> What is the point of wipe and force if they aren’t set to true?

The idea is to prevent data loss. If wipeTable and force are false, Ignition fails if it would need to overwrite data. If true, the partition table or filesystem is always overwritten.
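
The behaviour described here, together with the create-if-missing behaviour this issue asks for, can be summarised as a small decision table. The sketch below is only a model of the semantics as stated in this thread, not Ignition's actual code; the function name storage_action and the if-missing mode name are hypothetical.

```shell
# $1: does the target already hold data? (yes|no)
# $2: mode: "false" or "true" (the current wipeTable/force values), or
#     "if-missing" (hypothetical mode requested in this issue)
storage_action() {
  case "$2:$1" in
    true:*)          echo "overwrite" ;;  # always wipe and recreate
    false:yes)       echo "fail" ;;       # refuse to destroy existing data
    false:no)        echo "create" ;;     # nothing to lose, so create
    if-missing:yes)  echo "keep" ;;       # reuse what is already there
    if-missing:no)   echo "create" ;;
    *)               echo "unknown" ;;
  esac
}

# Print the whole table.
for mode in false true if-missing; do
  for present in no yes; do
    echo "mode=$mode present=$present -> $(storage_action "$present" "$mode")"
  done
done
```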

> CT does in fact support the translation to ignition wipeFilesystem in syntax 2.0.0 (it is translated correctly at least). My ignition file has this field correct but it still hangs if not set to true.

CT supports the old force flag, but not the new wipeFilesystem flag in the 2.1 spec. The latter has the semantics you're expecting.

> Is there some recommended way to handle your kubernetes etcd servers on coreos so that FS persistence isn’t a concern?

That's a good question that I'll leave to others.

> testing if the filesystem is already formatted from a unit file is seemingly non-trivial and mkfs is very aggressive without a terminal and keeps wiping my disk.

Something like this might work:

ExecStart=/bin/bash -c '[[ -n "$(blkid /dev/sda1 -s TYPE)" ]] || mkfs.ext4 /dev/sda1'
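
Expanded into a small script, the same check might look like the sketch below. The function name format_if_blank and the BLKID and MKFS override variables are invented for illustration (the overrides exist only so the logic can be exercised without touching a real disk). blkid's -s TYPE -o value options print just the detected filesystem type, and print nothing when none is found.

```shell
# Format a device as ext4 only when blkid detects no filesystem on it.
format_if_blank() {
  local dev="$1" fstype
  # blkid exits non-zero when nothing is detected; tolerate that.
  fstype="$(${BLKID:-blkid} -s TYPE -o value "$dev" 2>/dev/null || true)"
  if [ -n "$fstype" ]; then
    echo "${dev} already contains ${fstype}, not formatting"
    return 0
  fi
  ${MKFS:-mkfs.ext4} "$dev"
}
```

A unit could then call format_if_blank /dev/sda1 from a script referenced by ExecStart, instead of embedding the test in a one-liner.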

coreosbot commented 6 years ago

Comment by @travisgroth


> The idea is to prevent data loss. If wipeTable and force are false, Ignition fails if it would need to overwrite data. If true, the partition table or filesystem is always overwritten.

Feature idea - make that clear on the console when it finds a partition table unexpectedly?

> CT supports the old force flag, but not the new wipeFilesystem flag in the 2.1 spec. The latter has the semantics you're expecting.

Ah. I read that too quickly. What I was referring to was the wipe_table partition option which gets translated to wipeTable.

I assume matchbox has the same limitation as CT here? I see you re-scoped this issue to specifically wipeTable. Should I open a new issue about CT (and matchbox, if that's the case) being behind the ignition spec or is that elsewhere already?

Thanks for the suggestion on the ExecStart. I'll muck with it more today. At this point I'm considering your first idea of a 'diskprep' boot option so the actual master config doesn't come even close to wiping a disk. While more manual, it seems much safer.

coreosbot commented 6 years ago

Comment by @bgilbert


> Feature idea - make that clear on the console when it finds a partition table unexpectedly?

The create-if-missing functionality will require Ignition to understand the prior state of the partition table, which it doesn't do currently, so I expect we'll gain better logging once that is added.

> I assume matchbox has the same limitation as CT here? I see you re-scoped this issue to specifically wipeTable. Should I open a new issue about CT (and matchbox, if that's the case) being behind the ignition spec or is that elsewhere already?

Matchbox uses the CT code internally, so it has the same limitation. We plan to update CT to the Ignition 2.1 spec after 2.1 reaches a Container Linux stable release. It seems we don't yet have a tracking issue for that; feel free to open one.

coreosbot commented 6 years ago

Comment by @bgilbert


@dgonyeo pointed out that RAID support will also need create-if-missing semantics.

coreosbot commented 6 years ago

Comment by @bgilbert


Assuming no RAID volumes will be auto-assembled in the initramfs [1], create-if-missing for RAID has the following implications:

[1] https://github.com/coreos/bootengine/pull/130 changes this for RAID volumes needed by the root FS, but at the moment it appears we should exempt boots which run Ignition.

coreosbot commented 6 years ago

Comment by @dghubble


CT v0.5.0 should be updated for the Ignition v2.1.0 spec and Matchbox v0.7.0 (vendors CT v0.5.0) now renders Ignition at the v2.1.0 spec as well, if you've had a chance to try those.

sarathsprakash commented 1 year ago

Is there currently any workaround for this issue, other than maintaining multiple Ignition configs?

FreekingDean commented 1 year ago

I believe the above mentioned workaround of using a systemd unit is the only way.

runiq commented 8 months ago

Has there been any movement around this recently? I'm trying to use FCOS to drive a NAS, but the filesystem on top of the RAID data partition gets recreated every time. This appears to be because of the unconditional mdadm --create step, leading to complete data loss on the partition, though this appears to be not fully deterministic: Sometimes the blkid step after mdadm --create is able to identify the original filesystem on my /var partition.
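
Until Ignition gains create-if-missing semantics for RAID, one way to avoid an unconditional mdadm --create from a custom unit is to look for an existing md superblock first. The following is a hedged sketch, not Ignition's behaviour: ensure_array and the MDADM override variable are hypothetical names, and the sketch assumes that mdadm --examine exits non-zero on a device that carries no md superblock.

```shell
# Assemble an existing array if its first member already carries an md
# superblock; otherwise create a new array from the given members.
ensure_array() {
  local md="$1" level="$2"
  shift 2
  if ${MDADM:-mdadm} --examine "$1" >/dev/null 2>&1; then
    ${MDADM:-mdadm} --assemble "$md" "$@"
  else
    ${MDADM:-mdadm} --create "$md" --level="$level" --raid-devices="$#" "$@"
  fi
}
```

For a two-disk mirror this would be invoked as ensure_array /dev/md0 1 /dev/sdb1 /dev/sdc1 (device names are placeholders). Checking only the first member is a simplification; a more careful version would examine every member before deciding.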

Honestly, it's a little disappointing to see persistent data partitions being treated as second-class citizens. I understand that Ignition is first and foremost made for a cloud environment, and I also understand developer bandwidth is necessarily limited, but this is a use case I would expect to be fairly important in the cloud as well.

jlebon commented 2 months ago

Related discussions in https://github.com/coreos/ignition/pull/1826.