coreos / fedora-coreos-docs

Documentation for Fedora CoreOS
https://docs.fedoraproject.org/en-US/fedora-coreos/

Explicitly document procedure for bare metal install on RAID #461

Open codedump opened 1 year ago

codedump commented 1 year ago

Hello,

I'm not sure whether this is a bug or just a misunderstanding on my part. Either way, I believe some change is needed, even if it only clarifies the documentation, so I'm posting this as a bug. Feel free to move it elsewhere, or kindly point me to the right place to post it.

Bug

coreos-installer install DEST_DEVICE expects a DEST_DEVICE, but when installing onto a RAID device there is either no valid choice for it, or it is unclear which device to pass.

To start, I'm trying to install Fedora CoreOS using an Ignition file that essentially looks like this (Butane format):

variant: fcos
version: 1.4.0

boot_device:
  mirror:
    devices:
      - /dev/nvme0n1
      - /dev/nvme1n1

passwd:
  users:
    - name: ...

storage:

  disks:
    - device: /dev/nvme0n1
      wipe_table: true
      partitions:
        - label: root-1
          size_mib: 50000
        - label: (...others...)

    - device: /dev/nvme1n1
      wipe_table: true
      partitions:
        - label: root-2
          size_mib: 50000
        - label: (...others...)

    - device: /dev/sda
      wipe_table: true
      partitions:
        - label: ...

  filesystems:

    - device: /dev/md/md-root
      wipe_filesystem: true
      format: ext4
      label: root

Creating a custom boot image from a pristine CoreOS install/live ISO:

podman run --security-opt label=disable \
       --pull=always \
       --rm \
       -v .:/data -w /data \
       quay.io/coreos/coreos-installer:release \
       iso customize --live-ignition <IGN-file> \
           -o coreos-modified.iso coreos-pristine.iso

Booting the system using that ISO (on a thumb drive) results in a system that apparently has all the right devices:

# ls -l /dev/md/
total 0
lrwxrwxrwx. 1 root root 8 Sep 20 07:52 md-boot -> ../md127
lrwxrwxrwx. 1 root root 8 Sep 20 07:52 md-root -> ../md126
lrwxrwxrwx. 1 root root 8 Sep 20 07:52 md-sysdata -> ../md125

# ls -l /dev/disk/by-label/
total 0
lrwxrwxrwx. 1 root root 10 Sep 20 07:52 EFI-SYSTEM -> ../../sdb2
lrwxrwxrwx. 1 root root 11 Sep 20 07:52 boot -> ../../md127
lrwxrwxrwx. 1 root root 15 Sep 20 07:52 esp-1 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 20 07:52 esp-2 -> ../../nvme1n1p2
lrwxrwxrwx. 1 root root 10 Sep 20 07:52 fedora-coreos-36.20220806.3.0 -> ../../sdb1
lrwxrwxrwx. 1 root root 15 Sep 20 17:07 podcache -> ../../nvme1n1p6
lrwxrwxrwx. 1 root root 11 Sep 20 07:52 root -> ../../md126
lrwxrwxrwx. 1 root root 11 Sep 20 07:52 sys -> ../../md125

Installing fails:

# coreos-installer install
error: The following required arguments were not provided:
    <DEST_DEVICE>

USAGE:
    coreos-installer install <DEST_DEVICE>

For more information try --help

# coreos-installer install /dev/md/md-root 
Installing Fedora CoreOS 36.20220806.3.0 x86_64 (512-byte sectors)
> Read disk 2.3 GiB/2.3 GiB (100%)    

Note: detected other devices with a filesystem labeled `boot`:
  - /dev/md127
The installed OS may not work correctly if there are multiple boot filesystems.
Before rebooting, investigate whether these filesystems are needed and consider
wiping them with `wipefs -a`.

Install complete.

# mount /dev/md/md-root /mnt -o ro
mount: /var/mnt: wrong fs type, bad option, bad superblock on /dev/md126, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

# dmesg
...
[34613.292629] GPT: Use GNU Parted to correct GPT errors.
[34613.292632]  md126: p1 p2 p3 p4
[34613.511500]  sda: sda1 sda2
...

Essentially, this means that not everything is in place as it should be: there is no root filesystem on /dev/md/md-root. Instead, the device that was supposed to hold only the root filesystem now contains 4 partitions. Booting the system will obviously not work as intended.

Specifying a raw disk device also fails:

# coreos-installer install /dev/nvme0n1
Installing Fedora CoreOS 36.20220806.3.0 x86_64 (512-byte sectors)
Error: checking for exclusive access to /dev/nvme0n1

Caused by:
    couldn't find /sys/block directory for partition /dev/md126p1 of /dev/nvme0n1

Specifying a totally different drive (e.g. /dev/sda, which is a stand-alone "spinning rust" hard drive) will look like success, but when booting, will run into an unconfigured GRUB (i.e. one with no kernel entries to boot).

Host Operating System Version

Fedora Silverblue

Target Operating System Version

Fedora CoreOS 36.20220806.3.0 (booting from ISO)

coreos-installer Version

coreos-installer 0.15.0

Expected Behavior

coreos-installer install should either install according to the specifications of an Ignition file (i.e. without requiring a DEST_DEVICE), or accept separate devices for each task (root filesystem, boot device, and possibly others).

Or documentation should be updated on how to call coreos-installer for the case of installing (a) from ISO, while (b) targeting a RAID disk setup.

Actual Behavior

coreos-installer either fails while checking for exclusive access to the disk, or it interprets the desired root device as a whole-disk device.

Reproduction Steps

  1. butane --strict --pretty -d . file.bu --output file.ign
  2. podman run --security-opt label=disable --pull=always --rm -v .:/data -w /data quay.io/coreos/coreos-installer:release iso customize --live-ignition file.ign -o coreos-mod.iso coreos-pristine.iso
  3. dd if=coreos-mod.iso of=/dev/sda && sync
  4. Boot new machine using thumb drive formerly attached as /dev/sda
  5. coreos-installer install ... see above all the possible variations I could think of.

Other Information

Again, this pretty much feels like I'm doing it wrong. I've read all I could find here to exhaustion, all the links related to this and to coreos-installer, and whatever I could find on Google and Reddit. I'm not aware of a CoreOS "user forum" or mailing list that I could ask, if there is one -- a nudge in the right direction would be greatly appreciated.

bgilbert commented 1 year ago

You're right that we don't explicitly document this, and we should. The correct approach is implicit in the existing docs, but is pretty non-obvious if you're used to setting up RAID on other distros. The idea is: don't try to set up RAID before installing. Instead, install onto a single disk, reboot, and let the system reconfigure itself to use RAID. If the OS is given an Ignition config that recreates any of the default filesystems, it'll automatically copy their contents into RAM beforehand and copy them back afterward.

This may seem backwards, but keep in mind that CoreOS is designed as an image-based distro. In the cloud or a VM, the OS is initially launched from a generic image, and then customizes itself with Ignition on the first boot. For consistency, bare-metal installs work the same way: the installer merely writes an image to disk, and then the OS customizes itself when it boots. Because the OS can switch to RAID at boot time, RAID can be used on any Fedora CoreOS system, including in the cloud if desired.

So concretely, you'd install directly onto /dev/nvme0n1 without setting up RAID first. Your Butane config looks okay, but you should write the Ignition config into the ISO with --dest-ignition rather than --live-ignition, because you want to affect the installed OS and not the live OS.
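A rough sketch of that approach, reusing the podman invocation and file names from the reproduction steps above. The variable names, the `customize_iso` helper, and the dry-run `RUN` switch are illustrative assumptions, not part of coreos-installer. By default the script only prints the command; run it with `RUN=` (empty) to execute it.

```shell
#!/bin/sh
# Sketch only: prints the command by default; set RUN= (empty) to execute it.
RUN=${RUN-echo}

IGN=file.ign                  # Ignition config produced by butane
SRC_ISO=coreos-pristine.iso   # unmodified live ISO
OUT_ISO=coreos-mod.iso        # customized ISO to write

# Embed the config for the *installed* system (--dest-ignition),
# not for the live environment (--live-ignition).
customize_iso() {
    $RUN podman run --security-opt label=disable --pull=always --rm \
        -v .:/data -w /data \
        quay.io/coreos/coreos-installer:release \
        iso customize --dest-ignition "$IGN" \
        -o "$OUT_ISO" "$SRC_ISO"
}

customize_iso

# Afterwards, in the live system booted from the customized ISO,
# install onto a single plain disk without assembling RAID first:
#   coreos-installer install /dev/nvme0n1
```

If the embedded config does not seem to be applied by a manual install, passing it explicitly with `coreos-installer install --ignition-file file.ign /dev/nvme0n1` should have the same effect; treat either variant as something to verify in a VM first.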

Re discussion forums, there's a Fedora Discussion forum, an Ask Fedora tag, a mailing list, and #fedora-coreos on Libera.Chat. We get some traffic on each of them.

codedump commented 1 year ago

Ok, thanks a ton!

Sounds logical now that you describe it like that :-) I'll try this on a VM first and report results. (The actual machine has been shipped off and is now attached to the Internet, booting straight into the ISO live image. Every shot I have at installing it had better succeed, or I end up with a paperweight stuck at an initrd maintenance prompt.)

PS: I had played around with --dest-ignition vs --live-ignition before, without any success; it just gave me the empty GRUB prompt. But what I never did was boot into a "pristine" live system that hadn't already tampered with the drives, so maybe that does the trick.

bgilbert commented 1 year ago

Oh wow. Let us know how it goes. :sweat_smile:

Since this comes down to an OS-level docs enhancement, I'll retitle the issue and move it to the docs repo.