coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Non-root Clevis LUKS devices not automatically mounted on reboot #687

Closed sedlund closed 3 years ago

sedlund commented 3 years ago

Describe the bug

A LUKS volume bound to Tang via Clevis is not mounted on reboot after the initial Ignition run.

Reproduction steps

variant: fcos
version: 1.2.0
passwd:
  users:
    - name: core
      password_hash: b6/
      ssh_authorized_keys:
        - ssh-rsa AAh+b8pAeK9VUa0EjS
storage:

  disks:
    - device: /dev/sda
      wipe_table: false
      partitions:
        - size_mib: 0
          start_mib: 5000
          label: luks
  luks:
    - name: data
      device: /dev/disk/by-partlabel/luks
      label: data
      clevis:
        tang:
          - url: http://192.168.22.1:8080
            thumbprint: sfafRrtIN5XVw
      wipe_volume: true

  filesystems:
    - path: /var/lib/containers
      device: /dev/disk/by-id/dm-name-data
      format: btrfs
      label: var
      mount_options:
        - compress=zstd
      wipe_filesystem: true
      with_mount_unit: true

Expected behavior

Expected the machine to boot and mount the filesystem.

Actual behavior

The machine asks for a password on the console until it times out, and the filesystem is not mounted.

System details

Additional information

Tracked down that clevis-luks-askpass.path is not enabled by default. The documentation and examples on the website do not mention having to enable it manually. Not sure if this is by design, but if I'm using Ignition to configure Clevis, it seems like it should be enabled automatically?

I added:

systemd:
  units:
    - name: clevis-luks-askpass.path
      enabled: true

and a fresh install works as expected.
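For reference, a minimal sketch of how the two pieces fit together in a single config, assuming the same values as the reproduction config above. The passwd and disks sections are left out for brevity, and the Tang thumbprint is the truncated placeholder from above. The point is only that systemd is a top-level key, a sibling of storage:

```yaml
variant: fcos
version: 1.2.0
storage:
  luks:
    - name: data
      device: /dev/disk/by-partlabel/luks
      clevis:
        tang:
          - url: http://192.168.22.1:8080
            thumbprint: sfafRrtIN5XVw
  filesystems:
    - path: /var/lib/containers
      device: /dev/disk/by-id/dm-name-data
      format: btrfs
      with_mount_unit: true
systemd:
  units:
    # Workaround from this report: enable the Clevis askpass path unit
    # in the real root so the volume can be unlocked on later boots.
    - name: clevis-luks-askpass.path
      enabled: true
```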

jlebon commented 3 years ago

Ahh yes, good catch. It's enabled in the initrd via https://github.com/latchset/clevis/blob/a07e75345c12737b2a54c4fbc697ec8dd68bd28f/src/luks/systemd/dracut/clevis/module-setup.sh.in#L31, but not in the real root. It's not in the default Fedora presets, which I think makes sense because you only really need this if you're using Clevis. We could have FCC sugar for this I suppose, but meh... seems fine to just enable it in our presets. Path units are usually pretty lightweight since they're just inotify watches.

sedlund commented 3 years ago

@jlebon as a minimalist - it seems that filesystems::with_mount_unit adds mount sugar. personally i would think using clevis should do similar in fcct.

jlebon commented 3 years ago

> @jlebon as a minimalist - it seems that filesystems::with_mount_unit adds mount sugar. personally i would think using clevis should do similar in fcct.

Yeah, that's a possibility. Not sure it's worth the complexity vs just enabling it. Unconditionally enabling it also makes it more consistent with the initrd.

jlebon commented 3 years ago

I played with this last week:

diff --cc overlay.d/05core/usr/lib/systemd/system-preset/40-coreos.preset
index 183366e,183366e..2298228
--- a/overlay.d/05core/usr/lib/systemd/system-preset/40-coreos.preset
+++ b/overlay.d/05core/usr/lib/systemd/system-preset/40-coreos.preset
@@@ -26,3 -26,3 +26,4 @@@ enable bootupd.socke
  # The event for the attached device comes as a diag event.
  # Ideally it should have been added as part of base Fedora - but since it was arch specific, it was not added: https://bugzilla.redhat.com/show_bug.cgi?id=1433859
 enable rtas_errd.service
+enable clevis-luks-askpass.path

But I still couldn't get Tang-pinned mounts in the real root to work in a quick test. Haven't debugged further yet.

dgiebert commented 3 years ago

With the setup described (mount_unit and the enabled service), I get a circular ordering involving NetworkManager, so it can't unlock the device.

nicolamarella commented 3 years ago

I am also facing a similar issue. Do you at least have a suggestion for the most effective way to debug this? That would already be quite helpful. Thank you in advance.

jlebon commented 3 years ago

To make sure we're all on the same page, this issue is about Clevis LUKS devices other than the rootfs. For the rootfs, it is known to not work in Fedora until f34 (see https://github.com/coreos/fedora-coreos-tracker/issues/692).

sedlund commented 3 years ago

> mount_unit and the enabled service I get circular ordering in the NetworkManager and therefore it cant unlock the device.

@dgiebert maybe try mount_options: _netdev

dgiebert commented 3 years ago

@sedlund I tried that, and it still did not work after a reboot.

jlebon commented 3 years ago

I'll take this.

jlebon commented 3 years ago

> I played with this last week:
>
> diff --cc overlay.d/05core/usr/lib/systemd/system-preset/40-coreos.preset
> index 183366e,183366e..2298228
> --- a/overlay.d/05core/usr/lib/systemd/system-preset/40-coreos.preset
> +++ b/overlay.d/05core/usr/lib/systemd/system-preset/40-coreos.preset
> @@@ -26,3 -26,3 +26,4 @@@ enable bootupd.socke
>   # The event for the attached device comes as a diag event.
>   # Ideally it should have been added as part of base Fedora - but since it was arch specific, it was not added: https://bugzilla.redhat.com/show_bug.cgi?id=1433859
>  enable rtas_errd.service
> +enable clevis-luks-askpass.path
>
> But I still couldn't get Tang-pinned mounts in the real root to work in a quick test. Haven't debugged further yet.

I guess whatever I was hitting there doesn't happen anymore. This works fine now! https://github.com/coreos/fedora-coreos-config/pull/942

dustymabe commented 3 years ago

The fix for this went into next stream release 34.20210413.1.0. Please try out the new release and report issues.

The fix for this went into testing stream release 33.20210412.2.0. Please try out the new release and report issues.

dustymabe commented 3 years ago

The fix for this went into stable stream release 33.20210412.3.0.