Consider specializing mount units on first boot, after partitioning

travier commented 4 months ago

Describe the enhancement

We should consider adding some logic (in Ignition or elsewhere) that specializes the /dev/sda entries in the systemd units Ignition writes, to make sure they stay stable across major OS upgrades.

This would add an override snippet that "fixes" the path to the partition to a well-defined ID that is more stable than the default one.
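For illustration only, here is a sketch of the kind of override snippet such logic might write, expressed as a Butane config. It assumes a hypothetical mount unit `var-lib-foo.mount` that Ignition originally wrote with `What=/dev/sdb1`, and a hypothetical stable path `/dev/disk/by-id/HYPOTHETICAL-DISK-ID-part1`. The drop-in only replaces `What=` and leaves the rest of the unit untouched.

```yaml
variant: fcos
version: 1.5.0
systemd:
  units:
    # Hypothetical unit previously written by Ignition with What=/dev/sdb1.
    # No "contents" here: Butane only adds a drop-in to the existing unit.
    - name: var-lib-foo.mount
      dropins:
        - name: 50-stable-device.conf
          contents: |
            [Mount]
            # Hypothetical stable path; the real logic would substitute the
            # by-id, by-partuuid or by-label name resolved on that node.
            What=/dev/disk/by-id/HYPOTHETICAL-DISK-ID-part1
```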

Alternative, but only works for new installations

Another option would be to add new "selector" options to Ignition so that it can figure out dynamically which disk to use, instead of hard-coding names or IDs that may vary per node. For example, I could say: put this on the smallest disk, this on the boot disk, and this on the largest one.

Ideally we would ship all udev rules for all clouds, but that does not help us with Bare Metal systems.

System details

Bare Metal and some clouds:

Additional information

This could help with:

jlebon commented 4 months ago

The main problem related to block device naming is feeding stable names to the partitioning or filesystem parts of the Ignition schema. Once you have that, though, you should definitely set a partition or filesystem label (or both) and then use it in the mount unit. That's stable across updates.
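A minimal Butane sketch of that flow, assuming a hypothetical data disk reachable at `/dev/disk/by-id/HYPOTHETICAL-DISK-ID` and a hypothetical mount point `/var/lib/foo`. Only the disk selection uses the per-node name; everything after it refers to labels. This assumes Butane's generated mount unit uses the `device` value as its `What=`, so pointing `device` at the partition label keeps the unit stable.

```yaml
variant: fcos
version: 1.5.0
storage:
  disks:
    # Hypothetical stable name used only to select the disk to partition.
    - device: /dev/disk/by-id/HYPOTHETICAL-DISK-ID
      partitions:
        - number: 1
          # Partition label -> /dev/disk/by-partlabel/var-lib-foo
          label: var-lib-foo
  filesystems:
    # Refer to the partition by its label rather than /dev/sdb1.
    - device: /dev/disk/by-partlabel/var-lib-foo
      path: /var/lib/foo
      format: xfs
      # Filesystem label -> /dev/disk/by-label/var-lib-foo
      label: var-lib-foo
      # Ask Butane to generate a mount unit for this filesystem.
      with_mount_unit: true
```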

I think the issue is that our documentation may not follow that flow everywhere. (And in fact, https://github.com/openshift/os/blob/master/docs/faq.md#q-how-do-i-configure-a-secondary-block-device-via-ignitionmc-if-the-name-varies-on-each-node should be updated so that the mkfs.xfs step sets a label to avoid that whole $VAR_LIB_FOOBAR_DEV dance altogether.)

Edit: https://github.com/openshift/os/pull/1509

travier commented 4 months ago

feeding stable names to the partitioning or filesystem parts of the Ignition schema.

Unfortunately this means that Ignition configs have to be specialized for each node. It's also a problem that you don't know you have it until an update happens and triggers it, and updating the Ignition config at that point will not help.

For selectors, I was thinking of something like:

storage:
    disks:
        - device_selector: 
            size: smallest|largest|<500GB|>500GB
            kind: spinning|!spinning|...
            vendor: "regexp?"
            ...

which, on a successful match, gets converted down to exact device IDs to keep things stable.

jlebon commented 4 months ago

feeding stable names to the partitioning or filesystem parts of the Ignition schema.

Unfortunately this means that Ignition configs have to be specialized for each node.

Right yeah, that's indeed the main problem and why https://github.com/openshift/os/blob/master/docs/faq.md#q-how-do-i-configure-a-secondary-block-device-via-ignitionmc-if-the-name-varies-on-each-node exists.

It's also a problem that you don't know you have it until an update happens and triggers it, and updating the Ignition config at that point will not help.

This is the part I'm trying to expand on: the recommendation should be to define a partition or filesystem label and use by-partlabel/ or by-label/ in your mount unit.
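As a sketch of that recommendation, here is a hand-written mount unit in Butane that mounts by filesystem label. The label `var-lib-foo`, the mount point `/var/lib/foo` and the `xfs` type are hypothetical; the filesystem is assumed to have been created with that label (for example via `mkfs.xfs -L var-lib-foo`). Note that the unit name must match the mount point path.

```yaml
variant: fcos
version: 1.5.0
systemd:
  units:
    # Unit name must match Where=: /var/lib/foo -> var-lib-foo.mount
    - name: var-lib-foo.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount /var/lib/foo by filesystem label
        Before=local-fs.target

        [Mount]
        # Label-based path: unaffected by /dev/sdX reordering across updates.
        What=/dev/disk/by-label/var-lib-foo
        Where=/var/lib/foo
        Type=xfs

        [Install]
        RequiredBy=local-fs.target
```

Using `/dev/disk/by-partlabel/<name>` instead works the same way if only the partition, not the filesystem, carries the label.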

jlebon commented 4 months ago

We discussed this in today's community meeting:

ACTION: jlebon to file a butane issue to discuss improvements around device names (@jlebon:fedora.im, 17:03:24) (edit: https://github.com/coreos/butane/issues/532)
ACTION: travier to file a docs issue to audit storage docs and add some details re. device naming (@jlebon:fedora.im, 17:03:48) (edit: https://github.com/coreos/fedora-coreos-docs/issues/643)

We also mentioned we could detect this case on upgrading nodes and emit an MOTD giving users instructions on how to fix their mount units.

We also mentioned that it might be worth exploring adding more symlinks to make device selection easier in storage.disks and storage.filesystems.

coreos / fedora-coreos-tracker