coreos / ignition

First boot installer and configuration tool
https://coreos.github.io/ignition/
Apache License 2.0

Support creating and mounting Btrfs subvolumes #890

Open laenion opened 4 years ago

laenion commented 4 years ago

Feature Request

Btrfs is a supported file system for Ignition, but currently lacks support for several extended features. One of those features is subvolume support, which is used extensively on *SUSE distributions.

Desired Feature

As a first step, Ignition should at least support mounting multiple existing subvolumes from the same device, so that files can be written to them during the files stage. This currently fails because Ignition does not allow the same device name to be used more than once. Mounting a single subvolume per device has been possible since #872 by using appropriate mount options.

As a second step, it would be good if Ignition also supported creating additional subvolumes.

Other Information

This is related to and doing the opposite of #815, where multiple disks contain one Btrfs file system.

Example

Subvolume layout from an openSUSE installation:

ID 256 gen 78440 top level 5 path @
ID 258 gen 157397 top level 256 path @/var
ID 259 gen 157350 top level 256 path @/usr/local
ID 260 gen 157389 top level 256 path @/tmp
ID 261 gen 144350 top level 256 path @/srv
ID 262 gen 157352 top level 256 path @/root
ID 263 gen 156614 top level 256 path @/opt
ID 264 gen 157397 top level 256 path @/home
ID 265 gen 124813 top level 256 path @/boot/grub2/x86_64-efi
ID 266 gen 152932 top level 256 path @/boot/grub2/i386-pc
ID 267 gen 157367 top level 256 path @/.snapshots
ID 1038 gen 2151 top level 258 path @/var/lib/machines
ID 1448 gen 154571 top level 267 path @/.snapshots/283/snapshot

Currently it is not possible to mount more than one of these subvolumes besides the root file system.
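To make the layout above easier to reason about, here is a minimal Python sketch that parses a listing in this format (it looks like the output of `btrfs subvolume list`) into records and finds the direct children of a given subvolume. The names `Subvolume`, `parse_subvolume_list`, and `children_of` are made up for illustration and are not part of Ignition.

```python
import re
from typing import NamedTuple


class Subvolume(NamedTuple):
    subvol_id: int
    generation: int
    top_level: int  # ID of the containing subvolume (5 is the top level)
    path: str


# Matches lines such as "ID 258 gen 157397 top level 256 path @/var".
LINE_RE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


def parse_subvolume_list(text: str) -> list[Subvolume]:
    """Parse a subvolume listing, skipping lines that do not match."""
    subvolumes = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        subvol_id, generation, top_level, path = match.groups()
        subvolumes.append(
            Subvolume(int(subvol_id), int(generation), int(top_level), path)
        )
    return subvolumes


def children_of(subvolumes: list[Subvolume], parent_id: int) -> list[str]:
    """Return the paths of subvolumes directly contained in parent_id."""
    return [s.path for s in subvolumes if s.top_level == parent_id]


# A shortened copy of the listing from this comment.
LISTING = """\
ID 256 gen 78440 top level 5 path @
ID 258 gen 157397 top level 256 path @/var
ID 1038 gen 2151 top level 258 path @/var/lib/machines
"""
```

Calling `children_of(parse_subvolume_list(LISTING), 258)` shows that `@/var/lib/machines` is nested under `@/var` rather than directly under `@`.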

laenion commented 4 years ago

I was wondering whether it would make sense to extend the path element.

To mount the /home subvolume one could use a syntax such as

 "device": "/dev/sda[/@/home]"

The brackets are escaped by udev, so there will be no conflict with existing path names (e.g. in a file system label), and this is the same syntax used by findmnt for SOURCE. This syntax would keep device as the primary key.
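A minimal Python sketch of how such a `device[subvolume]` string could be split back into its two parts. The function name `split_device_spec` is hypothetical, and the bracket syntax is only the proposal from this comment, not something Ignition accepts.

```python
from typing import Optional, Tuple


def split_device_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "/dev/sda[/@/home]" into ("/dev/sda", "/@/home").

    A spec without a trailing bracketed part is returned unchanged with
    None as the subvolume.
    """
    if spec.endswith("]") and "[" in spec:
        device, _, rest = spec.partition("[")
        if not device:
            raise ValueError(f"missing device in {spec!r}")
        return device, rest[:-1]  # drop the closing bracket
    return spec, None
```

With this split, `device` alone would still act as the lookup key for the block device, while the bracketed part selects the subvolume.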

Creation of new subvolumes is more complex due to the fact that Btrfs supports different styles of subvolume layouts. Looking at the example from above we can see that most of the subvolumes are direct children of @ (which is not the root file system, but @/.snapshots/283/snapshot is). On the other hand e.g. @/var/lib/machines is a subvolume of @/var. Probably a distinction between relative and absolute paths (starting with a '/' or not in the brackets maybe?) would be enough, but in any case Ignition would have to make sure the parent is mounted to be able to create the subvolume...

Any thoughts?

laenion commented 4 years ago

Ping: Is there interest in this feature, or is extended Btrfs support out of scope?

jdoss commented 3 years ago

With Fedora's move to btrfs by default would this issue be in scope now?

bgilbert commented 3 years ago

I think this issue is in scope, but I suspect we should add dedicated config attributes for it rather than overloading existing ones.

@jdoss, it seems that Fedora change only affects desktop variants.

cgwalters commented 3 years ago

Exactly https://fedoramagazine.org/btrfs-coming-to-fedora-33/#comment-502620

cmurf commented 3 years ago

Btrfs is coming to Fedora 35 Cloud edition. I'm curious about a sort of "discoverable subvolumes spec" mimicking the discoverable partitions spec. The subvolume names could be self-describing instead of using type GUIDs. There's a C API and Python bindings provided by libbtrfsutil which might help with listing and creating subvolumes.

If the installation image were Btrfs, Ignition could use the seed/sprout feature for replication installation: new fs UUID, preserved subvolume layout, and native compression if used. Non-Btrfs destinations are still possible and gain the benefit of source integrity checking, with EIO on any detected corruption.

bgilbert commented 1 year ago

Ignition supports compound primary keys. We could add a subvolume field to the filesystems section, and have config validation require that it be absent unless the format is btrfs. I think that's probably better than trying to put multiple data items into the device field.
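A rough Python sketch of the validation described here, assuming a hypothetical `subvolume` field on each filesystem entry. The field name, the error strings, and the function `validate_filesystems` are illustrative only, and Ignition's real validation is written in Go.

```python
from typing import Dict, List, Optional


def validate_filesystems(filesystems: List[Dict[str, Optional[str]]]) -> List[str]:
    """Check entries keyed by the compound key (device, subvolume)."""
    errors = []
    seen = set()
    for index, fs in enumerate(filesystems):
        device = fs.get("device")
        subvolume = fs.get("subvolume")  # hypothetical field, absent today
        # The subvolume field must be absent unless the format is btrfs.
        if subvolume is not None and fs.get("format") != "btrfs":
            errors.append(f"filesystems[{index}]: subvolume requires format btrfs")
        # The same device may appear several times, but only with
        # different subvolumes.
        key = (device, subvolume)
        if key in seen:
            errors.append(f"filesystems[{index}]: duplicate entry for {key}")
        seen.add(key)
    return errors
```

Two entries for `/dev/sda2` with subvolumes `@/var` and `@/home` pass this check, while two entries with the same device and the same subvolume do not.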

Probably a distinction between relative and absolute paths (starting with a '/' or not in the brackets maybe?) would be enough, but in any case Ignition would have to make sure the parent is mounted to be able to create the subvolume...

Is the idea that a relative path would be relative to the default subvolume, and an absolute path would be relative to the top-level subvolume? Can we instead require that all paths be relative to the top-level subvolume?

In order of preference, I see a few options:

  1. Only support subvolume paths relative to the top-level subvolume. Create subvolumes by mounting the top-level subvolume, then creating subvolumes in order of ascending path length.

  2. Support subvolume paths relative to the top-level subvolume (which I'll call "top-relative") or to the default subvolume (which I'll call "default-relative"). At runtime, mount the top-level subvolume, look up the default subvolume, convert all default-relative subvolume paths to top-relative, fail if the same subvolume is specified multiple times, and create subvolumes in order of ascending path length.

  3. Support both default-relative and top-relative subvolume paths, but don't try to disambiguate them. Do two creation passes, one for each type.

    I don't think this would work well for Ignition. To keep the config declarative, Ignition needs to ensure that the same object can't be referenced multiple times, since otherwise it would matter in what order the operations are performed. We could define an order (e.g. "absolute first, then relative") but we've generally avoided that approach so far.
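Option 1 can be sketched in a few lines of Python. This assumes all paths are given relative to the top-level subvolume; the helpers `normalize` and `creation_order` are made-up names, and the sketch only computes the order without touching any file system.

```python
import posixpath
from typing import Iterable, List


def normalize(path: str) -> str:
    """Normalize a top-relative subvolume path, e.g. "/@/var/" -> "@/var"."""
    cleaned = posixpath.normpath("/" + path).lstrip("/")
    if not cleaned:
        raise ValueError("the top-level subvolume itself cannot be created")
    return cleaned


def creation_order(paths: Iterable[str]) -> List[str]:
    """Return subvolume paths in order of ascending path length.

    A parent path is always a strict prefix of its children, so it is
    shorter and sorts first. Specifying a subvolume twice is an error,
    which keeps the result independent of the input order.
    """
    normalized = [normalize(p) for p in paths]
    duplicates = {p for p in normalized if normalized.count(p) > 1}
    if duplicates:
        raise ValueError(f"subvolume specified more than once: {sorted(duplicates)}")
    # Ties in length are broken alphabetically to keep the order stable.
    return sorted(normalized, key=lambda p: (len(p), p))
```

For the openSUSE layout above, `@` would be created first, then `@/var`, and only later `@/var/lib/machines`.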

Thoughts?

bgilbert commented 1 year ago

And I guess we'd have to reject label for subvolumes?

laenion commented 1 year ago

Ignition supports compound primary keys. We could add a subvolume field to the filesystems section, and have config validation require that it be absent unless the format is btrfs. I think that's probably better than trying to put multiple data items into the device field.

Agreed.

Probably a distinction between relative and absolute paths (starting with a '/' or not in the brackets maybe?) would be enough, but in any case Ignition would have to make sure the parent is mounted to be able to create the subvolume...

Is the idea that a relative path would be relative to the default subvolume, and an absolute path would be relative to the top-level subvolume?

Not necessarily the default subvolume, but whatever is used for the current root ('/') file system.

Can we instead require that all paths be relative to the top-level subvolume?

That sounds like a very reasonable approach, as everything will be reachable from the top-level subvolume (but not necessarily from the root file system).

In order of preference, I see a few options:

1. Only support subvolume paths relative to the top-level subvolume.  Create subvolumes by mounting the top-level subvolume, then creating subvolumes in order of ascending path length.

Thinking about this again, this is probably also the expected solution from a user's perspective: at least when using snapshots for the root file system, in most cases one probably doesn't want to have subvolumes of a snapshot.

And I guess we'd have to reject label for subvolumes?

Yes, btrfs subvolumes don't have labels themselves.

bgilbert commented 1 year ago

Okay, sounds good!

Not necessarily the default subvolume, but whatever is used for the current root ('/') file system.

That couldn't work anyway. Filesystem creation happens in the disks stage, before the OS mounts /sysroot, and Ignition doesn't know what filesystem will be mounted there.

queeup commented 1 year ago

Any news about this? I'd really love to have this.

har7an commented 1 year ago

I'm also very interested in seeing this feature.

What exactly is the "top-level" subvolume you are referring to? Do you mean subvolid=5,subvol=/? Personally I'd appreciate it if one were able to create a partition layout similar to Fedora's default, with root, home and var (and more as needed) subvolumes below subvolid=5, which are then mounted accordingly.

bgilbert commented 1 year ago

subvolid=5,subvol=/ is the top-level subvolume, yes. I believe option 1 from https://github.com/coreos/ignition/issues/890#issuecomment-1307895353 would provide what you're looking for?

har7an commented 1 year ago

@bgilbert Yup, that would work just fine.

Out of curiosity: I don't see a /etc/fstab on CoreOS systems, so I assume that mounts are handled entirely through systemd mount units. Is that correct? So assuming I'd like to do a more elaborate Btrfs setup on CoreOS today, I'd have to create individual mount units for all subvolumes and then sort out the ordering between the units?

bgilbert commented 1 year ago

Yes, you'd need a separate mount unit for each subvolume. Butane normally creates that for you if you specify with_mount_unit: true, but it would need to learn about btrfs subvolumes.

I assume btrfs itself doesn't impose any particular mount order requirements? systemd mount units automatically order themselves with respect to parent mounts (see "Implicit Dependencies" in systemd.mount(5)) so no special ordering is required there.
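For anyone writing these units by hand today, here is a minimal Python sketch that generates one mount unit per subvolume. The unit file name has to match the mount point, so the sketch includes a simplified version of systemd's path escaping. The function names and the unit contents are assumptions for illustration and are not what Butane emits for with_mount_unit.

```python
def systemd_escape_path(path: str) -> str:
    """Simplified equivalent of `systemd-escape --path` for clean paths."""
    trimmed = path.strip("/")
    if not trimmed:
        return "-"  # the root directory
    out = []
    for index, char in enumerate(trimmed):
        if char == "/":
            out.append("-")
        elif char.isascii() and (
            char.isalnum() or char in ":_" or (char == "." and index > 0)
        ):
            out.append(char)
        else:
            # Everything else, including "-", becomes \xXX per byte.
            out.extend(f"\\x{byte:02x}" for byte in char.encode("utf-8"))
    return "".join(out)


def mount_unit(device: str, where: str, subvolume: str) -> tuple[str, str]:
    """Return (unit file name, unit file contents) for one subvolume."""
    name = systemd_escape_path(where) + ".mount"
    body = "\n".join([
        "[Unit]",
        f"Description=Mount Btrfs subvolume {subvolume} at {where}",
        "",
        "[Mount]",
        f"What={device}",
        f"Where={where}",
        "Type=btrfs",
        f"Options=subvol={subvolume}",
        "",
        "[Install]",
        "WantedBy=local-fs.target",
        "",
    ])
    return name, body
```

For example, `mount_unit("/dev/disk/by-label/root", "/var/home", "@/home")` yields a unit named `var-home.mount`; no explicit ordering against a `/var` mount is added because systemd derives that from the paths.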

dsreyes1014 commented 8 months ago

Hey guys. Dusty helped me figure out a workaround to get subvolumes mounted, but it seems it has to be declared here:

storage:
  filesystems:
    - path: /var
      device: /dev/disk/by-id/$DISKID
      format: btrfs
      mount_options: [subvol=$SUBVOLUME1]
      ...
    - path: /var/home
      device: /dev/disk/by-uuid/$DISKUUID
      format: btrfs
      mount_options: [subvol=$SUBVOLUME2]
      ...

and not with systemd.mount units, because of the order in which the Ignition file is executed.

One issue here is that you can't declare the same device twice, as Butane errors out with a duplication message. Hence I use $DISKID for the first mount and $DISKUUID for the second (both are symlinks to the same device). Another issue is that we are limited to mounting two subvolumes unless we can somehow create more persistent symlinks to the device. There is a discussion here with a bit more detail about my findings with this.
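To see which device paths are really aliases of one block device, and therefore how many distinct names are available for this workaround, a small Python sketch can group paths by their resolved target. The function `group_by_real_device` and the example paths are hypothetical; on a real system the default resolver follows the symlinks under /dev/disk.

```python
import os
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_real_device(
    paths: Iterable[str],
    resolve: Callable[[str], str] = os.path.realpath,
) -> Dict[str, List[str]]:
    """Group device paths by the device node they resolve to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[resolve(path)].append(path)
    return dict(groups)


# Hypothetical symlink table standing in for a real /dev/disk tree.
EXAMPLE_LINKS = {
    "/dev/disk/by-id/example-disk-part1": "/dev/sda1",
    "/dev/disk/by-uuid/example-uuid": "/dev/sda1",
}

aliases = group_by_real_device(
    EXAMPLE_LINKS, resolve=lambda p: EXAMPLE_LINKS.get(p, p)
)
```

Each alias of the same device allows one more filesystems entry, which is why the number of mountable subvolumes is bounded by the number of persistent symlinks.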

har7an commented 7 months ago

@dsreyes1014

I understand how this mounts Btrfs subvolumes (and that's a pretty creative workaround btw), but this doesn't actually create them from a bare disk, or does it? Are you creating these subvolumes by hand in advance?

dsreyes1014 commented 7 months ago

@har7an

Not at the moment. Ignition doesn't have the capability to create and mount subvolumes. I am creating the subvolumes manually beforehand.

har7an commented 6 months ago

So, out of pure curiosity and since this is something I'd really like to see (All my FCOS instances run on Btrfs): Where would a change like this need to be implemented? I feel like this discussion isn't actively tracked or implemented anywhere, but maybe that's just my impression from following this thread. If my time permits and I don't have to modify a dozen repos all at once, maybe I'd give it a stab.

I think that having some implementation would at least allow us to discuss how we best implement it and what's feasible.

ispanos commented 1 month ago

Is it possible to allow mounting multiple subvolumes by ignoring duplicate devices if we have format: btrfs? At least as a temporary workaround for those of us who already have a specific structure.
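One way to picture such a relaxed check, as a Python sketch only: keep rejecting repeated devices, except for btrfs entries whose `subvol=` mount option differs. The helper names are hypothetical and this is not how Butane or Ignition currently validate configs.

```python
from typing import Dict, List, Optional, Tuple


def subvol_option(mount_options: Optional[List[str]]) -> Optional[str]:
    """Return the value of the first subvol= mount option, if any."""
    for option in mount_options or []:
        if option.startswith("subvol="):
            return option[len("subvol="):]
    return None


def find_duplicate_devices(filesystems: List[Dict]) -> List[Tuple[str, Optional[str]]]:
    """Report repeated devices, tolerating btrfs entries with distinct subvol=."""
    seen = set()
    duplicates = []
    for fs in filesystems:
        key = (fs["device"], None)
        if fs.get("format") == "btrfs":
            key = (fs["device"], subvol_option(fs.get("mount_options")))
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

With this rule, two btrfs entries for the same device with `subvol=@/var` and `subvol=@/home` would be accepted, while two ext4 entries for the same device would still be reported.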