containers / composefs

a file system for mounting container images
GNU General Public License v2.0

Canonical method to find backing filesystem (and block device) #280

Open cgwalters opened 3 months ago

cgwalters commented 3 months ago

See https://github.com/ostreedev/ostree/issues/2867#issuecomment-2108090355 - there's a lot of tooling which wants to find the backing filesystem or block device.

As far as I am aware there is no canonical method to do this. systemd ends up writing a symlink in /run for its volatile root bits, but that's it. I think right now one would need to parse the overlayfs mount options in userspace, which seems really fragile.
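To illustrate what parsing the overlayfs mount options in userspace involves, here is a minimal Python sketch. The function name `overlay_lowerdirs` is hypothetical, and the sketch assumes the `lowerdir=` syntax in which `:` separates lower layers and `::` introduces data-only layers. It deliberately ignores backslash escaping of commas, colons and spaces in paths, which is part of why this approach is fragile.

```python
def overlay_lowerdirs(mounts_line):
    """Return (lower_layers, data_only_layers) for one /proc/mounts line.

    Returns None if the line is not an overlay mount or has no lowerdir.
    Assumption: paths contain no escaped ',' or ':' characters.
    """
    fields = mounts_line.split()
    if len(fields) < 4 or fields[2] != "overlay":
        return None

    lowerdir = None
    for opt in fields[3].split(","):
        if opt.startswith("lowerdir="):
            lowerdir = opt[len("lowerdir="):]
    if lowerdir is None:
        return None

    # "::" separates the regular lower layers from data-only layers.
    regular, *data_only = lowerdir.split("::")
    return regular.split(":"), data_only
```

Even with the paths in hand, the caller still has to resolve each one to a filesystem or block device, and the paths may not be visible in the caller's mount namespace.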

overlayfs itself is super flexible - one can union multiple lower filesystems, etc. However, for us I think the default "backing filesystem" is the one holding the erofs metadata.

What'd be pretty nice is if there was some way to attach arbitrary metadata (like xattrs) to a vfs mount.

Failing that though, what we could do, since composefs always requires full privileges to mount (because of the erofs usage), is to just generalize what systemd does a bit and have e.g. /run/composefs/$something? Where $something is maybe the kernel mount id? (Though the fact that mount ids can be reused is problematic.)

alexlarsson commented 3 months ago

Mount ids are per namespace though, aren't they, so also not ideal? Do you really need the filesystem, or is the minor+major of the block device enough?

alexlarsson commented 3 months ago

One place where we can encode something is the lower dir mountpoint (i.e. where we mount the erofs), because this is not really visible in the root namespace. For example, this is how an ostree rootfs looks in /proc/mounts:

overlay / overlay ro,seclabel,relatime,lowerdir=/run/ostree/.private/cfsroot-lower::/sysroot/ostree/repo/objects,redirect_dir=on,verity=require,metacopy=on 0 0

We could easily change the filename of /run/ostree/.private/cfsroot-lower to something that e.g. encodes the major/minor of the backing device. Or we can set an xattr on this file, it is only a mountpoint after all, not the erofs mount.

cgwalters commented 3 months ago

> Do you really need the filesystem, or is the minor+major of the block device enough?

For things like "resize the underlying filesystem", we definitely want the filesystem. In theory I think we can, given a block device, try to find the filesystem mounted on it, but that's obviously ugly.
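As a rough sketch of the "given a block device, find the filesystem mounted on it" step, one option is to scan `/proc/self/mountinfo` for a matching `major:minor` field. The helper name `mounts_for_device` is hypothetical. The sketch only sees mounts in the caller's mount namespace, and some filesystems (btrfs, for example) report anonymous device numbers there, so a match is not guaranteed.

```python
def mounts_for_device(mountinfo_text, major, minor):
    """Return [(mount_point, fstype), ...] for mounts backed by major:minor.

    mountinfo_text is the content of /proc/self/mountinfo.
    """
    wanted = f"{major}:{minor}"
    results = []
    for line in mountinfo_text.splitlines():
        # Optional fields end at a lone "-"; fstype and source follow it.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        fields = left.split()
        tail = right.split()
        if len(fields) < 5 or not tail:
            continue
        # fields[2] is "major:minor", fields[4] is the mount point.
        if fields[2] == wanted:
            results.append((fields[4], tail[0]))
    return results
```

A caller would pass in something like `open("/proc/self/mountinfo").read()` and then has to choose among possibly several mount points (bind mounts, subdirectory mounts), which is the ugly part.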

> We could easily change the filename of /run/ostree/.private/cfsroot-lower to something that e.g. encodes the major/minor of the backing device. Or we can set an xattr on this file, it is only a mountpoint after all, not the erofs mount.

Yeah, xattrs on the underlying mount point are probably OK to start, at least they wouldn't "leak" into the mounted fs.

Today we support multiple datalowers, so the xattr should probably include the device number for all of them. And since we need to serialize to a string, we might as well make it decimal formatted (major:minor) so it can be conveniently prepended with e.g. /sys/dev/block.
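A minimal sketch of what that serialization could look like, in Python. Nothing here is an agreed format: the comma separator, the helper names, and the xattr name mentioned in the comment are all assumptions made for illustration. The only fixed point is the sysfs layout, where `/sys/dev/block/MAJOR:MINOR` uses decimal numbers.

```python
import os


def device_of(path):
    """Return (major, minor) of the device backing the filesystem at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def encode_devices(devs):
    """Serialize [(major, minor), ...] as decimal "MAJ:MIN,MAJ:MIN".

    Assumption: a comma-separated list; the thread does not fix a separator.
    """
    return ",".join(f"{major}:{minor}" for major, minor in devs)


def decode_to_sysfs(value):
    """Turn the serialized string back into /sys/dev/block/... paths."""
    return [f"/sys/dev/block/{entry}" for entry in value.split(",") if entry]


# Hypothetical usage at mount time (the xattr name is made up):
#   value = encode_devices([device_of(p) for p in lower_paths])
#   os.setxattr(lower_mountpoint, "trusted.composefs.example-devs",
#               value.encode())
```

A reader would fetch the value with `os.getxattr` on the lower mountpoint and resolve each resulting sysfs path, without having to parse any device number encoding.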