containers / crun

A fast and lightweight fully featured OCI runtime and C library for running containers
GNU General Public License v2.0

Exposing/leveraging filesystem features for containers (casefolding, reflinks) #1178

Open Conan-Kudo opened 1 year ago

Conan-Kudo commented 1 year ago

In a discussion during Container Plumbing Days 2023, @rhatdan and I got to talking about running Windows applications as Linux containers using Wine. He mentioned that there are filesystem quirks preventing that from working effectively.

In that discussion, I noted that per-directory casefolding is supported in ext4 and Wine can use that if available, and it's on the roadmap for Btrfs as well. Additionally, there's been a proposal and patch to take advantage of reflinks in Wine for space savings.

That led to him suggesting I open a ticket for exploring how to expose these capabilities in filesystems when applications benefit from them. This particular gap in OCI containers is one of the reasons I wind up using systemd-nspawn instead, so it'd be nice to have this in OCI-based containers too.

cc: @davide125, @josefbacik, @giuseppe

giuseppe commented 1 year ago

thanks for starting the discussion!

> In that discussion, I noted that per-directory casefolding is supported in ext4 and Wine can use that if available, and it's on the roadmap for Btrfs as well. Additionally, there's been a proposal and patch to take advantage of reflinks in Wine for space savings.

can we set it recursively for the entire directory and its descendants?

> That led to him suggesting I open a ticket for exploring how to expose these capabilities in filesystems when applications benefit from them. This particular gap in OCI containers is one of the reasons I wind up using systemd-nspawn instead, so it'd be nice to have this in OCI-based containers too.

what exactly would you like to see in crun (or Podman) that is missing now?

Conan-Kudo commented 1 year ago

> can we set it recursively for the entire directory and its descendants?

Yes, I believe so. I think when you create an empty directory and set the property, everything inherits it by default.
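
For anyone who wants to verify that inheritance on their own filesystem, here is a minimal Python sketch that reads the casefold attribute of a directory (the same attribute `lsattr -d` shows as `F`). The helper name `is_casefolded` is made up for illustration. The flag value is `FS_CASEFOLD_FL` from `linux/fs.h`, and the ioctl number is computed on the assumption that the platform uses the generic Linux `_IOR` encoding (x86, ARM, RISC-V); other architectures may encode it differently.

```python
import fcntl
import os
import struct

FS_CASEFOLD_FL = 0x40000000  # casefold inode flag, from linux/fs.h
_LONG_SIZE = struct.calcsize("l")
# _IOR('f', 1, long), assuming the generic ioctl encoding (x86, ARM, RISC-V)
FS_IOC_GETFLAGS = (2 << 30) | (_LONG_SIZE << 16) | (ord("f") << 8) | 1


def is_casefolded(path):
    """Return True/False for the casefold flag, or None if flags can't be read."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        buf = bytearray(_LONG_SIZE)
        try:
            fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
        except OSError:
            return None  # this filesystem does not expose inode flags
        # The kernel fills in a native-endian unsigned int.
        (flags,) = struct.unpack_from("I", buf)
        return bool(flags & FS_CASEFOLD_FL)
    finally:
        os.close(fd)
```

Calling it on a directory that had the attribute set while empty, and then on a subdirectory created afterwards, should show whether the flag propagated.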

rhatdan commented 1 year ago

I would like to see a crun-wine that works the way crun-wasm does.

Then, if the file system handled it, we could pull "arch=windows" images and attempt to run them with --crun-wine.

Conan-Kudo commented 1 year ago

> what exactly would you like to see in crun (or Podman) that is missing now?

One thing that's probably out of scope is being able to execute a directory or btrfs subvolume as a filesystem root. That makes iterating on base containers super-nice. It also helps that using btrfs subvolumes directly gives me all that fanciness right out of the gate.

giuseppe commented 1 year ago

> One thing that's probably out of scope is being able to execute a directory or btrfs subvolume as a filesystem root.

could you use podman run ... --rootfs $PATH_YOU_WANT_AS_ROOTFS?

Conan-Kudo commented 1 year ago

Huh, TIL. But I guess it wouldn't automatically create a btrfs snapshot to work from to emulate the "ephemeral-ness"? nspawn can do that. Regardless, that's pretty cool.

giuseppe commented 1 year ago

you could specify :O to create an overlay mount on top of it.

podman run ... --rootfs $PATH_YOU_WANT_AS_ROOTFS:O

That is ephemeral but using overlay

Hi-Angel commented 11 months ago

> talking about running Windows applications as Linux containers using Wine. He mentioned that there are filesystem quirks preventing that from working effectively

FTR, I'm not quite clear what are those quirks. Case-insensitivity is handled by WINE internally, I don't think there's anything to be done on a filesystem or Podman side, is there?

rhatdan commented 11 months ago

The issue as I recall was the file systems being pulled were being stored in mixed case, and this caused issues. Wine issues happened later.

Conan-Kudo commented 11 months ago

> > talking about running Windows applications as Linux containers using Wine. He mentioned that there are filesystem quirks preventing that from working effectively
>
> FTR, I'm not quite clear what are those quirks. Case-insensitivity is handled by WINE internally, I don't think there's anything to be done on a filesystem or Podman side, is there?

This is not necessarily true. There's been a lot of work lately to delegate capabilities to Linux where it'd be more efficient to do so. The introduction of filesystem-level casefolding is one such example. Another is the futex_waitv() syscall.
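
To illustrate the efficiency point: without filesystem support, a userspace layer has to emulate case-insensitive lookup on a case-sensitive tree, roughly as in the Python sketch below. This is not Wine's actual code. The function name `resolve_case_insensitive` is hypothetical, and Python's `str.casefold()` only approximates the kernel's UTF-8 casefolding rules.

```python
import os


def resolve_case_insensitive(root, relpath):
    """Map a case-insensitive relative path onto a case-sensitive tree.

    Returns the real path, or None if some component has no match.
    """
    current = root
    for part in relpath.replace("\\", "/").split("/"):
        if not part:
            continue
        exact = os.path.join(current, part)
        if os.path.lexists(exact):
            current = exact  # fast path: the case already matches
            continue
        wanted = part.casefold()
        # Slow path: scan the whole directory for a name equal under folding.
        try:
            names = os.listdir(current)
        except OSError:
            return None
        match = next((n for n in names if n.casefold() == wanted), None)
        if match is None:
            return None
        current = os.path.join(current, match)
    return current
```

Every component whose case does not match costs a full directory listing, whereas a casefolded directory lets the kernel answer the same lookup directly.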

Conan-Kudo commented 11 months ago

> you could specify :O to create an overlay mount on top of it.
>
> podman run ... --rootfs $PATH_YOU_WANT_AS_ROOTFS:O
>
> That is ephemeral but using overlay

In this scenario, OverlayFS needs support for casefolding.

Hi-Angel commented 11 months ago

@rhatdan

> The issue as I recall was the file systems being pulled were being stored in mixed case, and this caused issues

I still don't see why mixed case is not okay. Podman per se doesn't care about the case of the files it works with. And WINE apps, I assume, should just work, because WINE does the case-insensitivity translation internally.

@Conan-Kudo

> This is not necessarily true. There's been a lot of work lately to delegate capabilities to Linux where it'd be more efficient to do so. The introduction of filesystem-level casefolding is one such example. Another is the futex_waitv() syscall.

AFAIK all of that is optimization-related, not something WINE hasn't solved some other way. Moreover, futex2 (aka futex_waitv()) is still not used by WINE despite having been merged into the kernel long ago. It may be used in Proton, though. And the reflinks you mentioned in the first post were never merged either, probably because the patch came from an individual contributor, and contributing to WINE is tough for outsiders for non-technical reasons.

rhatdan commented 11 months ago

Well, it's been a while. You could try to pull a Windows container and see what happens.

rhatdan commented 11 months ago

If the community, @flouthoc, or @giuseppe would be willing to work on this, I would be thrilled.

I just pulled a Microsoft image and it was stored fine. Someone needs to create a crun that uses libwine and see if they can get this to work.