crc-org / vfkit

Apache License 2.0
141 stars 26 forks source link

Bad interaction between selinux and an APFS-backed directory shared over virtiofs #70

Open cfergeau opened 11 months ago

cfergeau commented 11 months ago

Creating an issue for the record, when using virtio-fs:

xattr/setfattr is working and persisted across reboots, but with one big caveat: on linux, security.selinux is special, and can be set even on read-only files (0444 permissions). On macOS, this is not true, if the file is read-only, xattr can't be used to set the security.selinux xattr.

This causes issues with podman-remote run -v /Users/teuf/some-folder:/some/path:Z as this this involves some setfattr calls which failed (Error: error preparing container a1961ae610561be62638aef839ad9f7f2735a16f3fcf15ece95b09865167e415 for attach: setxattr /Users/teuf/dev/crc/.git/objects/00/63a6fe0324d63105a6649c0b46fb27ad767b17: permission denied).

To avoid this, the filesystem in the VM can be mounted with: mount -t virtiofs -o context=system_u:object_r:container_file_t:s0 and the container started with podman-remote run -v /Users/teuf/dev:/test --rm registry.access.redhat.com/ubi8 ls -alZ /test/crc from macOS

cfergeau commented 11 months ago

When sharing a directory between a linux guest and the macos host with virtiofs, the behaviour for setting xattrs on a read-only file inside the guest differs between the linux fs and the virtiofs share.

For example, on a fedora coreos VM:

[core@podman ~]$ mkdir ~/selinux-test-ro
[core@podman ~]$ chmod 400 selinux-test-ro/
[core@podman ~]$ getfattr -m - -d /home/core/selinux-test-ro/
getfattr: Removing leading '/' from absolute path names 
# file: home/core/selinux-test-ro/ security.selinux="unconfined_u:object_r:user_home_t:s0" 
[core@podman ~]$ setfattr -n security.selinux -v "system_u:object_r:var_log_t:s0" selinux-test-ro 
[core@podman ~]$ getfattr -m - -d /home/core/selinux-test-ro getfattr: Removing leading '/' from absolute path names 
# file: home/core/selinux-test-ro/ security.selinux="system_u:object_r:var_log_t:s0" 

# the virtiofs share is mounted on /mnt: 
# vfkit-share on /mnt type virtiofs (rw,relatime,seclabel) 
[core@podman ~]$ mkdir /mnt/selinux-test 
[core@podman ~]$ mkdir /mnt/selinux-test-ro 
[core@podman ~]$ chmod 400 /mnt/selinux-test-ro/ 
[core@podman ~]$ getfattr -m - -d /mnt/selinux-test 
[core@podman ~]$ getfattr -m - -d /mnt/selinux-test-ro 
[core@podman ~]$ setfattr -n security.selinux -v "system_u:object_r:var_log_t:s0" /mnt/selinux-test 
[core@podman ~]$ getfattr -m - -d /mnt/selinux-test 
getfattr: Removing leading '/' from absolute path names 
# file: mnt/selinux-test security.selinux="system_u:object_r:var_log_t:s0" 

[core@podman ~]$ setfattr -n security.selinux -v "system_u:object_r:var_log_t:s0" /mnt/selinux-test-ro/ 
setfattr: /mnt/selinux-test-ro/: Permission denied 
[core@podman ~]$ getfattr -m - -d /mnt/selinux-test-ro 
[core@podman ~]$ 

This is an issue as when using podman run --volume with the z|Z option, a virtiofs share is used under the hood, and podman tries to change the selinux labels of all the files under the shared volume. The selinux label is changed by setting the security.selinux xattr, and it's unexpected that it's not possible for read-only files.

rhatdan commented 11 months ago

Can the return be changes to ENOSUP rather then EPERM? Podman and container tools now handle ENOSUP on :Z and :z gracefully.

rhatdan commented 11 months ago

BTW It seems like the MAC version makes more sense then the Linux one.

rhvgoyal commented 11 months ago

virtiofs is simply a passthrough filesystem. If MacOS host does not allow setfattr operation, it will return that error back to the client.

So what you are seeing is not really related to virtiofs. Its just a setfattr behavior difference between macos and linux.

I find it odd that one can set xattr on read-only file. Say I want to change selinux label of a read-only file so that a certain process can 't get access to it. How would I do that?

My understanding of linux read-only mode is that it is about file data. You can't modify file's data. But it does not prohibit you from changing file's metadata.

If you are trying to make metadata also immutable, I think that will be "immutable" file. And one can use "chattr" for that.

So I would argue that tcalling it "virtio-fs limitation" is not the correct title for this issue.

cfergeau commented 11 months ago

Testing this directly on the mac gives:

% xattr -w security.selinux 'system_u:object_r:var_log_t:s0' ./selinux-test-ro
xattr: [Errno 13] Permission denied: './selinux-test-ro'

errno 13 is EACCES, not EPERM (errno 1)



> So I would argue that tcalling it "virtio-fs limitation" is not the correct title for this issue.

Fair enough, I renamed it to " Bad interactions between selinux and an APFS-backed directory shared over virtiofs" which should be more accurate.
baude commented 11 months ago

is there a solution in mind?

cfergeau commented 11 months ago

So far I was thinking that for this one we were highly dependent on Apple, but seeing these comments from Dan Walsh:

Can the return be changes to ENOSUP rather then EPERM? Podman and container tools now handle ENOSUP on :Z and :z gracefully. BTW It seems like the MAC version makes more sense then the Linux one.

maybe podman could be changed to also ignore EACCES when doing these mappings?

rhatdan commented 11 months ago

Can we check if the underlying file system is read-only?

rhatdan commented 11 months ago

Why do you have read-only content in your homedir?

baude commented 11 months ago

Why do you have read-only content in your homedir?

is it not more than a little likely users will have this kind of thing whether they know it or not ?

rhatdan commented 11 months ago

Well they have to have a read-only partition on their homedir, and they have to attempt to share it into a container which they want to trigger a relabel. The issue is there is not a clear place to say ignore the relabel on ENOACCESS, if the partition is mounted read-only on a virtiofsd.

cfergeau commented 11 months ago

Why do you have read-only content in your homedir?

Files in .git/objects/ and in ~/go/pkg/mod/ are read-only on my mac system, they were created this way by git/golang.

rhatdan commented 11 months ago

Could you look at the labels of those files within the VM with a ls -lZ to see what label they had?

cfergeau commented 11 months ago

These read-only files exist on the macOS side, on the APFS filesystem it uses, they don't have any labels nor xattrs. podman-machine shares the full home directory over virtio-fs, and when the user of podman wants to expose one of these dirs in a container using -v...:Z, the failure occurs.

rhatdan commented 11 months ago

I understand, the question is if they are not relabeled will they fail to be used within the Container, since SELinux will hit them with a permission denied when a process within a container attempts to read the content, If I know the label, then I can at least tell if SELinux will block reading.

As far as the Relabel, we could change the code to only warn on failures to relabel, but this could get mighty noisy. No ideal solution to this issue.

rhatdan commented 10 months ago

@slp could you check if libkrun has the same problems?

slp commented 10 months ago

@slp could you check if libkrun has the same problems?

No, it doesn't. In the virtio-fs implementation in libkrun we deliberately exclude macOS from participating in file permission decisions by storing the permission and ownership bits from Linux PoV as extended attributes:

bash-5.2# mkdir /work/ro
bash-5.2# chmod 400 /work/ro
bash-5.2# ls -l /work
total 0
dr-------- 2 root root 64 Jan 18 18:23 ro
bash-5.2# getfattr -m - -d /work/ro
bash-5.2# setfattr -n security.selinux -v "system_u:object_r:var_log_t:s0" /work/ro
bash-5.2# getfattr -m - -d /work/ro
getfattr: Removing leading '/' from absolute path names
# file: work/ro
security.selinux="system_u:object_r:var_log_t:s0"

bash-5.2#
exit
slp@Sergios-MacBook-Air selinux-test % ls -l
total 0
drwx------@ 2 slp  staff  64 Jan 18 19:23 ro
slp@Sergios-MacBook-Air selinux-test % xattr -l ro
security.selinux: system_u:object_r:var_log_t:s0
user.containers.override_stat: 0:0:040400

I'm afraid that if the virtio-fs implementation in vfkit tries to store the permission bits in Linux as actual permission bits in APFS we'll find plenty of corner cases like this, as the filesystem semantics between Linux and macOS aren't 1:1.

Probably with libkrun we simply noticed those issues earlier because the root filesystem is also mounted via virtio-fs, so we already adopted this alternative strategy for storing the permission and ownership bits.