archlinuxhardened / selinux

PKGBUILDs to build SELinux enabled packages for Arch Linux

systemd-selinux-247.3 - boot fails when SELinux is enabled #81

Closed tqre closed 3 years ago

tqre commented 3 years ago

I tested local builds because the GH Actions pipeline failed. Here are some logs I managed to pull:

[    1.732425] SELinux:  Permission perfmon in class capability2 not defined in policy.
[    1.733395] SELinux:  Permission bpf in class capability2 not defined in policy.
[    1.734315] SELinux:  Permission checkpoint_restore in class capability2 not defined in policy.
[    1.735404] SELinux:  Permission perfmon in class cap2_userns not defined in policy.
[    1.736370] SELinux:  Permission bpf in class cap2_userns not defined in policy.
[    1.737347] SELinux:  Permission checkpoint_restore in class cap2_userns not defined in policy.
[    1.738591] SELinux: the above unknown classes and permissions will be denied
[    1.740988] SELinux:  policy capability network_peer_controls=1
[    1.741737] SELinux:  policy capability open_perms=1
[    1.742370] SELinux:  policy capability extended_socket_class=1
[    1.743108] SELinux:  policy capability always_check_network=0
[    1.744000] SELinux:  policy capability cgroup_seclabel=1
[    1.744670] SELinux:  policy capability nnp_nosuid_transition=1
[    1.745419] SELinux:  policy capability genfs_seclabel_symlinks=0

fishilico commented 3 years ago

Hello, I am also testing a local VM with the 3.2-rc1 release of SELinux libraries and tools, and this VM boots fine with systemd-selinux 247.3 and policy crafted from selinux-refpolicy-git. To check whether it was an issue from the policy, I reconfigured it to use selinux-refpolicy-arch 20200818-1, and it still booted fine.

In dmesg, I also see the same warnings (because the policy needs to be upgraded):

[    1.099783] SELinux:  Permission perfmon in class capability2 not defined in policy.
[    1.099785] SELinux:  Permission bpf in class capability2 not defined in policy.
[    1.099786] SELinux:  Permission checkpoint_restore in class capability2 not defined in policy.
[    1.099793] SELinux:  Permission perfmon in class cap2_userns not defined in policy.
[    1.099794] SELinux:  Permission bpf in class cap2_userns not defined in policy.
[    1.099795] SELinux:  Permission checkpoint_restore in class cap2_userns not defined in policy.
[    1.099828] SELinux:  Class lockdown not defined in policy.
[    1.099829] SELinux: the above unknown classes and permissions will be denied
[    1.103115] SELinux:  policy capability network_peer_controls=1
[    1.103116] SELinux:  policy capability open_perms=1
[    1.103117] SELinux:  policy capability extended_socket_class=1
[    1.103117] SELinux:  policy capability always_check_network=0
[    1.103118] SELinux:  policy capability cgroup_seclabel=1
[    1.103119] SELinux:  policy capability nnp_nosuid_transition=1
[    1.103119] SELinux:  policy capability genfs_seclabel_symlinks=0
[    1.163554] audit: type=1403 audit(1612693619.279:2): auid=4294967295 ses=4294967295 lsm=selinux res=1
[    1.168936] systemd[1]: Successfully loaded SELinux policy in 153.559ms.

But these errors do not seem to be fatal.

When using the QCOW image downloaded from the GitHub Action artifacts, the main error messages are:

[    8.583087] systemd[1]: systemd-coredump.socket: Failed to determine SELinux label: Invalid argument
[    8.586030] systemd[1]: Failed to listen on Process Core Dump Socket.
[FAILED] Failed to listen on Process Core Dump Socket.
See 'systemctl status systemd-coredump.socket' for details.
[    8.590747] systemd[1]: systemd-journald-audit.socket: Failed to determine SELinux label: Invalid argument
[    8.592029] systemd[1]: Failed to listen on Journal Audit Socket.
[FAILED] Failed to listen on Journal Audit Socket.
See 'systemctl status systemd-journald-audit.socket' for details.
[    8.594528] systemd[1]: systemd-journald-dev-log.socket: Failed to determine SELinux label: Invalid argument
[    8.595909] systemd[1]: Failed to listen on Journal Socket (/dev/log).
[FAILED] Failed to listen on Journal Socket (/dev/log).
See 'systemctl status systemd-journald-dev-log.socket' for details.
[    8.598028] systemd[1]: systemd-journald.socket: Failed to determine SELinux label: Invalid argument
[    8.599316] systemd[1]: Failed to listen on Journal Socket.
[FAILED] Failed to listen on Journal Socket.
See 'systemctl status systemd-journald.socket' for details.
[DEPEND] Dependency failed for Journal Service.
[DEPEND] Dependency failed for Flus…Journal to Persistent Storage.
[    8.604324] systemd[1]: systemd-networkd.socket: Failed to determine SELinux label: Invalid argument
[    8.605637] systemd[1]: Failed to listen on Network Service Netlink Socket.
[FAILED] Failed to listen on Network Service Netlink Socket.
See 'systemctl status systemd-networkd.socket' for details.
...
[   13.773757] systemd[1]: dbus.socket: Failed to determine SELinux label: Invalid argument
[   13.775100] systemd[1]: Failed to listen on D-Bus System Message Bus Socket.

I do not know (yet) what causes this, but an Arch Linux system without D-Bus is a broken one :'(
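To see at a glance which units are affected, the "Failed to determine SELinux label" lines can be pulled out of a saved boot log. The following is only a rough sketch: it assumes the message format shown in the excerpt above, and the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   [   13.773757] systemd[1]: dbus.socket: Failed to determine SELinux label: Invalid argument
LABEL_FAILURE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Failed to determine SELinux label: (?P<reason>.+)$"
)


def units_with_label_failures(log_text):
    """Return the unit names whose SELinux label lookup failed, in log order."""
    units = []
    for line in log_text.splitlines():
        match = LABEL_FAILURE.search(line)
        # Keep each unit once, even if the message is repeated in the log.
        if match and match.group("unit") not in units:
            units.append(match.group("unit"))
    return units
```

Feeding it the output of dmesg (or a saved console log) should list every socket unit that could not be labeled.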

fishilico commented 3 years ago

By the way, the system is not completely broken: running qemu-system-x86_64 archselinux.qcow2 -net nic -net user,hostfwd=tcp::10022-:22 -m 2048 (without -nographic) "works" in the sense that I can log in as root. D-Bus is still broken and there is no journal (logs are in dmesg...), but this is better than nothing for debugging the issue.

tqre commented 3 years ago

Disabling SELinux on the kernel command line lets the system boot, so SELinux and systemd are doing something that does not work together. And you are right, the errors I picked up are not fatal.

It looks like none of the sockets are found, and there are some other failures too. Here is a complete startup log with all the logs I could enable: SELinux_systemd_debug.log

fishilico commented 3 years ago

It seems to be a kernel issue: downgrading to linux 5.10.6-1 and rebooting fixes the issue in the VM.

In the "buggy VM", cat /proc/self/attr/current fails with Invalid argument. This is likely a side effect of recent changes in Arch Linux's kernel (such as https://github.com/archlinux/svntogit-packages/commit/69cb8c2d2884181e799e67b09d67fcf7944d8408).
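The same check can be scripted, which may help when comparing several kernels. This is a minimal sketch, not a definitive test: the function name and the returned tuple shape are my own choices, and the only thing taken from the observation above is that reading /proc/self/attr/current fails with "Invalid argument" on the affected system.

```python
import errno


def read_selinux_context(path="/proc/self/attr/current"):
    """Return (context, error_name); exactly one of the two is None."""
    try:
        with open(path, "rb") as handle:
            raw = handle.read()
    except OSError as exc:
        # On the affected kernel the read fails with EINVAL ("Invalid argument").
        return None, errno.errorcode.get(exc.errno, str(exc.errno))
    # The value may end with a NUL byte or a newline; strip both.
    return raw.rstrip(b"\x00\n").decode("utf-8", "replace"), None
```

On a system where SELinux is active, the first element should be the current context; on the broken boot described here, the second element would be expected to be EINVAL.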

tqre commented 3 years ago

I downgraded a bare-metal testing laptop's kernel to 5.10.6-1, and it indeed works. As SELinux 3.2 has no issues, this issue should go away as soon as we have that version available. I'll see if I can put together the rc2 packages.

fishilico commented 3 years ago

Using packages from the Arch Linux Archive, I got this:

So it is definitely a regression in the linux package, and https://github.com/archlinux/svntogit-packages/commit/69cb8c2d2884181e799e67b09d67fcf7944d8408 seems very suspicious. Maybe the new CONFIG_LSM="lockdown,yama,bpf" conflicts with SELinux, and this could be overridden on the kernel command line.

Anyway I will not have more time to investigate this issue today, so feel free to continue searching for a fix or to open bug reports on Arch Linux's bug tracker.

tqre commented 3 years ago

I found it! I looked at what CONFIG_LSM does, and it indeed is the key here.

The kernel command-line parameter security= has been deprecated, and lsm=selinux should be used instead! https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

This works on my VMs and my testing laptop. I'll correct the parameter in the workflow file. I think ArchWiki needs an update regarding this too!
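For anyone auditing existing boot configurations, a small helper can flag command lines that still rely on security= alone. This is only an illustrative sketch: the function names are hypothetical, and root=/dev/vda1 in the usage note is a placeholder, not a value from this setup.

```python
def lsm_boot_parameters(cmdline):
    """Collect the LSM-related parameters from a kernel command line string."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only key=value tokens for the three parameters of interest are kept.
        if sep and key in ("lsm", "security", "selinux"):
            found[key] = value
    return found


def uses_deprecated_security_parameter(cmdline):
    """True when security= is given without the newer lsm= parameter."""
    params = lsm_boot_parameters(cmdline)
    return "security" in params and "lsm" not in params
```

For example, uses_deprecated_security_parameter("root=/dev/vda1 rw selinux=1 security=selinux") is true, while the same line with lsm=selinux instead of security=selinux is not flagged. The running system's command line can be read from /proc/cmdline.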

fishilico commented 3 years ago

Great! Thanks for finding this!

I do not have a test system available right now, but I am wondering: should the lsm kernel parameter be lsm=selinux or lsm=selinux,lockdown,yama,bpf or something else (maybe in a different order)? What does cat /sys/kernel/security/lsm show on the test system?

The Vagrant configuration (https://github.com/archlinuxhardened/selinux/blob/master/_vagrant/step1_install_and_configure.sh#L72-L88) and the wiki page (https://wiki.archlinux.org/index.php/SELinux) will also need to be updated accordingly. I can do this, probably in 2-3 days.

tqre commented 3 years ago

cat /sys/kernel/security/lsm now shows capability,selinux. If I understood it right, this is the order in which the LSMs are processed. On a regular Arch install it shows capability,lockdown,yama,bpf.

On a test system, I changed the kernel parameter to lsm=selinux,lockdown,yama,bpf. The system boots fine, and cat /sys/kernel/security/lsm shows capability,selinux,lockdown,yama,bpf.
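A quick way to compare these outputs in a script is to split the comma-separated list and look for selinux. This is a minimal sketch with made-up function names; it takes the file content as a string so it can be tried on the values quoted above.

```python
def parse_lsm_list(text):
    """Split the comma-separated content of /sys/kernel/security/lsm."""
    return [name for name in text.strip().split(",") if name]


def selinux_is_active(text):
    """True when selinux appears in the list of active security modules."""
    return "selinux" in parse_lsm_list(text)
```

With the values from this thread, selinux_is_active("capability,lockdown,yama,bpf") is false and selinux_is_active("capability,selinux,lockdown,yama,bpf") is true.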

I'll go ahead and put these settings on to the testing VM.

fishilico commented 3 years ago

There is something strange in your parameter: using lsm=selinux,lockdown,yama,bpf contradicts the documentation (https://www.kernel.org/doc/html/v5.11-rc7/admin-guide/LSM/index.html):

A list of the active security modules can be found by reading /sys/kernel/security/lsm. This is a comma separated list, and will always include the capability module. The list reflects the order in which checks are made. The capability module will always be first, followed by any “minor” modules (e.g. Yama) and then the one “major” module (e.g. SELinux) if there is one configured.

I asked the selinux (https://lore.kernel.org/selinux/CAJfZ7=nKqT7mmE73r1K3YjBak=OmPACmDi5ccX=SzKhT9=vJ-g@mail.gmail.com/) and the LSM (https://lore.kernel.org/linux-security-module/CAJfZ7=nKqT7mmE73r1K3YjBak=OmPACmDi5ccX=SzKhT9=vJ-g@mail.gmail.com/) mailing lists about this, and in the meantime I will test whether lsm=lockdown,yama,bpf,selinux works.
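To make the mismatch with the quoted documentation concrete, the ordering rule (capability first, the one "major" module last) can be written as a small check. This is a sketch under assumptions: the set of modules treated as "major" below is my own choice for illustration, not something the kernel exports, and the function name is hypothetical.

```python
# Assumption: these names are treated as the "major" modules for this check.
MAJOR_MODULES = {"selinux", "apparmor", "smack"}


def follows_documented_order(lsm_list):
    """Check an lsm list string against the order described in the LSM docs."""
    names = [name for name in lsm_list.strip().split(",") if name]
    # The documentation says the capability module always comes first.
    if not names or names[0] != "capability":
        return False
    major_positions = [i for i, name in enumerate(names) if name in MAJOR_MODULES]
    if not major_positions:
        return True
    # At most one major module, and it must be the last entry.
    return len(major_positions) == 1 and major_positions[0] == len(names) - 1
```

Under these assumptions, "capability,selinux,lockdown,yama,bpf" does not follow the documented order, while "capability,lockdown,yama,bpf,selinux" does.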

fishilico commented 3 years ago

Test result:

I prefer using lsm=lockdown,yama,selinux,bpf instead of lsm=selinux,lockdown,yama,bpf in order to stick more closely to the documentation.