canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Better support unprivileged user namespaces being disabled #11920

Open simondeziel opened 1 year ago

simondeziel commented 1 year ago

ATM, the LXD snap unconditionally enables unprivileged user namespaces (echo 1 > /proc/sys/kernel/unprivileged_userns_clone) because some features depend on this:

However, unprivileged user namespaces come with some risks* and are not strictly required if you trade some usability to favor security. I've been using the LXD snap with userns disabled for ~2 months and found it to be a reasonable experience. Here are the workarounds used to overcome some of the issues stemming from userns=0:

Install either of those and they will automatically start. Alternatively you may use another SPICE client using the following URI: spice+unix:///home/sdeziel/snap/lxd/common/config/sockets/137487013.spice

Your system has unprivileged user namespaces disabled. As a result, you will need to manually run a SPICE client with the URI above.

Start spicy manually using the URI provided by lxc console --type vga

spicy --uri=spice+unix:///home/sdeziel/snap/lxd/common/config/sockets/137487013.spice
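
To make the workaround above concrete, here is a minimal Python sketch (an illustration only, not part of LXD) that pulls the spice+unix URI out of the text printed by lxc console --type vga and builds the spicy command line. The function name spicy_command is hypothetical; the only flag used is the --uri form shown above.

```python
import re

# Matches the SPICE socket URI format shown in the message above.
SPICE_URI = re.compile(r"spice\+unix://\S+")


def spicy_command(console_output):
    """Return the argv for spicy, or None if no SPICE URI is present."""
    match = SPICE_URI.search(console_output)
    if match is None:
        return None
    # Drop sentence punctuation that may directly follow the URI in prose.
    uri = match.group(0).rstrip(".,")
    return ["spicy", f"--uri={uri}"]


if __name__ == "__main__":
    sample = (
        "Alternatively you may use another SPICE client using the following URI: "
        "spice+unix:///home/sdeziel/snap/lxd/common/config/sockets/137487013.spice"
    )
    # Only prints the command; running it is left to the user.
    print(spicy_command(sample))
```

The sketch only builds the argument list, so it can be inspected before anything is executed.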



Ultimately, here's what I'd like LXD to improve:

* detect that it is running with `unprivileged_userns_clone=0` and issue a warning about degraded functionality (maybe with a link to online documentation explaining the details)
* if `unprivileged_userns_clone=0`, use an editor from the snap even if `EDITOR`/`VISUAL` is set (run snap's vi if EDITOR=vi/vim/nvim/gvim, nano otherwise)
* stop the snap from unconditionally enabling `unprivileged_userns_clone` (it too could log a warning if deemed appropriate)

*: 44.4% of vulnerabilities used userns, https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html
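
The first two wishes in the list above could look roughly like the following Python sketch. It is not LXD code: the function names, the warning text and the ("snap", ...) / ("host", ...) return values are hypothetical, and two behaviours are assumptions of this sketch: a missing sysctl file counts as "not disabled", and the snap's nano is used when no editor is set at all.

```python
import os

# Debian/Ubuntu-specific sysctl discussed in this issue.
USERNS_CLONE = "/proc/sys/kernel/unprivileged_userns_clone"
VI_FAMILY = {"vi", "vim", "nvim", "gvim"}


def userns_clone_disabled(path=USERNS_CLONE):
    """Return True only if the sysctl file exists and contains 0."""
    try:
        with open(path) as handle:
            return handle.read().strip() == "0"
    except OSError:
        # Assumption: a missing or unreadable sysctl means "not disabled".
        return False


def pick_editor(env, disabled):
    """Return (source, editor) following the second bullet above."""
    wanted = (env.get("VISUAL") or env.get("EDITOR") or "").strip()
    if wanted and not disabled:
        return ("host", wanted)
    # Compare only the program name, ignoring any path or arguments.
    name = os.path.basename(wanted.split()[0]) if wanted else ""
    return ("snap", "vi" if name in VI_FAMILY else "nano")


if __name__ == "__main__":
    disabled = userns_clone_disabled()
    if disabled:
        print("warning: unprivileged_userns_clone=0, some features are degraded")
    print(pick_editor(os.environ, disabled))
```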

tomponline commented 1 year ago

userns=0

In this context what is userns=0 btw?

simondeziel commented 1 year ago

@tomponline I meant unprivileged_userns_clone=0, edited. Thanks.

alexmurray commented 1 year ago

With the upcoming plan to restrict unprivileged userns in Ubuntu 23.10, lxd may need to grow some additional support around its handling of unprivileged user namespaces - see https://discourse.ubuntu.com/t/spec-unprivileged-user-namespace-restrictions-via-apparmor-in-ubuntu-23-10/37626 for details

Similarly to the above, when this new restriction is enabled in Ubuntu 23.10, lxc console --type vga fails. I notice this ends up spawning unshare -U to run remote-viewer (and unshare -U fails), but I can't seem to find the code path that runs remote-viewer via unshare. Can anyone shed any light on how this exec path ends up happening?

Seen in snap run --strace lxd.lxd console --type vga ...:

[pid 413939] execve("/usr/bin/unshare", ["unshare", "-U", "-r", "chroot", "/var/lib/snapd/hostfs/", "remote-viewer", "spice+unix:///home/amurray/snap/"...], 0x563d48d9c818 /* 102 vars */) = 0
...
[pid 413939] unshare(CLONE_NEWUSER)     = -1 EACCES (Permission denied)

alexmurray commented 1 year ago

Interestingly, unshare is not running under any apparmor confinement when it is spawned

[84241.526723] audit: type=1400 audit(1693372708.093:6239): apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted" error=-13 profile="unconfined" pid=413939 comm="unshare" requested="userns_create" denied="userns_create"

so it would be possible to "fix" this by creating an AppArmor profile on the host that allowed unshare to use userns in /etc/apparmor.d/usr.bin.unshare:

abi <abi/4.0>,

include <tunables/global>

/usr/bin/unshare flags=(unconfined) {
  userns,

  # Site-specific additions and overrides. See local/README for details.
  include if exists <local/usr.bin.unshare>
}

but this then allows any exploit which wants to use an unprivileged user namespace to just run itself via unshare -U and so kind of defeats the purpose of this restriction.

alexmurray commented 1 year ago

cc @jrjohansen - I assume you agree that it isn't feasible to have a generic profile for unshare as above? It would make a lot of things easier (and reduce the risk that we regress every local user's random scripts that run things via unshare -U) - or any other ideas here?

jrjohansen commented 1 year ago

@alexmurray you are correct, having a generic unshare profile provides an easy bypass of the restriction and is not advisable.

tomponline commented 1 year ago

@alexmurray so are you saying that going forward all calls to unshare -U need to be wrapped in their own apparmor profile?

tomponline commented 1 year ago

lxd may need to grow some additional support around its handling of unprivileged user namespaces

LXD the daemon runs as root and then launches unprivileged processes with their own apparmor profiles so hopefully only the client tools will be affected by this. Does that sound OK?

tomponline commented 1 year ago

Although is this going to break security.nesting=true?

tomponline commented 1 year ago

@simondeziel do you have any capacity to help evaluate the impact of this?

jrjohansen commented 1 year ago

@tomponline the restriction is specifically on unprivileged unconfined. So an unconfined process with CAP_SYS_ADMIN will be able to create a new user ns via clone or unshare. It will also be able to call unshare as long as it doesn't drop privs before doing so.

The attack we are trying to mitigate is a privilege escalation where an attacker uses an unprivileged user namespace to gain pseudo root, with access to root-restricted kernel interfaces. The attacker can then use this access to attempt an exploit that is not possible for a regular user.

tomponline commented 1 year ago

What is unprivileged unconfined? I am not too familiar with apparmor.

And how does all this relate to snaps?

alexmurray commented 1 year ago

@alexmurray so are you saying that going forward all calls to unshare -U need to be wrapped in their own apparmor profile?

We have 2 options -

  1. we ship a profile for /usr/bin/unshare itself - but then this allows anything to use unshare -U to gain access to an unprivileged user namespace and hence circumvent this restriction. Or
  2. we need to provide profiles for everything that legitimately uses either /usr/bin/unshare or directly calls the unshare()/setns() system calls.

The second option is a better outcome from a security perspective but harder as we have to try and identify all of these things - whilst the first is easier but allows this restriction to be trivially bypassed (especially since /usr/bin/unshare is shipped in util-linux which is marked as essential by apt and so can't be uninstalled).

My preference is option 2 but I fear we may have to go with option 1 if we can't be certain we aren't going to introduce a regression.

alexmurray commented 1 year ago

What is unprivileged unconfined? I am not too familiar with apparmor.

unconfined is the label apparmor gives to anything that doesn't have an explicit apparmor profile. unprivileged unconfined then just means anything that doesn't have an apparmor profile and which doesn't have CAP_SYS_ADMIN (ie. root)

ie. this restriction doesn't apply to anything that has CAP_SYS_ADMIN (ie. root), or anything with an explicit apparmor profile (but in that case the profile should declare the new userns, apparmor permission).
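
As a simplified model of the rule just described (not the kernel's actual logic), the decision can be written as a small Python predicate. The function name is hypothetical, and the model deliberately ignores what happens when a confined profile omits the userns, permission, since that depends on policy details not covered here.

```python
def restriction_applies(profile, has_cap_sys_admin):
    """Model: only unprivileged, unconfined processes are restricted."""
    return profile == "unconfined" and not has_cap_sys_admin


if __name__ == "__main__":
    cases = [
        ("unconfined", False),    # e.g. unshare -U run by a regular user
        ("unconfined", True),     # e.g. a root process without a profile
        ("snap.lxd.lxc", False),  # confined; its profile should declare userns,
    ]
    for profile, privileged in cases:
        print(profile, privileged, restriction_applies(profile, privileged))
```

The first case corresponds to the denial logged earlier in this thread, where the audit record shows profile="unconfined" for an unprivileged unshare.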

And how does all this relate to snaps?

It doesn't particularly relate to snaps any more than anything else - except that all snaps come with an apparmor profile - and so if a snap is using unprivileged user namespaces then it should either plug the new userns interface from snapd, OR make sure it gets this new userns, permission through some other interface (browser-support / docker-support etc).

In the case of the lxd snap, if needed we can add this new permission to the lxd-support interface in snapd - although from my local testing it doesn't appear to need it.

However, as identified earlier in this thread, some of the lxc client tools will likely need some additional changes to cope with this.

Is someone from the lxd team able to shed some light on my earlier question of how remote-viewer gets spawned via unshare -U here?

tomponline commented 1 year ago

Thanks for the extra info.

This is done in the snap package wrappers:

My understanding is that this is done so these commands can perform an unprivileged chroot: https://github.com/canonical/lxd-pkg-snap/commit/9521654975fdf3c8506615269fc4ed83cb2c8d8e

tomponline commented 1 year ago

Can we have the lxd package's unprivileged client commands run with an apparmor profile that has sufficient access?

simondeziel commented 1 year ago

@alexmurray Instead of having a system-wide profile for unshare, we could have LXD load an equivalent profile prior to calling unshare. This is what we do with other binaries like rsync IIRC.

This way, the system as a whole wouldn't have the flawed unshare profile allowing easy bypass.

tomponline commented 1 year ago

@simondeziel yeah I think lxd itself is going to be ok as it's root anyway and can load apparmor profiles.

But for unprivileged lxc commands, would they be able to load their own apparmor profile? If that were possible, it would make the security useless, as any unprivileged process could do it, right?

mihalicyn commented 1 year ago

First of all, AFAIK /proc/sys/kernel/unprivileged_userns_clone has always been a Debian/Ubuntu-specific thing. Upstream kernels behave as if unprivileged_userns_clone=1.

And yes, we rely on this in nesting support, and if we want to continue supporting nested containers we will need to allow unprivileged userns creation with the new AppArmor feature.

@alexmurray @jrjohansen is it possible somehow to detect whether AppArmor supports unprivileged_userns_restriction, and to write a "universal" profile independent of the AppArmor version? I have read through the docs and didn't find anything. We probably need to add AppArmor version checks to LXD and provide different profiles depending on whether this feature is present.

tomponline commented 1 year ago

@mihalicyn because LXD runs as root and we already apply our own apparmor profile to containers when we launch them this shouldn't be too much of an issue.

My main concern is around the unprivileged lxc * commands which are not launched from LXD and are not always run as root.

mihalicyn commented 1 year ago

because LXD runs as root and we already apply our own apparmor profile to containers when we launch them this shouldn't be too much of an issue.

LXD runs as root. But if you use nested containers, then the user namespace of a nested container has to be created by an unprivileged user. That's the issue.

simondeziel commented 1 year ago

My main concern is around the unprivileged lxc * commands which are not launched from LXD and are not always run as root.

The lxc command itself is apparently running under an Apparmor profile that it escapes according to https://github.com/canonical/lxd-pkg-snap/blob/latest-edge/snapcraft/commands/lxc#L5.

Maybe we could stop escaping it?

tomponline commented 1 year ago

LXD runs as root. But if you use nested containers, then the user namespace of a nested container has to be created by an unprivileged user. That's the issue.

Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?

tomponline commented 1 year ago

Maybe we could stop escaping it?

Not without knowing why it does that - which I don't.

Also, does that mean that when snap invokes the lxc tools they are actually run as root, or is it possible for unprivileged processes to escape apparmor confinement (seems unlikely).

mihalicyn commented 1 year ago

Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?

yes, of course. But our current AppArmor profile does not allow this because we are using an old AppArmor version that does not support the new syntax. It means that if apparmor_restrict_unprivileged_userns=1 then nested container functionality will be broken.

simondeziel commented 1 year ago

Apparently, the aa-exec escape was intentional:

$ head -n 603 /var/lib/snapd/apparmor/profiles/snap.lxd.lxc | tail -n 5
# Description: Can change to any apparmor profile (including unconfined) thus
# giving access to all resources of the system so LXD may manage what to give
# to its containers. This gives device ownership to connected snaps.
@{PROC}/**/attr/{,apparmor/}current r,
/{,usr/}{,s}bin/aa-exec ux,

Instead of escaping to unconfined we could switch to a profile allowing the handful of commands (unshare, etc) to use userns. But yeah, that will need some more understanding around that.

tomponline commented 1 year ago

Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?

yes, of course. But our current AppArmor profile does not allow this because we are using an old AppArmor version that does not support the new syntax. It means that if apparmor_restrict_unprivileged_userns=1 then nested container functionality will be broken.

Yes we would need to use the later version of apparmor tooling inside the snap.

mihalicyn commented 1 year ago

Yes we would need to use the later version of apparmor tooling inside the snap.

yes, I thought about this option previously when I was looking into a fix for CVE-2016-1585. But unfortunately AppArmor's dependencies are a big problem. We would probably need to ship the dependencies too, and I'm not sure that's a good approach.

tomponline commented 1 year ago

Right now it looks like we depend on the apparmor in core22

mihalicyn commented 1 year ago

Right now it looks like we depend on the apparmor in core22

Of course. AppArmor was a dependency in core20 too. What I meant is that you can't just take fresh AppArmor sources and build them on an old distro, because you will run into a lot of incompatibilities with the cpython library and everything around it (https://packages.ubuntu.com/jammy/apparmor-utils).

tomponline commented 1 year ago

Our plan is to switch to core 22 for the 5.0 LTS series so if updated apparmor could be back ported by ubuntu team that would help.

mihalicyn commented 1 year ago

Our plan is to switch to core 22 for the 5.0 LTS series so if updated apparmor could be back ported by ubuntu team that would help.

yes, that would be ideal. But unfortunately even the fix for CVE-2016-1585 has still not been backported to Ubuntu 22.04 ;-)

tomponline commented 1 year ago

@alexmurray what is your thinking on how snaps using say core20 or core22 would be able to access an updated apparmor in order to use the new userns permission (for processes that it itself launches as unprivileged)?

jrjohansen commented 1 year ago

@alexmurray @jrjohansen is it possible somehow to detect whether AppArmor supports unprivileged_userns_restriction, and to write a "universal" profile independent of the AppArmor version? I have read through the docs and didn't find anything. We probably need to add AppArmor version checks to LXD and provide different profiles depending on whether this feature is present.

It depends on the iteration of the restriction. In Ubuntu the value of the sysctl can be used.

cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
0

In the newer code that will be upstreamed there is a query value within apparmorfs as well.
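
For illustration, reading the Ubuntu sysctl shown above from Python might look like the sketch below. The function names and the three state strings are hypothetical. Treating a missing file as "unavailable" is an assumption of this sketch, and the apparmorfs query mentioned above is not covered.

```python
# Ubuntu-specific sysctl shown in the comment above.
RESTRICT_SYSCTL = "/proc/sys/kernel/apparmor_restrict_unprivileged_userns"


def classify(value):
    """Map the sysctl's text content to 'disabled' or 'enabled'."""
    return "disabled" if value.strip() == "0" else "enabled"


def restriction_state(path=RESTRICT_SYSCTL):
    """Return 'unavailable', 'disabled' or 'enabled'."""
    try:
        with open(path) as handle:
            return classify(handle.read())
    except OSError:
        # Assumption: no readable sysctl means the restriction is not exposed here.
        return "unavailable"


if __name__ == "__main__":
    print(restriction_state())
```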

jrjohansen commented 1 year ago

Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?

Likely, it will depend on how the profiles are set up. Profile transitions and such can happen.

jrjohansen commented 1 year ago

Not without knowing why it does that - which I don't.

Also, does that mean that when snap invokes the lxc tools they are actually run as root, or is it possible for unprivileged processes to escape apparmor confinement (seems unlikely).

If the profile allows it, yes you can. And the lxd profile allows it.

tomponline commented 1 year ago

So what needs to change then? I'm afraid I'm not following this now.

jrjohansen commented 1 year ago

Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?

yes, of course. But our current AppArmor profile does not allow this because we are using an old AppArmor version that does not support the new syntax. It means that if apparmor_restrict_unprivileged_userns=1 then nested container functionality will be broken.

Actually it should work. AppArmor currently is respecting the ABI around policy versions, so a profile that was authored without support for the userns feature will still work. This is an escape/bypass but respecting ABI is required for using policy in nested containers, eg. a 20.04 container in 22.04. AppArmor does have a toggle that breaks the ABI and forces userns to be used but this is not enabled by default.

jrjohansen commented 1 year ago

Right now it looks like we depend on the apparmor in core22

Of course. AppArmor was a dependency in core20 too. What I meant is that you can't just take fresh AppArmor sources and build them on an old distro, because you will run into a lot of incompatibilities with the cpython library and everything around it (https://packages.ubuntu.com/jammy/apparmor-utils).

The Python tools are not required and are not part of the AppArmor core. The binaries have a much smaller dependency list; you should be able to get away with just the apparmor and libapparmor packages.

jrjohansen commented 1 year ago

Our plan is to switch to core 22 for the 5.0 LTS series so if updated apparmor could be back ported by ubuntu team that would help.

yes, that would be ideal. But unfortunately even the fix for CVE-2016-1585 has still not been backported to Ubuntu 22.04 ;-)

I'll poke ESM team again.

jrjohansen commented 1 year ago

@alexmurray what is your thinking on how snaps using say core20 or core22 would be able to access an updated apparmor in order to use the new userns permission (for processes that it itself launches as unprivileged)?

personally, I am in favor of it, but either we need to SRU to 20.04 and 22.04 or vendor apparmor in core20 / core22. The SRU is just work, and so far it hasn't been required. If needed we can look into it, but it's not just me that determines SRUs so I can't make promises. I am not particularly fond of vendoring, but it is possible. @alexmurray did the work and snapd is now vendoring apparmor; we could do this with core or even lxd. The question becomes whether that is a better solution than doing an SRU.

jrjohansen commented 1 year ago

So what needs to change then? I'm afraid I'm not following this now.

I can't remember the specifics of the profile atm, but it definitely allowed escaping. What needs to be done is work to figure out why it needs to escape, and then to determine the best way of dealing with that: do we fold in permissions, break it into multiple profiles for some form of priv-sep, etc. I can tell you it will probably take several iterations, and there might not be a simple way to achieve some of it.

tomponline commented 1 year ago

OK thanks for clarifying.

What are the next steps to avoid breakages in 23.10 for lxd?

jrjohansen commented 1 year ago

OK thanks for clarifying.

What are the next steps to avoid breakages in 23.10 for lxd?

Gather data via testing. Find the cases that break and then we need to look at each one and figure out what is the best thing to do in the limited time we have.

My guess is we are going to take a half step and create a "special unconfined" profile that mostly acts as unconfined. And just replace the escapes to unconfined with it, and then work towards better confinement in the future. It will look something like

profile lxd_unconfined (unconfined) {
  allow userns,
}

tomponline commented 1 year ago

OK thanks. So we are clear, is this something you guys are going to take on initially, or do we need to redirect resources onto this from other projects? Thanks

jrjohansen commented 1 year ago

OK thanks. So we are clear, is this something you guys are going to take on initially, or do we need to redirect resources onto this from other projects? Thanks

If you are going to be waiting on us to do it, it's going to take a long time. That being said I will be as responsive as I can and promise to prioritize helping with this as much as I can, and I know @alexmurray will do what he can too. At the very least we need people who use lxd regularly and know the tooling well to report what is breaking, along with dmesg errors, so we can work together on the fixes. I think if we have that kind of support we can probably do the apparmor profile work, but will need help testing and getting it rolled into lxd.

If we are talking SRU/vendoring work, that is I think going to have to fall on the security team, and once we have a course, @alexmurray and I will have to work with management on priority etc.

tomponline commented 1 year ago

Hrm OK, I think we need a meeting to discuss this further. I'll set something up urgently.

Is it possible to test what will break before 23.10 is released?

jrjohansen commented 1 year ago

Hrm OK, I think we need a meeting to discuss this further. I'll set something up urgently.

okay, let's try to invite my management and @alexmurray as well. Getting the timing for that together requires some twilight-zone-level time twisting. I will flex as much as I can (i.e. don't worry about my tz).

Is it possible to test what will break before 23.10 is released?

yes. You should be able to run and test today. The feature exists in both the lunar and mantic kernels, but it may need to be turned on by the sysctl, depending on your kernel and apparmor versions.

alexmurray commented 1 year ago

So I may not have all the details right in my head but how about the following:

  1. We teach snapd to detect and support the new apparmor unconfined profile mode
  2. We update the lxd-support interface in snapd to use this new mode when it is supported
  3. We also add in support for the userns, apparmor permission to the lxd-support interface when it is available
  4. lxd-pkg-snap gets updated so that it doesn't try to escape confinement to unconfined if it is already in this new lxd_unconfined mode
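
Step 4 of the plan above hinges on knowing the current confinement label. The real wrappers in lxd-pkg-snap are shell scripts; purely as an illustration of the check, here is a Python sketch that parses the "name (mode)" text AppArmor exposes in /proc/self/attr/current and decides whether an escape would still be needed. The function names are hypothetical, lxd_unconfined is the profile name floated earlier in this thread, and the assumption that such a profile would report itself under that name is untested.

```python
def parse_label(raw):
    """Split 'name (mode)' into (name, mode); mode is None if absent."""
    text = raw.strip().rstrip("\x00").strip()
    if text.endswith(")") and " (" in text:
        name, mode = text[:-1].rsplit(" (", 1)
        return name, mode
    return text, None


def should_escape(raw, target="lxd_unconfined"):
    """True unless already unconfined or already in the target profile."""
    name, _mode = parse_label(raw)
    return name not in (target, "unconfined")


if __name__ == "__main__":
    try:
        with open("/proc/self/attr/current") as handle:
            current = handle.read()
    except OSError:
        print("AppArmor label not available on this system")
    else:
        print(parse_label(current), should_escape(current))
```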

tomponline commented 1 year ago

yes. You should be able to run and test today. The feature exists in both the lunar and mantic kernels, but it may need to be turned on by the sysctl, depending on your kernel and apparmor versions.

OK great, so we need to set /proc/sys/kernel/apparmor_restrict_unprivileged_userns to 1 in Mantic and then test LXD to see what breaks as a first step.

@simondeziel is this something you have capacity for?

Then we can report back here with our findings and collected dmesg output for each problem.

And then maybe we can have a meeting to discuss resolving the individual issues.
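
To help with collecting the dmesg output mentioned above, a small Python sketch along these lines could pick out the relevant denials. It is only an illustration: the function name is hypothetical, and the field layout is assumed to match the audit line quoted earlier in this thread.

```python
import shlex


def parse_denial(line):
    """Return the key=value fields of a userns_create denial, else None."""
    if 'apparmor="DENIED"' not in line or 'operation="userns_create"' not in line:
        return None
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable.
        return None
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = (
        '[84241.526723] audit: type=1400 audit(1693372708.093:6239): '
        'apparmor="DENIED" operation="userns_create" class="namespace" '
        'info="User namespace creation restricted" error=-13 '
        'profile="unconfined" pid=413939 comm="unshare" '
        'requested="userns_create" denied="userns_create"'
    )
    denial = parse_denial(sample)
    print(denial["comm"], denial["profile"], denial["info"])
```

Feeding each line of dmesg output through parse_denial and keeping the non-None results would give one record per broken command, ready to attach to a report.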