Open simondeziel opened 1 year ago
userns=0
In this context what is userns=0 btw?
@tomponline I meant unprivileged_userns_clone=0, edited. Thanks.
With the upcoming plan to restrict unprivileged userns in Ubuntu 23.10, lxd may need to grow some additional support around its handling of unprivileged user namespaces - see https://discourse.ubuntu.com/t/spec-unprivileged-user-namespace-restrictions-via-apparmor-in-ubuntu-23-10/37626 for details
Similarly to above, when this new restriction is enabled in Ubuntu 23.10 lxc console --type vga fails - I notice this ends up spawning unshare -U to run remote-viewer (and unshare -U fails) but I can't seem to find the code path that is running remote-viewer via unshare - can anyone shed any light on how this exec path ends up happening?
Seen in snap run --strace lxd.lxd console --type vga ...:
[pid 413939] execve("/usr/bin/unshare", ["unshare", "-U", "-r", "chroot", "/var/lib/snapd/hostfs/", "remote-viewer", "spice+unix:///home/amurray/snap/"...], 0x563d48d9c818 /* 102 vars */) = 0
...
[pid 413939] unshare(CLONE_NEWUSER) = -1 EACCES (Permission denied)
Interestingly, unshare is not running under any apparmor confinement when it is spawned:
[84241.526723] audit: type=1400 audit(1693372708.093:6239): apparmor="DENIED" operation="userns_create" class="namespace" info="User namespace creation restricted" error=-13 profile="unconfined" pid=413939 comm="unshare" requested="userns_create" denied="userns_create"
so it would be possible to "fix" this by creating an AppArmor profile on the host, in /etc/apparmor.d/usr.bin.unshare, that allowed unshare to use userns:
abi <abi/4.0>,
include <tunables/global>
/usr/bin/unshare flags=(unconfined) {
  userns,
  # Site-specific additions and overrides. See local/README for details.
  include if exists <local/usr.bin.unshare>
}
but this then allows any exploit which wants to use an unprivileged user namespace to just run itself via unshare -U and so kind of defeats the purpose of this restriction.
cc @jrjohansen - I assume you agree that it isn't feasible to have a generic profile for unshare as above? It would make a lot of things easier (and reduce the risk that we regress every local user's random scripts that run things via unshare -U) - or any other ideas here?
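For anyone wanting to reproduce the failing unshare(CLONE_NEWUSER) call from the strace output without going through the lxc wrapper, here is a minimal Python sketch. It is not part of LXD or the snap; probe_userns is a hypothetical helper name, and it assumes a Linux libc that exports unshare(). The call is made in a forked child so the calling process never changes namespaces. The result depends entirely on the host: under the restriction described above one would expect errno 13 (EACCES), as in the strace.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # flag value from <sched.h>


def probe_userns():
    """Try unshare(CLONE_NEWUSER) in a forked child.

    Returns 0 if the kernel allowed it, otherwise the errno it reported
    (255 if the call could not be attempted at all).
    """
    pid = os.fork()
    if pid == 0:
        # Child: always leave via os._exit so we never return into the
        # caller's code, even if loading libc or the call itself fails.
        code = 255
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUSER) == 0:
                code = 0
            else:
                code = ctypes.get_errno() & 0xFF
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255
```

Running print(probe_userns()) as a regular user, with and without the restriction enabled, should show whether the denial comes from this syscall rather than from anything remote-viewer does.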
@alexmurray you are correct: having a generic unshare profile provides an easy bypass of the restriction and is not advisable.
@alexmurray so are you saying that going forward all calls to unshare -U need to be wrapped in their own apparmor profile?
> lxd may need to grow some additional support around its handling of unprivileged user namespaces
LXD the daemon runs as root and then launches unprivileged processes with their own apparmor profiles so hopefully only the client tools will be affected by this. Does that sound OK?
Although is this going to break security.nesting=true?
@simondeziel do you have any capacity to help evaluate the impact of this?
@tomponline the restriction is specifically on unprivileged unconfined. So an unconfined process with cap_sysadmin will be able to create a new user ns via clone or unshare. It will also be able to call unshare as long as it doesn't drop privs before doing so.
The attack we are trying to mitigate is a privilege escalation where an attacker uses an unprivileged user namespace to gain pseudo-root, with access to root-restricted kernel interfaces. The attacker can then use this access to attempt an exploit that is not possible for a regular user.
What is unprivileged unconfined? I am not too familiar with apparmor.
And how does all this relate to snaps?
> @alexmurray so are you saying that going forward all calls to unshare -U need to be wrapped in their own apparmor profile?
We have 2 options -
1. [Ship a profile for] /usr/bin/unshare itself - but then this allows anything to use unshare -U to gain access to an unprivileged user namespace and hence circumvent this restriction. Or
2. [Ship a profile for everything that either runs] /usr/bin/unshare or directly calls the unshare()/setns() system calls.
The second option is a better outcome from a security perspective but harder, as we have to try and identify all of these things - whilst the first is easier but allows this restriction to be trivially bypassed (especially since /usr/bin/unshare is shipped in util-linux, which is marked as essential by apt and so can't be uninstalled).
My preference is option 2 but I fear we may have to go with option 1 if we can't be certain we aren't going to introduce a regression.
> what is unprivileged unconfined? I am not too familiar with apparmor?
unconfined is the label apparmor gives to anything that doesn't have an explicit apparmor profile. unprivileged unconfined then just means anything that doesn't have an apparmor profile and which doesn't have CAP_SYS_ADMIN (ie. root).
ie. this restriction doesn't apply to anything that has CAP_SYS_ADMIN (ie. root), or anything with an explicit apparmor profile (but in that case the profile should declare the new userns, apparmor permission).
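The definition above can be written down as a small check. This is only a sketch of the rule as described in this thread, not how the kernel implements it. It assumes the label is read from /proc/self/attr/current (where an unconfined task reports the bare string "unconfined"), that the effective capability mask comes from the CapEff line of /proc/self/status, and that CAP_SYS_ADMIN is capability number 21. The function names are hypothetical.

```python
CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def is_unprivileged_unconfined(label, cap_eff):
    """True if a task matches 'unprivileged unconfined' as described above.

    label   -- contents of /proc/<pid>/attr/current
    cap_eff -- effective capability mask as an integer
    """
    # Anything with an explicit profile reports "<name> (<mode>)" instead
    # of the bare "unconfined" label, so it is not covered by this rule.
    unconfined = label.strip() == "unconfined"
    has_sys_admin = bool(cap_eff & (1 << CAP_SYS_ADMIN))
    return unconfined and not has_sys_admin
```

A task with its own profile returns False here, which matches the point that such a profile has to grant userns, itself.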
> And how does all this relate to snaps?
It doesn't particularly relate to snaps any more than anything else - except that all snaps come with an apparmor profile - and so if a snap is using unprivileged user namespaces then it should either plug the new userns interface from snapd, OR make sure it gets this new userns, permission through some other interface (browser-support / docker-support etc).
In the case of the lxd snap, if needed we can add this new permission to the lxd-support interface in snapd - although from my local testing it doesn't appear to need it.
However, as identified earlier in this thread, some of the lxc client tools will likely need some additional changes to cope with this.
Is someone from the lxd team able to shed some light on my earlier question of how remote-viewer gets spawned via unshare -U here?
Thanks for the extra info.
This is done in the snap package wrappers:
My understanding is that this is so these commands can perform an unprivileged chroot: https://github.com/canonical/lxd-pkg-snap/commit/9521654975fdf3c8506615269fc4ed83cb2c8d8e
Can we have the lxd package's unprivileged client commands run with an apparmor profile that has sufficient access?
@alexmurray Instead of having a system-wide profile for unshare, we could have LXD load an equivalent profile prior to calling unshare. This is what we do with other binaries like rsync IIRC.
This way, the system as a whole wouldn't have the flawed unshare profile allowing easy bypass.
@simondeziel yeah I think lxd itself is going to be ok as it's root anyway and can load apparmor profiles.
But for unprivileged lxc commands, would they be able to load their own apparmor profile? If that were possible it would make the security useless, as any unprivileged process could do it, right?
First of all, AFAIK /proc/sys/kernel/unprivileged_userns_clone was always a Debian/Ubuntu-specific thing. Upstream kernels behave as if unprivileged_userns_clone=1.
And yes, we rely on this in nesting support, and if we want to continue supporting nested containers we will need to allow unprivileged userns creation with the new AppArmor feature.
@alexmurray @jrjohansen is it possible somehow to detect whether AppArmor supports unprivileged_userns_restriction or not and write a "universal" profile independent of the AppArmor version? I have read through the docs and didn't find anything. Probably we need to add AppArmor version checks to LXD and provide different profiles depending on whether this feature is present.
@mihalicyn because LXD runs as root and we already apply our own apparmor profile to containers when we launch them this shouldn't be too much of an issue.
My main concern is around the unprivileged lxc * commands which are not launched from LXD and are not always run as root.
> because LXD runs as root and we already apply our own apparmor profile to containers when we launch them this shouldn't be too much of an issue.
LXD runs as root. But if you use nested containers then the user namespace of a nested container has to be created by an unprivileged user. That's the issue.
> My main concern is around the unprivileged lxc * commands which are not launched from LXD and are not always run as root.
The lxc command itself is apparently running under an AppArmor profile that it escapes, according to https://github.com/canonical/lxd-pkg-snap/blob/latest-edge/snapcraft/commands/lxc#L5.
Maybe we could stop escaping it?
> LXD runs as root. But if you use nested containers then the user namespace of a nested container has to be created by an unprivileged user. That's the issue.
Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?
> Maybe we could stop escaping it?
Not without knowing why it does that - which I don't.
Also, does that mean that when snap invokes the lxc tools they are actually run as root, or is it possible for unprivileged processes to escape apparmor confinement (seems unlikely)?
> Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?
yes, of course. But our current AppArmor profile does not allow this because we are using an old AppArmor version and it does not support the new syntax. It means that if apparmor_restrict_unprivileged_userns=1 then nested containers functionality will be broken.
Apparently, the aa-exec escape was intentional:
$ head -n 603 /var/lib/snapd/apparmor/profiles/snap.lxd.lxc | tail -n 5
# Description: Can change to any apparmor profile (including unconfined) thus
# giving access to all resources of the system so LXD may manage what to give
# to its containers. This gives device ownership to connected snaps.
@{PROC}/**/attr/{,apparmor/}current r,
/{,usr/}{,s}bin/aa-exec ux,
Instead of escaping to unconfined we could switch to a profile allowing the handful of commands (unshare, etc) to use userns. But yeah, that will need some more understanding around that.
> > Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?
> yes, of course. But our current AppArmor profile does not allow this because we are using an old AppArmor version and it does not support the new syntax. It means that if apparmor_restrict_unprivileged_userns=1 then nested containers functionality will be broken.
Yes we would need to use the later version of apparmor tooling inside the snap.
> Yes we would need to use the later version of apparmor tooling inside the snap.
yes, I thought about this option previously when I was looking into a fix for CVE 2016-1585. But unfortunately AppArmor dependencies are a big problem. Probably we will need to ship dependencies too and I'm not sure that it's a good way.
Right now it looks like we depend on the apparmor in core22
> Right now it looks like we depend on the apparmor in core22
Of course. AppArmor was a dependency too in core20. I wanted to say that you can't just take fresh AppArmor sources and build them on the old distro, because you will meet a lot of incompatibilities with the cpython library and all the stuff around it (https://packages.ubuntu.com/jammy/apparmor-utils).
Our plan is to switch to core 22 for the 5.0 LTS series so if updated apparmor could be back ported by ubuntu team that would help.
> Our plan is to switch to core 22 for the 5.0 LTS series so if updated apparmor could be back ported by ubuntu team that would help.
yes, that would be ideal. But unfortunately even the fix for CVE 2016-1585 is still not backported to Ubuntu 22.04 ;-)
@alexmurray what is your thinking on how snaps using say core20 or core22 would be able to access an updated apparmor in order to use the new userns permission (for processes that it itself launches as unprivileged)?
> @alexmurray @jrjohansen is it possible somehow to detect whether AppArmor supports unprivileged_userns_restriction or not and write a "universal" profile independent of the AppArmor version? I have read through the docs and didn't find anything. Probably we need to add AppArmor version checks to LXD and provide different profiles depending on whether this feature is present.
It depends on the iteration of the restriction. In Ubuntu the value of the sysctl can be used.
$ cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
0
In the newer code that will be upstreamed there is a query value within apparmorfs as well.
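The sysctl check above could be wrapped in a small helper so tooling can tell "restriction on", "restriction off" and "kernel has no such knob" apart. A minimal Python sketch, assuming only the sysctl path quoted in this thread; the function name is hypothetical, and the apparmorfs query mentioned above is not covered because its path is not given here.

```python
from pathlib import Path

# Sysctl path as quoted in this thread (Ubuntu kernels).
SYSCTL = Path("/proc/sys/kernel/apparmor_restrict_unprivileged_userns")


def userns_restriction_state(path=SYSCTL):
    """Return True/False for the sysctl value, or None if it is absent."""
    try:
        raw = Path(path).read_text().strip()
    except (FileNotFoundError, NotADirectoryError):
        # Kernel without this sysctl: the restriction is not available.
        return None
    return int(raw) != 0
```

Returning None rather than False for a missing file keeps "not supported" distinct from "supported but disabled", which matters when deciding which profile variant to generate.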
> Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?
Likely, it will depend on how the profiles are set up. Profile transitions and such can happen.
> Not without knowing why it does that - which I don't.
> Also, does that mean that when snap invokes the lxc tools they are actually run as root, or is it possible for unprivileged processes to escape apparmor confinement (seems unlikely).
If the profile allows it, yes you can. And the lxd profile allows it.
So what needs to change then? I'm afraid I'm not following this now.
> > Yes but if the init process of the container has been started with an apparmor profile that allows unprivileged namespace creation then the sub-processes should be able to do it right?
> yes, of course. But our current AppArmor profile does not allow this because we are using an old AppArmor version and it does not support the new syntax. It means that if apparmor_restrict_unprivileged_userns=1 then nested containers functionality will be broken.
Actually it should work. AppArmor currently is respecting the ABI around policy versions, so a profile that was authored without support for the userns feature will still work. This is an escape/bypass but respecting ABI is required for using policy in nested containers, eg. a 20.04 container in 22.04. AppArmor does have a toggle that breaks the ABI and forces userns to be used but this is not enabled by default.
> > Right now it looks like we depend on the apparmor in core22
> Of course. AppArmor was a dependency too in core20. I wanted to say that you can't just take fresh AppArmor sources and build them on the old distro, because you will meet a lot of incompatibilities with the cpython library and all the stuff around it (https://packages.ubuntu.com/jammy/apparmor-utils).
The python tools are not required, and are not part of apparmor core. The binaries have a much smaller dependency list; you should be able to get away with just the apparmor and libapparmor packages.
> > Our plan is to switch to core 22 for the 5.0 LTS series so if updated apparmor could be back ported by ubuntu team that would help.
> yes, that would be ideal. But unfortunately even the fix for CVE 2016-1585 is still not backported to Ubuntu 22.04 ;-)
I'll poke ESM team again.
> @alexmurray what is your thinking on how snaps using say core20 or core22 would be able to access an updated apparmor in order to use the new userns permission (for processes that it itself launches as unprivileged)?
personally, I am in favor of it, but either we need to SRU to 20.04 and 22.04 or vendor apparmor in core20 / core22. The SRU is just work, and so far it hasn't been required. If needed we can look into it, but it's not just me that determines SRU so I can't make promises. I am not particularly fond of vendoring, but it is possible. @alexmurray did the work and snapd is now vendoring apparmor, we could do this with core or even lxd. The question becomes whether that is a better solution than doing an SRU.
> So what needs to change then? I'm afraid I'm not following this now.
I can't remember the specifics of the profile atm, but it definitely allowed escaping. What needs to be done is work to figure out why it needs to escape and then determine what the best way of dealing with that is. Do we fold in permissions, break it into multiple profiles for some form of priv-sep etc. I can tell you it will probably take several iterations, and there might not be a simple way to achieve some of it.
OK thanks for clarifying.
What are the next steps to avoid breakages in 23.10 for lxd?
> OK thanks for clarifying.
> What are the next steps to avoid breakages in 23.10 for lxd?
Gather data via testing. Find the cases that break and then we need to look at each one and figure out what is the best thing to do in the limited time we have.
My guess is we are going to take a half step and create a "special unconfined" profile that mostly acts as unconfined. And just replace the escapes to unconfined with it, and then work towards better confinement in the future. It will look something like
profile lxd_unconfined (unconfined) {
  allow userns,
}
OK thanks, so we are clear, is this something you guys are going to take on initially or do we need to redirect resources onto this from other projects? Thanks
> OK thanks, so we are clear, is this something you guys are going to take on initially or do we need to redirect resources onto this from other projects? Thanks
If you are going to be waiting on us to do it, it's going to take a long time. That being said I will be as responsive as I can and promise to prioritize helping with this as much as I can, and I know @alexmurray will do what he can too. At the very least we need people who use lxd regularly and know the tooling well to report what is breaking, along with dmesg errors, so we can work together on the fixes. I think if we have that kind of support we can probably do the apparmor profile work, but will need help testing and getting it rolled into lxd.
If we are talking SRU/vendoring work, that is I think going to have to fall on the security team, and once we have a course, @alexmurray and I will have to work with management on priority etc.
Hrm OK, I think we need a meeting to discuss this further. I'll set up something urgently.
Is it possible to test what will break before 23.10 is released?
> Hrm OK, I think we need a meeting to discuss this further. I'll set up something urgently.
okay, let me try and invite my management, and @alexmurray as well. Getting that timing set requires some twilight-zone level of time twisting. I will flex as much as I can (ie. don't worry about my tz).
> Is it possible to test what will break before 23.10 is released?
yes. You should be able to run and test today. The feature exists in both the lunar and mantic kernels, but it may need to be turned on by the sysctl, depending on your kernel and apparmor versions.
So I may not have all the details right in my head but how about the following:
- [...] userns, apparmor permission to the lxd-support interface when it is available
- [...] unconfined if it is already in this new lxd_unconfined mode
> yes. You should be able to run and test today. The feature exists in both the lunar and mantic kernels, but it may need to be turned on by the sysctl, depending on your kernel and apparmor versions.
OK great, so we need to set /proc/sys/kernel/apparmor_restrict_unprivileged_userns to 1 in Mantic and then test LXD to see what breaks as a first step.
@simondeziel is this something you have capacity for?
Then we can report back here with our findings and collected dmesg output for each problem.
And then maybe we can have a meeting to discuss resolving the individual issues.
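To make the collected dmesg output easier to report, the AppArmor denial records could be reduced to their key=value fields. A rough Python sketch, assuming records shaped like the audit line quoted earlier in this thread; the function names are hypothetical and the sample line is copied from that comment.

```python
import shlex

# Sample record copied from the audit line quoted earlier in this thread.
SAMPLE = (
    '[84241.526723] audit: type=1400 audit(1693372708.093:6239): '
    'apparmor="DENIED" operation="userns_create" class="namespace" '
    'info="User namespace creation restricted" error=-13 '
    'profile="unconfined" pid=413939 comm="unshare" '
    'requested="userns_create" denied="userns_create"'
)


def parse_apparmor_audit(line):
    """Split an audit record into a dict of its key=value fields."""
    fields = {}
    for token in shlex.split(line):  # shlex keeps quoted values together
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def is_userns_denial(line):
    """True if the line is an AppArmor denial of user namespace creation."""
    try:
        fields = parse_apparmor_audit(line)
    except ValueError:
        # Unbalanced quotes: not a record in the expected format.
        return False
    return (fields.get("apparmor") == "DENIED"
            and fields.get("operation") == "userns_create")
```

Filtering dmesg lines through is_userns_denial and reporting the profile, comm and pid fields for each hit would give the per-problem data asked for above.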
ATM, the LXD snap unconditionally enables unprivileged user namespaces (echo 1 > /proc/sys/kernel/unprivileged_userns_clone) because some features depend on this:
- lxc config edit using an external EDITOR/VISUAL
- lxc console --type vga
- raw.idmap
- security.nesting=true
- this requires root lxc file mount?
However, unprivileged user namespaces come with some risks* and are not strictly required if you trade some usability to favor security. I've been using the LXD snap with userns disabled for ~2 months and found it to be a reasonable experience. Here are the workarounds used to overcome some of the issues stemming from userns=0:
- lxc config edit:
- lxc console --type vga:
:Install either of those and they will automatically start. Alternatively you may use another SPICE client using the following URI: spice+unix:///home/sdeziel/snap/lxd/common/config/sockets/137487013.spice
Your system has unprivileged user namespaces disabled, as a result, you will need to manually run one a SPICE client with the URI above.
Start spicy manually using the URI provided by lxc console --type vga:
spicy --uri=spice+unix:///home/sdeziel/snap/lxd/common/config/sockets/137487013.spice