canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Running snap inside lxd container on impish fails #9642

Closed woutervb closed 2 years ago

woutervb commented 2 years ago

Issue description

Installing the snap-store-proxy snap inside the container results in only 2 services running, whereas the expected state is that only 2 are not running. The container can be either Bionic or Focal.

Steps to reproduce

  1. lxc launch ubuntu:20.04 test
  2. lxc exec test -- bash
  3. snap install snap-store-proxy
  4. snap-store-proxy status

stgraber commented 2 years ago

I suspect the issue is that your container runs Ubuntu 20.04 and likely expects a cgroup1 layout, while your host system is running impish, which comes with cgroup2. This results in a fair bit of confusion for snapd and leads to that issue.

Can you try booting your host system with systemd.unified_cgroup_hierarchy=false passed to the kernel command line?
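To confirm which cgroup layout the host (or the container) actually sees before and after changing the kernel command line, a small check along these lines may help. This is only a sketch: it assumes the conventional /sys/fs/cgroup mount point, relies on the cgroup.controllers file that exists at the root of a cgroup2 hierarchy, and treats a cgroup2 mount under "unified" as systemd's hybrid layout. The function name cgroup_layout is made up for illustration.

```python
from pathlib import Path


def cgroup_layout(root="/sys/fs/cgroup"):
    """Guess the cgroup layout mounted at `root`.

    Returns "v2" for a unified cgroup2 hierarchy, "hybrid" when a cgroup2
    mount sits under <root>/unified (assumed systemd hybrid layout), and
    "v1" otherwise.
    """
    base = Path(root)
    # cgroup.controllers only exists at the root of a cgroup2 hierarchy.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # Assumption: hybrid setups mount cgroup2 at <root>/unified.
    if (base / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    return "v1"
```

Calling cgroup_layout() with no arguments on the host and again inside the container would show whether the two agree on the layout.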

stgraber commented 2 years ago

Ah, the related forum thread mentions something about network sockets; it'd be good to have the dmesg output for those.

woutervb commented 2 years ago

Hi, I cannot give you the dmesg, but I have a pastebin that might prove useful https://pastebin.canonical.com/p/pDzRRpwVT6/

Currently back on Focal, as this problem was blocking me.

The other info I can provide is that running the command snap-store-proxy status inside the container gave the following output:

WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement
Store ID: not registered
Internal Service Status:
  memcached: running
  nginx: running
  snapauth: not running: 500 Server Error: INTERNAL SERVER ERROR for url: http://127.0.0.1:8005/_status/check
  snapdevicegw: not running: [Errno 111] Connection refused
  snapdevicegw-local: not running: [Errno 111] Connection refused
  snapproxy: not running: [Errno 111] Connection refused
  snaprevs: not running: 500 Server Error: INTERNAL SERVER ERROR for url: http://127.0.0.1:8002/_status/check

This does indeed suggest that something cgroup-related is going on.
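For comparing runs across host/container combinations, the status output above can be reduced to a per-service summary. The following is a rough sketch that assumes the output keeps the shape shown here (an "Internal Service Status:" header followed by indented "name: state" lines). The helper name parse_service_status is hypothetical and not part of snap-store-proxy.

```python
def parse_service_status(text):
    """Map service name -> (running, detail) from `snap-store-proxy status` text.

    Assumes the layout seen in this thread: services are listed as indented
    "name: state" lines after the "Internal Service Status:" header.
    """
    services = {}
    in_section = False
    for line in text.splitlines():
        if line.strip() == "Internal Service Status:":
            in_section = True
            continue
        # Only indented lines after the header describe services.
        if not in_section or not line.startswith(" "):
            continue
        name, _, state = line.strip().partition(": ")
        running = state == "running"
        # For failures, keep whatever follows "not running: " as the detail.
        detail = "" if running else state.partition(": ")[2]
        services[name] = (running, detail)
    return services
```

Filtering the result for entries whose first element is False gives the list of failing services along with the reported error.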

In the attached pastebin, there are lines like: 2021-12-02T12:07:49Z snap-store-proxy.snapdevicegw[18446]: 2021-12-02 12:07:49.935Z ERROR gunicorn.error "Can't connect to /var/snap/snap-store-proxy/78/snapdevicegw/snapdevicegw.sock"

This points to socket files that don't work, which is basically why things are failing, as far as I can tell.

stgraber commented 2 years ago

> WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement

That is what I suspected with running snaps in a 20.04 container on a 21.10 host. If that's the source of the issue, then there's nothing we can do, as that's a snapd deficiency (which hopefully can be fixed in their 20.04 build).

stgraber commented 2 years ago

I've reproduced the issue here and, looking at the kernel log, I'm seeing things like:

[  264.905113] audit: type=1400 audit(1638928972.864:240): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4877 comm="python3" capability=0  capname="chown"
[  265.419787] audit: type=1400 audit(1638928973.380:241): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4903 comm="python3" capability=0  capname="chown"
[  267.642834] audit: type=1400 audit(1638928975.600:242): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4924 comm="python3" capability=0  capname="chown"
[  269.820697] audit: type=1400 audit(1638928977.780:243): apparmor="DENIED" operation="mknod" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" name="/dev/shm/RMtCRJ" pid=4948 comm="python3" requested_mask="c" denied_mask="c" fsuid=1000000 ouid=1000000
[  269.917270] audit: type=1400 audit(1638928977.876:244): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4948 comm="python3" capability=0  capname="chown"
[  270.647676] audit: type=1400 audit(1638928978.608:245): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4924 comm="python3" capability=0  capname="chown"
[  270.919389] audit: type=1400 audit(1638928978.880:246): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4948 comm="python3" capability=0  capname="chown"
[  271.649725] audit: type=1400 audit(1638928979.612:247): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4924 comm="python3" capability=0  capname="chown"
[  271.921854] audit: type=1400 audit(1638928979.884:248): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4948 comm="python3" capability=0  capname="chown"
[  274.386962] audit: type=1400 audit(1638928982.344:249): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4968 comm="python3" capability=0  capname="chown"
[  276.320810] audit: type=1400 audit(1638928984.280:250): apparmor="DENIED" operation="mknod" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" name="/dev/shm/HYIIf1" pid=4992 comm="python3" requested_mask="c" denied_mask="c" fsuid=1000000 ouid=1000000
[  276.417149] audit: type=1400 audit(1638928984.376:251): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4992 comm="python3" capability=0  capname="chown"
[  277.391819] audit: type=1400 audit(1638928985.352:252): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4968 comm="python3" capability=0  capname="chown"
[  277.418795] audit: type=1400 audit(1638928985.380:253): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4992 comm="python3" capability=0  capname="chown"
[  278.393627] audit: type=1400 audit(1638928986.356:254): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=4968 comm="python3" capability=0  capname="chown"
[  278.421197] audit: type=1400 audit(1638928986.380:255): apparmor="DENIED" operation="capable" namespace="root//lxd-test_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=4992 comm="python3" capability=0  capname="chown"

All of those are for snapd-generated AppArmor profiles, and we're indeed seeing a lot of failures in there.
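To see at a glance which profiles and operations are being denied, the audit lines above can be tallied. The sketch below assumes the key=value format shown in this kernel log, with values either double-quoted or bare. The names parse_audit_fields and tally_denials are illustrative only.

```python
import re
from collections import Counter

# key=value pairs, where the value is either double-quoted or a bare token.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_fields(line):
    """Extract the key=value fields of one audit line into a dict."""
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def tally_denials(lines):
    """Count AppArmor denials per (profile, operation, capname).

    capname is an empty string for denials that carry no capability,
    such as the mknod entries.
    """
    counts = Counter()
    for line in lines:
        fields = parse_audit_fields(line)
        if fields.get("apparmor") != "DENIED":
            continue
        key = (
            fields.get("profile", "?"),
            fields.get("operation", "?"),
            fields.get("capname", ""),
        )
        counts[key] += 1
    return counts
```

Feeding it the lines of a saved dmesg capture and printing counts.most_common() would list the most frequent denials first.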

Just for completeness, I've also installed snap-store-proxy directly on the 21.10 system. It's looking moderately happier but still won't work:

root@impish:~# snap-store-proxy status
Store ID: not registered
Internal Service Status:
  memcached: running
  nginx: running
  snapauth: not running: 500 Server Error: INTERNAL SERVER ERROR for url: http://127.0.0.1:8005/_status/check
  snapdevicegw: running
  snapdevicegw-local: running
  snapproxy: running
  snaprevs: not running: 500 Server Error: INTERNAL SERVER ERROR for url: http://127.0.0.1:8002/_status/check

So as this issue persists with LXD completely removed from the equation, I'd strongly recommend you file a bug against snap-store-proxy and/or snapd to have this looked at and resolved.

woutervb commented 2 years ago

It doesn't work there because it is not configured / registered, but you got to the state that is expected. I will open a case with snapd and see what they can do.

stgraber commented 2 years ago

Hmm, it was still spewing a lot of DENIED in dmesg even when run outside of a container, so there's something a bit odd going on with that snap. I also suspect that the snapd team hasn't been very actively testing snapd inside of a pre-cgroup2 container on a cgroup2 host as that's quite a rare setup at this stage.

bboozzoo commented 2 years ago

I left a note in LP https://bugs.launchpad.net/lxd/+bug/1953563/comments/1 but this does not appear to be related to cgroup v2. I tried some smaller snaps and all behaved correctly. I launched a couple of configurations (21.10 on 21.10, 20.04 on 21.10, 20.04 on 20.04). There does appear to be a problem with 21.10 as the host, observed with nested instances of both 21.10 and 20.04. However, disabling AppArmor in LXD makes the problems go away (lxc config set ... raw.lxc 'lxc.apparmor.profile=unconfined'; the container has to be made privileged at this point too). This applies to both setups with 21.10 as the host. I have a hunch that the problem is with AppArmor 3, which is new compared to 20.04 (although it was introduced in 21.04).

woutervb commented 2 years ago

@bboozzoo, @stgraber can either of you contact the snapd team directly? The ticket I opened on LP now bounces me back to you.

bboozzoo commented 2 years ago

@woutervb I am on the snapd team. Anyway, I see that @stgraber has already identified a potential problem in VFS idmapping, and the bug has been reassigned to the kernel for further investigation.

woutervb commented 2 years ago

This can be closed, as it is a kernel problem.