Open ganto opened 9 years ago
Hi everyone
I found the answer to the LXC mounting error in "Re: [systemd-devel] logind vs CAP_SYS_ADMIN-lessness": there is a mount option, create=dir.
With the following additional entries in /var/lib/lxc/jessie01/config, it's possible to boot a Jessie systemd container without 'cap_sys_admin':
# Custom container options
lxc.mount.auto = cgroup:mixed
lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
lxc.mount.entry = debugfs sys/kernel/debug debugfs rw,relatime 0 0
lxc.mount.entry = mqueue dev/mqueue mqueue rw,relatime,create=dir 0 0
lxc.mount.entry = hugetlbfs dev/hugepages hugetlbfs rw,relatime,create=dir 0 0
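To double-check that a container config really carries these entries, a small check along the following lines might help. This is only a sketch: the function name check_create_dir is made up for illustration and is not part of LXC, and it only looks at the five targets above that use create=dir.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXC): report which of the mount targets
# listed above have no lxc.mount.entry line with create=dir in a config file.
check_create_dir() {
    config=$1
    for target in dev/shm run run/lock dev/mqueue dev/hugepages; do
        # Expected shape: lxc.mount.entry = <fs> <target> <type> <options> ...
        pattern="^[[:space:]]*lxc\.mount\.entry[[:space:]]*=[[:space:]]*[^[:space:]]+[[:space:]]+${target}[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]*create=dir"
        if ! grep -Eq "$pattern" "$config"; then
            echo "missing create=dir entry for ${target}"
        fi
    done
}
```

Called as `check_create_dir /var/lib/lxc/jessie01/config`, it prints nothing when all five entries are present.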
Also make sure that you have the following line in your /etc/lxc/lxc.conf:
lxc.cgroup.use = @all
Otherwise the container start will fail with the following error:
# lxc-start -n jessie01
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
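Before starting the container, the lxc.conf setting mentioned above can be verified with a quick grep. The helper name has_cgroup_use_all is hypothetical, and the sketch assumes the setting is written on a single line as shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given file sets
# lxc.cgroup.use to @all, fail otherwise.
has_cgroup_use_all() {
    grep -Eq '^[[:space:]]*lxc\.cgroup\.use[[:space:]]*=[[:space:]]*@all[[:space:]]*$' "$1"
}
```

For example: `has_cgroup_use_all /etc/lxc/lxc.conf || echo "lxc.cgroup.use = @all is missing"`.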
After merging #16, setting up a Jessie container on a Jessie LXC host should now work out of the box.
Looks like this issue can be closed?
Well, nobody else raised any issues, so I guess this can be closed. :-)
Sorry to re-open this, but the issue came back with Linux kernel 4.6. None of the workarounds except for "Don't drop 'cap_sys_admin' in your container" works. After reverting to kernel 4.5, everything works as expected. This might have to do with the addition of cgroup namespace support in the kernel; see this pull request (and the ones following it): http://lkml.iu.edu/hypermail/linux/kernel/1603.2/02432.html
Do you guys know any workaround here?
@kartoffelheinz Unfortunately I haven't heard anything about this issue yet. If you find a solution, it would be great to hear about it. Thanks for the heads-up; I reopened the issue in case anybody else is interested.
Hi, in case this helps: from kernel >= 4.6 the cgroup API/features have been rewritten. As described on the Gentoo wiki (https://wiki.gentoo.org/wiki/LXC#Configuring_unprivileged_LXC), starting an unprivileged container requires mounting a cgroup filesystem with the name systemd:
root #mkdir -p /sys/fs/cgroup/systemd
root #mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd
I tested this with kernels 4.8 and 4.9. This solution uses the cgroup v1 API; currently I don't know how to use the cgroup v2 API correctly with unprivileged containers.
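To avoid mounting the hierarchy twice, one could first check whether a cgroup v1 mount with name=systemd already exists. The following is a sketch under assumptions: has_systemd_cgroup is a made-up name, and the optional argument (a file in /proc/mounts format) exists only to make the function easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a cgroup (v1) filesystem mounted with the
# option name=systemd is listed in the given mounts file (default /proc/mounts).
has_systemd_cgroup() {
    mounts=${1:-/proc/mounts}
    # Field 3 is the filesystem type, field 4 the comma-separated options.
    awk '$3 == "cgroup" && $4 ~ /(^|,)name=systemd(,|$)/ { found = 1 }
         END { exit !found }' "$mounts"
}
```

A possible use is `has_systemd_cgroup || echo "name=systemd hierarchy not mounted"`, followed by the mkdir and mount commands above when it reports nothing mounted.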
@geaaru's method worked for me. If you don't use systemd on the host, you can add these lines to fstab:
cgroup /sys/fs/cgroup cgroup defaults 0 0
systemd /sys/fs/cgroup/systemd cgroup name=systemd,x-mount.mkdir=0555 0 0
perhaps I'm still unable to mount with the name=systemd option
This issue is still a major PITA.
As of now, it is impossible to run privileged containers without the sys_admin capability on the latest Debian stable using the 4.9 kernel with systemd present in both host and guest. The system will not boot, and you can see the following errors in the console / log file:
Freezing execution.
Failed to mount tmpfs at /sys/fs/cgroup: Operation not permitted
Failed to mount cgroup at /sys/fs/cgroup/systemd: No such file or directory
[!!!!!!] Failed to mount API filesystems, freezing.
None of the workarounds change that (adding cap_sys is not a workaround anybody should consider); the only way to make it work is to use the old Debian Jessie 3.16 kernel.
Note in case someone else runs into this.
Just updated the LXC host to Debian 11/bullseye and had some issues with old containers (config not managed by debops). I only had to add the following lines to each of the node's /var/lib/lxc/
# needed for drop_cap sys_admin
lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
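When several old containers need the same three lines, appending them by hand gets tedious. A minimal sketch of an idempotent append is shown here; add_mount_entries is a hypothetical name, and the config path passed to it is up to the caller.

```shell
#!/bin/sh
# Hypothetical helper: append each of the three lxc.mount.entry lines above
# to the given config file unless an identical line is already present.
add_mount_entries() {
    config=$1
    while IFS= read -r entry; do
        # -x: match whole line, -F: fixed string, so re-running adds nothing.
        grep -qxF -- "$entry" "$config" 2>/dev/null || printf '%s\n' "$entry" >> "$config"
    done <<'EOF'
lxc.mount.entry = tmpfs dev/shm tmpfs rw,nosuid,nodev,create=dir 0 0
lxc.mount.entry = tmpfs run tmpfs rw,nosuid,nodev,mode=755,create=dir 0 0
lxc.mount.entry = tmpfs run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,create=dir 0 0
EOF
}
```

It could be called as, say, `add_mount_entries /var/lib/lxc/container1/config`, where the path is an assumption based on the container name used in this comment; running it a second time leaves the file unchanged.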
The symptom was:
$ lxc-start --foreground --logpriority debug --name container1
Failed to mount tmpfs at /dev/shm: Operation not permitted
Failed to mount tmpfs at /run: Operation not permitted
Failed to mount tmpfs at /run/lock: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...
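To see at a glance which mount points are affected, the failing paths can be pulled out of such output. This is a sketch that assumes the messages keep the "Failed to mount <fs> at <path>: <reason>" shape shown above; failed_mounts is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read container start output on stdin and print the
# mount point of every "Failed to mount <fs> at <path>: <reason>" line.
failed_mounts() {
    sed -n 's/^Failed to mount [^ ]* at \([^:]*\): .*$/\1/p'
}
```

For instance, `lxc-start --foreground --name container1 2>&1 | failed_mounts` would list one path per failed mount, provided the messages reach the terminal as they did here.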
Although this was already discussed on IRC, I'm opening an issue to track the problem and its progress.
Starting position
Create a Debian Jessie container on a Jessie LXC host with debops:
This will install systemd by default.
Error
When trying to start the container, the following error appears:
Reason
'cap_sys_admin' is dropped in /var/lib/lxc/jessie01/config as defined in defaults/main.yml and therefore prevents systemd from mounting some required file systems:
Known Work-Arounds
[...] lxc.autodev = 1 and lxc.kmesg = 0 must be removed from the container configuration to make this work.
[...] systemd to fully work without further configuration. NOTE: This has a huge negative security impact.
Unsuccessful Work-Around
I also tried to drop 'cap_sys_admin' and make LXC mount the required file systems without systemd involvement. For this I added:
Unfortunately this fails with the message that /run/lock doesn't exist:
Bugs
[...] journald to forward messages to syslog in case 'cap_sys_admin' is dropped. This is only fixed in systemd_218-4 in experimental now.
As I could live with the mentioned systemd bug, I'm still trying to find a way to run it without 'cap_sys_admin'. The challenges then are:
[...] /run/lock before actually mounting it?
[...] systemd to not mount a separate file system for /run/lock?
If there are some other possible work-arounds or any hints regarding my open questions, please let me know. I'll update once I find out more.