Open resdigita opened 4 years ago
Can you try the latest lxcfs from the ganto/lxc4 COPR? As I said before, I assumed this to be an issue with lxcfs, and it seems to be fixed now. I cannot reproduce the error:
[ganto@host ~]$ lsb_release -d
Description: Fedora release 31 (Thirty One)
[ganto@host ~]$ rpm -q lxd
lxd-3.22-0.1.fc31.x86_64
[ganto@host ~]$ rpm -q lxcfs
lxcfs-4.0.3-0.1.fc31.x86_64
[ganto@host ~]$ lxc launch images:centos/8 el8
Creating el8
Starting el8
[ganto@host ~]$ lxc shell el8
[root@el8 ~]# dnf install postfix
...
[root@el8 ~]# systemctl start postfix
[root@el8 ~]# systemctl status postfix
● postfix.service - Postfix Mail Transport Agent
Loaded: loaded (/usr/lib/systemd/system/postfix.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2020-06-23 09:40:44 UTC; 7s ago
Process: 690 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
Process: 689 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)
Process: 684 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
Main PID: 757 (master)
Tasks: 3 (limit: 49967)
Memory: 7.9M
CGroup: /system.slice/postfix.service
├─757 /usr/libexec/postfix/master -w
├─758 pickup -l -t unix -u
└─759 qmgr -l -t unix -u
Jun 23 09:40:44 el8 systemd[1]: Starting Postfix Mail Transport Agent...
Jun 23 09:40:44 el8 postfix/postfix-script[755]: starting the Postfix mail system
Jun 23 09:40:44 el8 postfix/master[757]: daemon started -- version 3.3.1, configuration /etc/postfix
Jun 23 09:40:44 el8 systemd[1]: Started Postfix Mail Transport Agent.
If this still happens for you, please provide an exact step-by-step guide on how to reproduce the error.
Sorry for the long delay. Unfortunately, I can't confirm that lxd works in Fedora 31.
Test machine 01
===================================================================================================================================
Package Architecture Version Repository Size
===================================================================================================================================
Installing:
lxc-libs x86_64 3.2.1-0.3.fc31 @commandline 482 k
lxc-templates x86_64 3.2.1-0.3.fc31 @commandline 20 k
lxcfs x86_64 4.0.4-0.2.fc31 @commandline 80 k
lxd x86_64 3.22-0.1.fc31 @commandline 9.5 M
lxd-client x86_64 3.22-0.1.fc31 @commandline 5.1 M
Installing dependencies:
container-selinux noarch 2:2.137.0-3.fc31 updates 47 k
libuv x86_64 1:1.38.0-2.fc31 updates 149 k
squashfs-tools x86_64 4.3-22.fc31 fedora 160 k
xdelta x86_64 3.1.0-8.fc31 fedora 89 k
Transaction Summary
===================================================================================================================================
[root@pontos lxd-322]# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: n
Do you want to configure a new storage pool? (yes/no) [default=yes]: y
Name of the new storage pool [default=default]: dirpool
Name of the storage backend to use (lvm, dir) [default=dir]: dir
Would you like to connect to a MAAS server? (yes/no) [default=no]: n
Would you like to create a new local network bridge? (yes/no) [default=yes]: yes
What should the new bridge be called? [default=lxdbr0]: lxdbr0
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 192.168.111.1/24
Would you like LXD to NAT IPv4 traffic on your bridge? [default=yes]: yes
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none
Would you like LXD to be available over the network? (yes/no) [default=no]: n
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] y
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: n
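For anyone who wants to repeat this setup non-interactively, the answers above can be written as a preseed. This is only a sketch: the YAML layout follows the LXD preseed format as I understand it for the 3.x series, the `default` profile with its `eth0` and `root` devices is my assumption about what the interactive run creates, and `print_lxd_preseed` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a preseed that mirrors the interactive answers
# given above (dir storage pool "dirpool", bridge lxdbr0 on 192.168.111.1/24
# with NAT, no IPv6, no clustering, not exposed over the network).
print_lxd_preseed() {
    cat <<'EOF'
config: {}
networks:
- name: lxdbr0
  type: bridge
  config:
    ipv4.address: 192.168.111.1/24
    ipv4.nat: "true"
    ipv6.address: none
storage_pools:
- name: dirpool
  driver: dir
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      nictype: bridged
      parent: lxdbr0
      type: nic
    root:
      path: /
      pool: dirpool
      type: disk
EOF
}

# Print it for review. To apply it (as root, on a host where LXD has not
# been initialised yet), one would pipe it into lxd:
#   print_lxd_preseed | lxd init --preseed
print_lxd_preseed
```

Reviewing the printed YAML before piping it into `lxd init --preseed` is the safer order, since the preseed changes the daemon configuration.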
- Checked IPs
[root@pontos lxd-322]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:00:e7:4e brd ff:ff:ff:ff:ff:ff
    inet 148.251.152.56/27 brd 148.251.152.63 scope global dynamic noprefixroute enp1s0
       valid_lft 28316sec preferred_lft 28316sec
    inet6 fe80::250:56ff:fe00:e74e/64 scope link
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8000 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:6f:a8:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.42/24 brd 192.168.122.255 scope global dynamic noprefixroute enp2s0
       valid_lft 2266sec preferred_lft 2266sec
    inet6 fe80::5054:ff:fe6f:a85d/64 scope link
       valid_lft forever preferred_lft forever
4: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether e6:24:b1:76:b3:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::4c8f:96ff:febb:de76/64 scope link
       valid_lft forever preferred_lft forever
- launched new container
[root@pontos lxd-322]# lxc launch images:fedora/31 test01
Creating test01
Starting test01
[root@pontos lxd-322]# lxc list
+--------+---------+------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+--------+---------+------+------+-----------+-----------+
| test01 | STOPPED | | | CONTAINER | 0 |
+--------+---------+------+------+-----------+-----------+
- The container did not start and could not be started by 'lxd start test01'
**Test machine 02**
- Working Installation Fedora 30 / lxd 3.22 along with libvirt / kvm
- Everything works except starting postfix.
After update to Fedora 31 (using dnf)
[root@hydra ~]# lxc list
+--------+---------+------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+--------+---------+------+------+-----------+-----------+
| test01 | RUNNING | | | CONTAINER | 0 |
+--------+---------+------+------+-----------+-----------+
- the existing container didn't get an IP address anymore
- lxc exec into the container works
[root@test01 ~]# systemctl status postfix
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
[root@test01 ~]# dhclient -v
Internet Systems Consortium DHCP Client 4.3.6
Copyright 2004-2017 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Listening on LPF/eth0/00:16:3e:c9:d8:34
Sending on LPF/eth0/00:16:3e:c9:d8:34
Sending on Socket/fallback
DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0xf022f350)
DHCPACK from 192.168.111.1 (xid=0xf022f350)
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
After waiting some time:
`bound to 192.168.111.228 -- renewal in 1744 seconds.`
- On host (outside container)
[root@hydra ~]# lxc list
+--------+---------+------------------------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+--------+---------+------------------------+------+-----------+-----------+
| test01 | RUNNING | 192.168.111.228 (eth0) | | CONTAINER | 0 |
+--------+---------+------------------------+------+-----------+-----------+
- Again back into the container:
[root@test01 ~]# systemctl status postfix
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
I'm at my wits' end.
I can grant you access to both machines if it helps, and I can run any kind of test you suggest.
I have similar issues with various CentOS 8 machines and LXD.
- CentOS 8.2 & LXD 3.22: No IP address inside container, even lxdbr1 gets no IP
- CentOS 8.2 & LXD 3.18: IP addresses distributed, everything works (but postfix, it doesn't start)
- CentOS 8.1 & LXD 3.21: IP addresses distributed, everything works (but postfix, it doesn't start)
Can you please check the output of /var/log/lxd/test01/console.log? Are there any errors? It seems that systemd is not running properly. This might be related to cgroup v2. Maybe also have a look at ganto/copr-lxc3#21.
Test machine 01
In /var/log/lxd/lxd.log I found:
[root@pontos lxd]# less lxd.log
t=2020-07-17T08:00:06+0200 lvl=info msg=" - g 0 1000000 65536"
t=2020-07-17T08:00:06+0200 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support"
t=2020-07-17T08:00:06+0200 lvl=info msg="Kernel features:"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - uevent injection: yes"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - seccomp listener: yes"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - cgroup layout: cgroup2"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup blkio, I/O limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup CPU controller, CPU time limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup CPUacct controller, CPU accounting will not be available"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup CPUset controller, CPU pinning will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup devices controller, device access control won't work"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup freezer controller, pausing/resuming containers won't work"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup memory controller, memory limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup network class controller, network limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup pids controller, process limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-07-17T08:00:06+0200 lvl=info msg=" - shiftfs support: no"
t=2020-07-17T08:00:06+0200 lvl=info msg="Initializing local database"
. . .
. . .
t=2020-07-17T08:05:47+0200 lvl=info msg="Downloading image" alias=fedora/31 image=a3c107febe6ac24f406a8b56f1db1be3ee774772e21663ad3877258d80b47310 operation=405a7c96-9daf-43a7-9d83-a93df20d2fe9 server=https://images.linuxcontainers.org trigger=/1.0/operations/405a7c96-9daf-43a7-9d83-a93df20d2fe9
t=2020-07-17T08:05:50+0200 lvl=info msg="Image downloaded" alias=fedora/31 image=a3c107febe6ac24f406a8b56f1db1be3ee774772e21663ad3877258d80b47310 operation=405a7c96-9daf-43a7-9d83-a93df20d2fe9 server=https://images.linuxcontainers.org trigger=/1.0/operations/405a7c96-9daf-43a7-9d83-a93df20d2fe9
t=2020-07-17T08:05:50+0200 lvl=info msg="Creating container" ephemeral=false name=test01 project=default
t=2020-07-17T08:05:50+0200 lvl=info msg="Created container" ephemeral=false name=test01 project=default
t=2020-07-17T08:05:50+0200 lvl=warn msg="The backing filesystem doesn't support quotas, skipping quota" driver=dir path=/var/lib/lxd/storage-pools/dirpool/containers/test01 pool=dirpool
t=2020-07-17T08:06:04+0200 lvl=info msg="Starting container" action=start created=2020-07-17T08:05:50+0200 ephemeral=false name=test01 project=default stateful=false used=1970-01-01T01:00:00+0100
t=2020-07-17T08:06:05+0200 lvl=info msg="Started container" action=start created=2020-07-17T08:05:50+0200 ephemeral=false name=test01 project=default state
And in /var/log/lxc/test01/console.log
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...
That doesn't look good.
On test machine 02 I found the same warnings:
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup blkio, I/O limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup CPU controller, CPU time limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup CPUacct controller, CPU accounting will not be available"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup CPUset controller, CPU pinning will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup devices controller, device access control won't work"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup freezer controller, pausing/resuming containers won't work"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup memory controller, memory limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup network class controller, network limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup pids controller, process limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-07-16T17:03:48+0200 lvl=info msg=" - shiftfs support: no"
t=2020-07-16T17:03:48+0200 lvl=info msg="Initializing local database"
And in /var/log/lxd/test01/console.log
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.
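Both console logs above show the same failure signature from systemd inside the container. A minimal sketch for checking other containers' logs for it, assuming the log path pattern used in this thread; `console_log_has_cgroup_failure` is a made-up helper name, and the two search strings are copied from the logs above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a container console log contains the
# messages systemd printed above when it could not mount its cgroup hierarchy.
# $1: path to a console.log file
console_log_has_cgroup_failure() {
    grep -q -e 'Failed to mount cgroup at' \
            -e 'Failed to mount API filesystems' "$1"
}

# Default to the path mentioned in this thread; pass another one as $1.
log="${1:-/var/log/lxd/test01/console.log}"
if [ -r "$log" ] && console_log_has_cgroup_failure "$log"; then
    echo "cgroup mount failure found in $log"
else
    echo "no cgroup mount failure found in $log (or file not readable)"
fi
```

A match only says that systemd could not set up its cgroup mounts in the container; it does not say why.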
Ok, I thought so. See #21 for a "solution" aka work-around for this. You might want to ask upstream if this behavior is expected or if this is a bug. It's definitely not something that I can fix in the RPM packaging ;-)
Ok, I used `grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"` to add the workaround kernel parameter, and after a reboot the containers got an IP address and started again.
And I could start postfix just fine with lxcfs 4.0.4-0.2.fc31. Thanks for that work and information!
But I immediately have another question: can you provide lxcfs 4.0.4 for CentOS 8 as well, for lxd 3.22 and 3.21?
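To confirm after the reboot that the workaround is actually in effect, one can look at the running kernel's command line and at the filesystem type mounted on /sys/fs/cgroup. A small sketch, assuming GNU `stat`; both helper names are made up, and the mapping of `tmpfs` to a legacy or hybrid layout reflects how systemd usually sets up cgroup v1.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line (passed as $1)
# contains the workaround parameter used in this thread.
cmdline_has_cgroup_v1_arg() {
    case " $1 " in
        *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a layout.
# cgroup2fs is the unified (v2) hierarchy; with v1, systemd normally mounts
# a tmpfs there and places the individual controllers below it.
cgroup_layout_from_fstype() {
    case "$1" in
        cgroup2fs) echo "unified" ;;
        tmpfs)     echo "legacy-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

if cmdline_has_cgroup_v1_arg "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "workaround parameter is present on the running kernel"
else
    echo "workaround parameter is NOT present on the running kernel"
fi

# stat -f reports on the filesystem, %T prints its type in human-readable form.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
echo "cgroup layout: $(cgroup_layout_from_fstype "$fstype")"
```

If the parameter is present but the layout still reports `unified`, the host was probably not rebooted since `grubby` was run.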
I just checked again and there is still a permission denied issue when you try to start a postfix process in lxd 3.22 hosted by CentOS 8 (8.2.2009) and Fedora 31. So you can't use postfix in a container at all.
The error message is: /usr/libexec/postfix/postfix-script: line 127: /dev/null: Permission denied
We first discussed the issue on Apr. 6. Is there even a chance to fix the issue, or is it just too big and unmanageable?
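To narrow down the `/dev/null: Permission denied` message, it may help to check inside the container what /dev/null actually is and whether it can be opened for writing. The following is a sketch, not a diagnosis: `describe_null_device` is a made-up helper, and it performs a real open rather than a mode-bit test because root passes mode checks even when the open itself is refused.

```shell
#!/bin/sh
# Hypothetical helper: describe the state of a device node such as /dev/null.
# $1: path to check
describe_null_device() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
    elif [ ! -c "$path" ]; then
        echo "not a character device"
    # Try a real open for writing in a subshell. stderr is closed with 2>&-
    # instead of being redirected to /dev/null, because /dev/null is the
    # thing under test.
    elif ! ( : > "$path" ) 2>&-; then
        echo "open for writing failed"
    else
        echo "ok"
    fi
}

echo "/dev/null: $(describe_null_device /dev/null)"
# A healthy node is a character device with major/minor numbers 1, 3.
ls -l /dev/null || true
```

Running this both on the host and inside the container (for example via `lxc exec`) shows whether the problem is specific to the container's view of the device.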