ganto / copr-lxc3

RPM spec files for building lxc-3 on Fedora COPR
MIT License

Permission denied issue #25

Open resdigita opened 4 years ago

resdigita commented 4 years ago

I just checked again and there is still a permission denied issue when you try to start a postfix process in lxd 3.22 hosted by CentOS 8 (8.2.2009) and Fedora 31. So you can't use postfix in a container at all.

The error message is: /usr/libexec/postfix/postfix-script: line 127: /dev/null: Permission denied
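A quick way to narrow down an error like this is to check, inside the container, whether /dev/null is a character device and whether it can actually be opened for writing. The following is only a diagnostic sketch; `check_null_device` is a hypothetical helper name, and a failure here says nothing yet about the underlying cause.

```shell
#!/bin/sh
# check_null_device is a hypothetical helper (not part of LXD or postfix).
# It reports whether the given path is a character device that can be
# opened for writing by the current user.
check_null_device() {
    dev="$1"
    if [ ! -c "$dev" ]; then
        echo "$dev: not a character device"
        return 1
    fi
    # Try a real open-for-write in a subshell; a failed redirection makes
    # the subshell return non-zero and prints the shell's own error.
    if ! ( : >"$dev" ); then
        echo "$dev: cannot be opened for writing"
        return 1
    fi
    echo "$dev: ok"
}

# On failure, show the device node's mode and major/minor numbers.
check_null_device /dev/null || ls -l /dev/null
```

Running it both on the host and inside the container shows whether the problem is specific to the container.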

We first discussed the issue on Apr. 6. Is there even a chance to fix the issue or is it just too big and unmanageable?

ganto commented 4 years ago

Can you try the latest lxcfs from the ganto/lxc4 COPR? As I said before, I assumed this to be an issue with lxcfs, and it seems to be fixed now. I cannot reproduce the error:

[ganto@host ~]$ lsb_release -d
Description:    Fedora release 31 (Thirty One)
[ganto@host ~]$ rpm -q lxd
lxd-3.22-0.1.fc31.x86_64
[ganto@host ~]$ rpm -q lxcfs
lxcfs-4.0.3-0.1.fc31.x86_64
[ganto@host ~]$ lxc launch images:centos/8 el8                                                                                                                        
Creating el8                                                                                                                                                             
Starting el8
[ganto@host ~]$ lxc shell el8                                                                                                                                         
[root@el8 ~]# dnf install postfix
...
[root@el8 ~]# systemctl start postfix                                                                                                                                    
[root@el8 ~]# systemctl status postfix                                                                                                                                   
● postfix.service - Postfix Mail Transport Agent                                                                                                                         
   Loaded: loaded (/usr/lib/systemd/system/postfix.service; disabled; vendor preset: disabled)                                                                           
   Active: active (running) since Tue 2020-06-23 09:40:44 UTC; 7s ago                                                                                                    
  Process: 690 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)                                                                                         
  Process: 689 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)                                                                           
  Process: 684 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
 Main PID: 757 (master)
    Tasks: 3 (limit: 49967)
   Memory: 7.9M
   CGroup: /system.slice/postfix.service
           ├─757 /usr/libexec/postfix/master -w
           ├─758 pickup -l -t unix -u
           └─759 qmgr -l -t unix -u

Jun 23 09:40:44 el8 systemd[1]: Starting Postfix Mail Transport Agent...
Jun 23 09:40:44 el8 postfix/postfix-script[755]: starting the Postfix mail system
Jun 23 09:40:44 el8 postfix/master[757]: daemon started -- version 3.3.1, configuration /etc/postfix
Jun 23 09:40:44 el8 systemd[1]: Started Postfix Mail Transport Agent.

If this still happens for you, please provide an exact step-by-step guide on how to reproduce your error.

resdigita commented 4 years ago

Sorry for the long delay. Unfortunately, I can't confirm that lxd works on Fedora 31.

Test machine 01

===================================================================================================================================
 Package                            Architecture            Version                            Repository                     Size
===================================================================================================================================
Installing:
 lxc-libs                           x86_64                  3.2.1-0.3.fc31                     @commandline                  482 k
 lxc-templates                      x86_64                  3.2.1-0.3.fc31                     @commandline                   20 k
 lxcfs                              x86_64                  4.0.4-0.2.fc31                     @commandline                   80 k
 lxd                                x86_64                  3.22-0.1.fc31                      @commandline                  9.5 M
 lxd-client                         x86_64                  3.22-0.1.fc31                      @commandline                  5.1 M
Installing dependencies:
 container-selinux                  noarch                  2:2.137.0-3.fc31                   updates                        47 k
 libuv                              x86_64                  1:1.38.0-2.fc31                    updates                       149 k
 squashfs-tools                     x86_64                  4.3-22.fc31                        fedora                        160 k
 xdelta                             x86_64                  3.1.0-8.fc31                       fedora                         89 k

Transaction Summary
===================================================================================================================================

- Checked IPs

[root@pontos lxd-322]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:00:e7:4e brd ff:ff:ff:ff:ff:ff
    inet 148.251.152.56/27 brd 148.251.152.63 scope global dynamic noprefixroute enp1s0
       valid_lft 28316sec preferred_lft 28316sec
    inet6 fe80::250:56ff:fe00:e74e/64 scope link
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8000 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:6f:a8:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.42/24 brd 192.168.122.255 scope global dynamic noprefixroute enp2s0
       valid_lft 2266sec preferred_lft 2266sec
    inet6 fe80::5054:ff:fe6f:a85d/64 scope link
       valid_lft forever preferred_lft forever
4: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether e6:24:b1:76:b3:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::4c8f:96ff:febb:de76/64 scope link
       valid_lft forever preferred_lft forever


- launched new container

[root@pontos lxd-322]# lxc launch images:fedora/31 test01
Creating test01
Starting test01
[root@pontos lxd-322]# lxc list
+--------+---------+------+------+-----------+-----------+
|  NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+--------+---------+------+------+-----------+-----------+
| test01 | STOPPED |      |      | CONTAINER | 0         |
+--------+---------+------+------+-----------+-----------+

- The container did not start and could not be started by 'lxd start test01'

**Test machine 02**

- Working Installation Fedora 30 / lxd 3.22 along with libvirt / kvm 
- Everything works except starting postfix.

After update to Fedora 31 (using dnf)

[root@hydra ~]# lxc list
+--------+---------+------+------+-----------+-----------+
|  NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+--------+---------+------+------+-----------+-----------+
| test01 | RUNNING |      |      | CONTAINER | 0         |
+--------+---------+------+------+-----------+-----------+


- the existing container didn't get an IP address anymore
- lxc exec into the container works

[root@test01 ~]# systemctl status postfix
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
[root@test01 ~]# dhclient -v
Internet Systems Consortium DHCP Client 4.3.6
Copyright 2004-2017 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eth0/00:16:3e:c9:d8:34
Sending on LPF/eth0/00:16:3e:c9:d8:34
Sending on Socket/fallback
DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0xf022f350)
DHCPACK from 192.168.111.1 (xid=0xf022f350)
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down

After waiting some time:

`bound to 192.168.111.228 -- renewal in 1744 seconds.`

- On host (outside container)

[root@hydra ~]# lxc list
+--------+---------+------------------------+------+-----------+-----------+
|  NAME  |  STATE  |          IPV4          | IPV6 |   TYPE    | SNAPSHOTS |
+--------+---------+------------------------+------+-----------+-----------+
| test01 | RUNNING | 192.168.111.228 (eth0) |      | CONTAINER | 0         |
+--------+---------+------------------------+------+-----------+-----------+

- Again back into the container:

[root@test01 ~]# systemctl status postfix
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down



I'm at my wits' end.

I can grant you access to both machines if it helps. And I can run any kind of test you suggest.

I have similar issues with various CentOS 8 machines and LXD.

- CentOS 8.2 & LXD 3.22: No IP address inside container, even lxdbr1 gets no IP
- CentOS 8.2 & LXD 3.18: IP addresses distributed, everything works (except postfix, which doesn't start)
- CentOS 8.1 & LXD 3.21: IP addresses distributed, everything works (except postfix, which doesn't start)

ganto commented 4 years ago

Can you please check the output of /var/log/lxd/test01/console.log? Are there any errors? Seems that systemd is not running properly. Might be related to cgroup v2. Maybe also have a look at ganto/copr-lxc3#21.
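To confirm which cgroup layout the host is actually using, one option is to look at the filesystem type mounted at /sys/fs/cgroup. This is a minimal sketch assuming GNU coreutils `stat`; `classify_cgroup_fs` is a hypothetical helper name introduced only for illustration.

```shell
#!/bin/sh
# classify_cgroup_fs is a hypothetical helper (not part of LXD or systemd).
# It maps the filesystem type mounted at /sys/fs/cgroup to a layout name.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown ($1)" ;;
    esac
}

# stat -f inspects the filesystem; %T prints its type as a readable name.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unavailable")
echo "cgroup layout on this host: $(classify_cgroup_fs "$fstype")"
```

A result of "unified (cgroup v2)" on the host would match the `cgroup layout: cgroup2` line that LXD writes to lxd.log at startup.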

resdigita commented 4 years ago

Test machine 01

In /var/log/lxd/lxd.log I found:

[root@pontos lxd]# less lxd.log 

t=2020-07-17T08:00:06+0200 lvl=info msg=" - g 0 1000000 65536" 
t=2020-07-17T08:00:06+0200 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support" 
t=2020-07-17T08:00:06+0200 lvl=info msg="Kernel features:" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - netnsid-based network retrieval: yes" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - uevent injection: yes" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - seccomp listener: yes" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - seccomp listener continue syscalls: yes" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - unprivileged file capabilities: yes" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - cgroup layout: cgroup2" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup blkio, I/O limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup CPU controller, CPU time limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup CPUacct controller, CPU accounting will not be available" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup CPUset controller, CPU pinning will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup devices controller, device access control won't work" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup freezer controller, pausing/resuming containers won't work" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup memory controller, memory limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup network class controller, network limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup pids controller, process limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored" 
t=2020-07-17T08:00:06+0200 lvl=info msg=" - shiftfs support: no" 
t=2020-07-17T08:00:06+0200 lvl=info msg="Initializing local database" 
. . . 
. . .
t=2020-07-17T08:05:47+0200 lvl=info msg="Downloading image" alias=fedora/31 image=a3c107febe6ac24f406a8b56f1db1be3ee774772e21663ad3877258d80b47310 operation=405a7c96-9daf-43a7-9d83-a93df20d2fe9 server=https://images.linuxcontainers.org trigger=/1.0/operations/405a7c96-9daf-43a7-9d83-a93df20d2fe9
t=2020-07-17T08:05:50+0200 lvl=info msg="Image downloaded" alias=fedora/31 image=a3c107febe6ac24f406a8b56f1db1be3ee774772e21663ad3877258d80b47310 operation=405a7c96-9daf-43a7-9d83-a93df20d2fe9 server=https://images.linuxcontainers.org trigger=/1.0/operations/405a7c96-9daf-43a7-9d83-a93df20d2fe9
t=2020-07-17T08:05:50+0200 lvl=info msg="Creating container" ephemeral=false name=test01 project=default
t=2020-07-17T08:05:50+0200 lvl=info msg="Created container" ephemeral=false name=test01 project=default
t=2020-07-17T08:05:50+0200 lvl=warn msg="The backing filesystem doesn't support quotas, skipping quota" driver=dir path=/var/lib/lxd/storage-pools/dirpool/containers/test01 pool=dirpool
t=2020-07-17T08:06:04+0200 lvl=info msg="Starting container" action=start created=2020-07-17T08:05:50+0200 ephemeral=false name=test01 project=default stateful=false used=1970-01-01T01:00:00+0100
t=2020-07-17T08:06:05+0200 lvl=info msg="Started container" action=start created=2020-07-17T08:05:50+0200 ephemeral=false name=test01 project=default state

And in /var/log/lxc/test01/console.log

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

That doesn't look good.

On test machine 02 I found the same warnings:

t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup blkio, I/O limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup CPU controller, CPU time limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup CPUacct controller, CPU accounting will not be available" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup CPUset controller, CPU pinning will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup devices controller, device access control won't work" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup freezer controller, pausing/resuming containers won't work" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup memory controller, memory limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup network class controller, network limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup pids controller, process limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored" 
t=2020-07-16T17:03:48+0200 lvl=info msg=" - shiftfs support: no" 
t=2020-07-16T17:03:48+0200 lvl=info msg="Initializing local database" 

And in /var/log/lxd/test01/console.log

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

ganto commented 4 years ago

Ok, I thought so. See #21 for a "solution" aka work-around for this. You might want to ask upstream if this behavior is expected or if this is a bug. It's definitely not something that I can fix in the RPM packaging ;-)

resdigita commented 4 years ago

Ok, I used grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0" to add the workaround kernel parameter and after a reboot the containers got an IP and started again.
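After the reboot, it can be worth verifying that the running kernel really picked up the parameter. The sketch below only inspects /proc/cmdline; `has_cgroup_v1_arg` is a hypothetical helper name, not something provided by grubby or systemd.

```shell
#!/bin/sh
# has_cgroup_v1_arg is a hypothetical helper. It succeeds when the given
# kernel command line contains systemd.unified_cgroup_hierarchy=0 as a
# whole, space-separated argument.
has_cgroup_v1_arg() {
    case " $1 " in
        *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the arguments the running kernel was booted with.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
if has_cgroup_v1_arg "$cmdline"; then
    echo "workaround active: booted with systemd.unified_cgroup_hierarchy=0"
else
    echo "workaround not active on the running kernel"
fi
```

If the parameter is missing after a reboot, the grubby change may not have been applied to the kernel entry that was actually booted.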

And I could start postfix just fine with lxcfs 4.0.4-0.2.fc31. Thanks for that work and information!

But immediately I have another question: can you provide lxcfs 4.0.4 also for CentOS 8 and lxd 3.22 and 3.21?