AlmaLinux / cloud-images

Packer templates and other tools for building AlmaLinux images for various cloud platforms.
MIT License

Issue on LXC Alma 9 #96

Open liberodark opened 2 years ago

liberodark commented 2 years ago

Hi,

I'm having issues with the AlmaLinux 9 build on LXC 4.0.12: networking and login do not work. I tried to start NetworkManager, but that fails too.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if42: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7a:25:b0:7a:03:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
UUID=cbec8502-e1a9-11ec-8000-ce1e1be31218
BOOTPROTO=none
IPADDR=10.xx.xx.xx
NETMASK=255.255.255.0
GATEWAY=10.xx.xx.xx
DNS1=10.xx.xx.xx
DNS2=10.xx.xx.xx
DOMAIN=xxxx.fr
systemctl status NetworkManager
○ NetworkManager.service - Network Manager
     Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: inactive (dead)
       Docs: man:NetworkManager(8)
cat /etc/*release
AlmaLinux release 9.0 (Emerald Puma)
NAME="AlmaLinux"
VERSION="9.0 (Emerald Puma)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.0 (Emerald Puma)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.0"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"
AlmaLinux release 9.0 (Emerald Puma)
AlmaLinux release 9.0 (Emerald Puma)

The image comes from this PR: https://github.com/lxc/lxc-ci/pull/535

I tried this image: https://uk.lxd.images.canonical.com/images/almalinux/9/amd64/default/20220531_23:08/rootfs.tar.xz

I also checked the hash: fbd7ed8d6783170b12d8224c9c0e3321df099e82d56ff1cc1c87509d662d6dc3 rootfs.tar.xz

Best Regards

LKHN commented 2 years ago

Hi,

I cannot reproduce the issue on LXD 5.2 or Proxmox 7.1-7.

LXD:

  driver: lxc
  driver_version: 4.0.12
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.14.0-70.13.1.el9_0.x86_64
  lxc_features:                                                                                                                                                                               
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: AlmaLinux
  os_version: "9.0"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: ip-XXX-XXX-XXX-XXX.ec2.internal
  server_pid: 2420
  server_version: "5.2"
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: ceph
    version: 15.2.16
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.16
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
    remote: false
[root@alma-test ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:53:2f:62 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.78.5.236/24 brd 10.78.5.255 scope global dynamic noprefixroute eth0
       valid_lft 2705sec preferred_lft 2705sec
    inet6 fd42:1650:5ef7:f42e:216:3eff:fe53:2f62/64 scope global dynamic noprefixroute 
       valid_lft 3359sec preferred_lft 3359sec
    inet6 fe80::216:3eff:fe53:2f62/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[root@alma-test ~]# curl https://mirrors.almalinux.org/mirrorlist/9/appstream

http://nyc.mirrors.clouvider.net/almalinux/9/AppStream/$basearch/os/
http://mirror.cogentco.com/pub/linux/almalinux/9/AppStream/$basearch/os/
http://mirror2.sandyriver.net/pub/almalinux/9/AppStream/$basearch/os/
http://mirror.siena.edu/almalinux/9/AppStream/$basearch/os/
http://mirror.cloudpropeller.com/almalinux/9/AppStream/$basearch/os/
http://iad.mirror.rackspace.com/almalinux/9/AppStream/$basearch/os/
http://mirror.vtti.vt.edu/almalinux/9/AppStream/$basearch/os/
http://mirror.cs.pitt.edu/almalinux/9/AppStream/$basearch/os/
http://mirror.nodespace.net/almalinux/9/AppStream/$basearch/os/
http://ash.mirrors.clouvider.net/almalinux/9/AppStream/$basearch/os/

Proxmox 7.1-7:

[root@alma90test network-scripts]# hostnamectl
   Static hostname: n/a                                 
Transient hostname: alma90test
         Icon name: computer-container
           Chassis: container ☐
        Machine ID: 3d2c53c8f6bc4ac1b7cab663d1e9d933
           Boot ID: 7c56f955de7a4d48a2b019cf2069ac8f
    Virtualization: lxc
  Operating System: AlmaLinux 9.0 (Emerald Puma)        
       CPE OS Name: cpe:/o:almalinux:almalinux:9::baseos
            Kernel: Linux 5.13.19-2-pve
      Architecture: x86-64
[root@alma90test network-scripts]# uname -a
Linux alma90test 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100) x86_64 x86_64 x86_64 GNU/Linux
[root@alma90test ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
     Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-06-10 20:15:03 UTC; 1min 38s ago
       Docs: man:NetworkManager(8)
   Main PID: 87 (NetworkManager)
      Tasks: 3 (limit: 617945)
     Memory: 6.7M
        CPU: 213ms
     CGroup: /system.slice/NetworkManager.service
             └─87 /usr/sbin/NetworkManager --no-daemon

Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1183] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1206] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1343] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1345] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1348] manager: NetworkManager state is now CONNECTED_LOCAL
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1351] manager: NetworkManager state is now CONNECTED_SITE
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1352] policy: set 'System eth0' (eth0) as default for IPv4 routing and DNS
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1411] device (eth0): Activation: successful, device activated.
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1416] manager: NetworkManager state is now CONNECTED_GLOBAL
Jun 10 20:15:04 alma90test NetworkManager[87]: <info>  [1654892104.1418] manager: startup complete

liberodark commented 2 years ago

I also tried on PVE 7.2-4. I'm using cgroup v2 with NFS v3 storage. I can retry with a new build as a test. I get the same issue with:

https://uk.lxd.images.canonical.com/images/almalinux/9/amd64/default/20220612_23:09/rootfs.tar.xz

SHA256: efefe4e6f10f0b7ecb5ca3bf94ef638c4cdce897c3edc168d5b35ce8a221590a rootfs.tar.xz

The console is not working either (I can't log in through the console).

I have to go through pct enter my_id instead.

[root@test-alma9 ~]# hostnamectl
Failed to connect to bus: No such file or directory
[root@test-alma9 ~]# uname -a
Linux test-alma9 5.15.35-2-pve #1 SMP PVE 5.15.35-5 (Wed, 08 Jun 2022 15:02:51 +0200) x86_64 x86_64 x86_64 GNU/Linux
[root@test-alma9 ~]# systemctl status NetworkManager
○ NetworkManager.service - Network Manager
     Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: inactive (dead)
       Docs: man:NetworkManager(8)

LKHN commented 2 years ago

Hi @liberodark, thanks for the answer.

I've never faced any issues with the networking so far, but I can reproduce the console issue with almalinux9 and centos9stream images.

AlmaLinux OS 9:

[root@al-90-test-1 ~]# systemctl --version
systemd 250 (250-6.el9_0)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

CentOS 9 Stream:

[root@c9stream-test-1 ~]# systemctl --version
systemd 250 (250-7.el9)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
[root@c9stream-test-1 ~]# 

Both:

[root@al-90-test-1 ~]# systemctl -a | grep -i getty
  console-getty.service                  loaded    inactive   dead      start Console Getty
  container-getty@1.service              loaded    inactive   dead      start Container Getty on /dev/tty1
  container-getty@2.service              loaded    inactive   dead      start Container Getty on /dev/tty2
  getty@tty1.service                     loaded    inactive   dead      start Getty on tty1
  system-container\x2dgetty.slice        loaded    active     active          Slice /system/container-getty
  system-getty.slice                     loaded    active     active          Slice /system/getty
  getty-pre.target                       loaded    inactive   dead            Preparation for Logins
  getty.target                           loaded    inactive   dead      start Login Prompts

There are no getty-related logs:

[root@al-90-test-1 ~]# journalctl | grep -i tty
[root@al-90-test-1 ~]# 
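
In the `systemctl -a` listing above, the extra `start` column is systemd's JOB column: those units have a start job queued that has not run yet, which suggests that something earlier in the boot is holding them back. The following is only a minimal sketch for pulling such units out of saved output. The function name `pending_start_units` is made up for illustration, and the field positions assume the plain column layout shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `systemctl -a` style lines on stdin and
# prints the names of units whose JOB column is "start".
# Assumes the layout UNIT LOAD ACTIVE SUB [JOB] DESCRIPTION with no
# leading status marker; lines that start with a marker such as "●"
# have shifted fields and are skipped by the "$2 == loaded" guard.
pending_start_units() {
    awk '$2 == "loaded" && $5 == "start" { print $1 }'
}
```

On a live container this could be used as `systemctl -a | pending_start_units`. `systemctl list-jobs` shows the same queue directly.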

Fedora 35:

[root@f35-test-1 ~]# systemctl --version
systemd 249 (v249.12-3.fc35)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

console-getty.service, container-getty@1.service, container-getty@2.service and getty.target running and healthy:

[root@f35-test-1 ~]# systemctl -a | grep -Ei getty
  console-getty.service                  loaded    active   running   Console Getty
  container-getty@1.service              loaded    active   running   Container Getty on /dev/tty1
  container-getty@2.service              loaded    active   running   Container Getty on /dev/tty2
  getty@tty1.service                     loaded    inactive dead      Getty on tty1
  system-container\x2dgetty.slice        loaded    active   active    Slice /system/container-getty
  system-getty.slice                     loaded    active   active    Slice /system/getty
  getty-pre.target                       loaded    inactive dead      Preparation for Logins
  getty.target                           loaded    active   active    Login Prompts

We can see that they started sequentially:

[root@f35-test-1 ~]# journalctl | grep -i tty
Jun 14 11:43:06 f35-test-1 systemd[1]: Started Console Getty.
Jun 14 11:43:06 f35-test-1 systemd[1]: Started Container Getty on /dev/tty1.
Jun 14 11:43:06 f35-test-1 systemd[1]: Started Container Getty on /dev/tty2.
Jun 14 11:43:06 f35-test-1 systemd[1]: Condition check resulted in Getty on tty1 being skipped.
Jun 14 11:43:45 f35-test-1 login[104]: ROOT LOGIN ON tty1

The Proxmox console works and looks like this when I disable the /dev/console device in the container settings:

Options >> /dev/console: Disabled

[root@al-90-test-2 ~]# systemctl -a | grep -Ei getty
  console-getty.service                  loaded    inactive   dead            Console Getty
  container-getty@1.service              loaded    active     running         Container Getty on /dev/tty1
  container-getty@2.service              loaded    active     running         Container Getty on /dev/tty2
  getty@tty1.service                     loaded    inactive   dead            Getty on tty1
  system-container\x2dgetty.slice        loaded    active     active          Slice /system/container-getty
  system-getty.slice                     loaded    active     active          Slice /system/getty
  getty-pre.target                       loaded    inactive   dead            Preparation for Logins
  getty.target                           loaded    active     active          Login Prompts
[root@al-90-test-2 ~]# journalctl | grep tty
Jun 14 12:23:17 al-90-test-2 systemd[1]: Console Getty was skipped because of a failed condition check (ConditionPathExists=/dev/console).
Jun 14 12:23:17 al-90-test-2 systemd[1]: Started Container Getty on /dev/tty1.
Jun 14 12:23:17 al-90-test-2 systemd[1]: Started Container Getty on /dev/tty2.
Jun 14 12:23:17 al-90-test-2 systemd[1]: Getty on tty1 was skipped because of a failed condition check (ConditionPathExists=/dev/tty0).
Jun 14 12:23:33 al-90-test-2 login[105]: ROOT LOGIN ON tty1
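
The same /dev/console setting can probably be changed from the Proxmox host shell instead of the web UI. The sketch below only prints the commands for review. It assumes that the `console` option of `pct set` is the CLI counterpart of "Options >> /dev/console" and that the container needs a restart for the change to apply; neither point was confirmed in this thread. The function name `console_disable_commands` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the host-side commands that would disable
# /dev/console for one container and restart it. Nothing is executed.
# Assumption: "pct set <vmid> --console 0" matches the UI option above.
console_disable_commands() {
    vmid="$1"
    printf 'pct set %s --console 0\n' "$vmid"
    # Assumption: the setting only takes effect after a restart.
    printf 'pct stop %s\n' "$vmid"
    printf 'pct start %s\n' "$vmid"
}
```

For example, `console_disable_commands 106` prints three lines that can be checked first and then run by hand on the host.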

liberodark commented 2 years ago

A solution has been found by nasutek:

https://forum.proxmox.com/threads/lxc-almalinux-9-issue.110348/post-477778

root@nte-proxmox-1:~# pct enter 106
[root@alma9test ~]# cat /etc/redhat-release
AlmaLinux release 9.0 (Emerald Puma)
[root@alma9test ~]# systemctl stop systemd-firstboot
[root@alma9test ~]# exit

On my side, I ran systemctl disable --now systemd-firstboot, which fixed my issue.
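
For several containers, the workaround above could be applied from the Proxmox host rather than by entering each container. This is a sketch under assumptions: it relies on `pct exec` to run the command inside the container, the helper names `run` and `apply_firstboot_workaround` are made up, and the `DRY_RUN` switch is a convention of this example only. By default nothing is executed and the commands are just printed.

```shell
#!/bin/sh
# Sketch: apply the thread's workaround to Proxmox containers from the host.
# DRY_RUN defaults to 1, so commands are printed; set DRY_RUN=0 to run them.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: apply_firstboot_workaround VMID...
apply_firstboot_workaround() {
    for vmid in "$@"; do
        run pct exec "$vmid" -- systemctl disable --now systemd-firstboot
    done
}
```

The status output elsewhere in this thread lists the unit as static, so the `disable` part likely changes nothing and the `--now` (stop) part does the work. That is an inference from the output, not something confirmed here.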

LKHN commented 2 years ago

I've also done some testing with Proxmox 6.4-14, which has a hybrid cgroup setup but mainly uses cgroup v1.

The console works on both unprivileged and privileged containers:

[root@al90-test-1 ~]# uname -a
Linux al90-test-1 5.4.174-2-pve #1 SMP PVE 5.4.174-2 (Thu, 10 Mar 2022 15:58:44 +0100) x86_64 x86_64 x86_64 GNU/Linux
[root@al90-test-1 ~]# hostnamectl
   Static hostname: n/a                                 
Transient hostname: al90-test-1
         Icon name: computer-container
           Chassis: container ☐
        Machine ID: 824fd2d32cff49b4aae1ee95814c1ade
           Boot ID: ed88a9576f0b4f7aabd18d6d94eba459
    Virtualization: lxc
  Operating System: AlmaLinux 9.0 (Emerald Puma)        
       CPE OS Name: cpe:/o:almalinux:almalinux:9::baseos
            Kernel: Linux 5.4.174-2-pve
      Architecture: x86-64

The network also works with 6.4.

Regarding the firstboot workaround: strangely, on 6.4 the ConditionFirstBoot=yes condition is not met, so the service is never started, whereas on 7.1 it is started and exits with an error.

6.4:

[root@al-90-test-1 ~]# systemctl status systemd-firstboot
○ systemd-firstboot.service - First Boot Wizard
     Loaded: loaded (/usr/lib/systemd/system/systemd-firstboot.service; static)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: inactive (dead)
  Condition: start condition failed at Tue 2022-06-14 20:12:20 UTC; 3min 16s ago
             └─ ConditionFirstBoot=yes was not met
       Docs: man:systemd-firstboot(1)

7.1:

[root@al-90-test1 ~]# systemctl status systemd-firstboot
× systemd-firstboot.service - First Boot Wizard
     Loaded: loaded (/usr/lib/systemd/system/systemd-firstboot.service; static)
     Active: failed (Result: exit-code) since Tue 2022-06-14 21:17:17 UTC; 25s ago
       Docs: man:systemd-firstboot(1)
    Process: 76 ExecStart=systemd-firstboot --prompt-locale --prompt-timezone --prompt-root-password (code=exited, status=208/STDIN)
   Main PID: 76 (code=exited, status=208/STDIN)
        CPU: 645us

Notice: journal has been rotated since unit was started, output may be incomplete.
[root@al-90-test1 ~]# journalctl | grep first
Jun 14 21:17:17 al-90-test1 systemd[76]: systemd-firstboot.service: Failed at step STDIN spawning systemd-firstboot: No such file or directory
Jun 14 21:17:17 al-90-test1 rsyslogd[91]: imjournal: No statefile exists, /var/lib/rsyslog/imjournal.state will be created (ignore if this is first run): No such file or directory [v8.2102.0-101.el9_0.1 try https://www.rsyslog.com/e/2040 ]
Jun 14 21:17:17 al-90-test1 NetworkManager[93]: <info>  [1655241437.5457] NetworkManager (version 1.36.0-4.el9_0) is starting... (for the first time)
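
One thing that could be compared between the two hosts is the state of /etc/machine-id inside the container at boot. As documented in machine-id(5) (worth double-checking against systemd 250), a boot counts as the first one when that file is missing or contains the string "uninitialized", while an existing empty file does not count. The sketch below encodes only those rules. It ignores the kernel command line override, the function name `first_boot_state` is made up, and nothing in this thread confirms that machine-id explains the difference.

```shell
#!/bin/sh
# Hypothetical helper: approximate the first-boot rules from machine-id(5)
# for a given file (default: /etc/machine-id).
# Prints "first-boot" or "not-first-boot".
# Not covered: the systemd.condition-first-boot= kernel argument.
first_boot_state() {
    file="${1:-/etc/machine-id}"
    if [ ! -e "$file" ]; then
        # Missing file: treated as a first boot.
        echo "first-boot"
    elif [ "$(cat "$file")" = "uninitialized" ]; then
        # Placeholder content: also treated as a first boot.
        echo "first-boot"
    else
        # Empty file or a real ID: not a first boot.
        echo "not-first-boot"
    fi
}
```

Running `first_boot_state` via pct enter on both a 6.4 and a 7.x container, before the first start if possible, would show whether the two hosts hand systemd a different starting point.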

liberodark commented 2 years ago

Hi,

If you can, try Proxmox 7.2-4 with cgroup v2: systemd-firstboot stays active on boot, and that causes the issue on AlmaLinux 9 and CentOS 9.

PS: On my side, I tested on 3 PVE 7.2 clusters, and it always fails without this workaround. I also tested on a fresh install and got the same issue. I also tested on PVE 7.2 with cgroup v1, and it fails too.

systemctl status systemd-firstboot
● systemd-firstboot.service - First Boot Wizard
     Loaded: loaded (/usr/lib/systemd/system/systemd-firstboot.service; static)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: activating (start) since Wed 2022-06-15 07:28:51 UTC; 36s ago
       Docs: man:systemd-firstboot(1)
   Main PID: 71 (systemd-firstbo)
      Tasks: 1 (limit: 256041)
     Memory: 1.2M
     CGroup: /system.slice/systemd-firstboot.service
             └─71 systemd-firstboot --prompt-locale --prompt-timezone --prompt-root-password

Notice: journal has been rotated since unit was started, output may be incomplete.

Best Regards