AlmaLinux / cloud-images

Packer templates and other tools for building AlmaLinux images for various cloud platforms.
MIT License

AlmaLinux 9 always boots up with interface down on LXD/LXC #136

Open lukasz-zaroda opened 1 year ago

lukasz-zaroda commented 1 year ago

The host is AlmaLinux 9.1 and the guest is also AlmaLinux 9.1, running under LXD 5.10. Whatever I try, the guest always boots with the eth0 interface down and no routes.

ONBOOT=yes in /etc/sysconfig/network-scripts/ifcfg-eth0 doesn’t work.

nmcli d mod eth0 autoconnect yes doesn’t work.

I even tried to replace /etc/sysconfig/network-scripts/ifcfg-eth0 with a configuration in /etc/NetworkManager/system-connections/ to no avail.

Content of the /etc/sysconfig/network-scripts/ifcfg-eth0 seems fine.

Networking can be restored by running:

nmcli c up eth0
nmcli d mod eth0 ipv4.gateway 169.254.0.1

But it works only until the next reboot.

Time for logs/data etc.:

[luken@n8237h81 ~]$ lxc config show alma
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Almalinux 9 amd64 (20230220_23:08)
  image.os: Almalinux
  image.release: "9"
  image.serial: "20230220_23:08"
  image.type: squashfs
  image.variant: cloud
  volatile.base_image: dc16b9373c62fb502e16ba91104c948aec41d0a3a3395222204a6b19edcc9e48
  volatile.cloud-init.instance-id: fe251e9a-8a8c-4df2-bbaa-7b1b2142137a
  volatile.eth0.host_name: veth-alma
  volatile.eth0.hwaddr: 00:16:3e:30:05:2a
  volatile.eth0.name: eth0
  volatile.idmap.base: "1196608"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1196608,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1196608,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1196608,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1196608,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1196608,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1196608,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: df23d3e7-4c90-4f0b-bf43-95e57672d646
devices: {}
ephemeral: false
profiles:
- default
- alma
stateful: false
description: ""
[luken@n8237h81 ~]$ lxc profile show alma
config:
  limits.cpu: "1"
  limits.memory: 2GB
  user.network-config: |
    version: 2
    ethernets:
      eth0:
        dhcp4: no
        dhcp6: no
        addresses:
        - 188[redacted]/32
        nameservers:
          addresses:
          - 8.8.8.8
          - 8.8.4.4
          search: []
        routes:
        - to: 0.0.0.0/0
          via: 169.254.0.1
          on-link: true
  user.user-data: |
    #cloud-config
    users:
      - name: luken
        gecos: ''
        primary_group: luken
        groups: "sudo"
        shell: /bin/bash
        sudo: ALL=(ALL) NOPASSWD:ALL
        ssh_authorized_keys:
         - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDxbJloAw7zOCoGgZ5dcunGwxKSNTJW8xjExbmgeBFRQvHvnuhR4GjZUcCPfqb0r3popg66b+f6pHMa7FGejZsTmZaVCrZL6P0BqvojXEfMYcTIjRNG4TF2VJ6WBnvkMb0iLH9W9sADeBi9I20igP1ds/SQ0ytITl9B0OQbG8RXE6F5mU1yitF11urKIn2gzvbRxfesqOSgQOPjM45Ykh1Wuc98Xg49xzVVpMi2Y+fOuoKEg7owl7t36gn0IuA1WFabyJlShbmefeXPdT1xTgEPVhR9T7kmqlAr8b2XsFz5SiH3q2x+me7FxKAPqJnazH7U7NtiSlym5Xpj5Z6tv9/Yr592kyiKy+hRVg7tA/q79etHQSMmSsj4q3Mx+VzUu7xlWVTYl2AvWyBXW8kTLenCy5D2Rq/NFWS0Fc2Mbs0eLwD1GYO3VkeKsrL57VSE2nt6VX2hrMNoMew2rUSA8h/CKenN+U0KMWk2rnVs4Eua8tFKYfBl0at5SHPqn5USs99kp1IQUKbeAvpCba9x23GPzA7p3lVS/9bjxq0djhPnklhKf75woayYEmCh4do6TRmMVfSaANT4AI/IuoxGGJbQRbEWtaBdpaAsjRLal9ib+rq3GC1xi4QShoVlWWvLKx+++E/WnbKFq6oCUF/k9YQL+dgfGao6Nh+39YiJ/lOf4Q== (none)
    package_update: true
    package_upgrade: true
    packages:
      - openssh-server
description: Almalinux testing profile.
devices:
  eth0:
    host_name: veth-alma
    ipv4.address: 188[redacted]
    nictype: routed
    parent: enp1s0f0
    type: nic
  root:
    path: /
    pool: default
    size: 20GB
    type: disk
name: alma
used_by:
- /1.0/instances/alma

Full cloud-init.log from the container:

https://gist.github.com/lukasz-zaroda/51ef4a284892c6527b23f94bcb3be72d

journalctl -u NetworkManager from the container:

https://gist.github.com/lukasz-zaroda/e2eec7f7c7e81b6a761f899923bb8ecf

lukasz-zaroda commented 1 year ago

I found a workaround for this issue.

It turns out that if you add bootcmd commands that bring the interface up to your cloud-init user-data, the issue goes away.

My theory is that, because the interface is initially down, the cloud-init network setup fails in some way that leaves NetworkManager broken. If we ensure that eth0 is up during the cloud-init run, everything completes fine and NetworkManager works as expected: eth0 comes up automatically after each reboot and the network is available.

Edit: Actually, bootcmd might run on every boot, so it might just be hiding the issue rather than fixing it, but at least it works.

I have no idea what exactly is at fault here (AlmaLinux, cloud-init, NetworkManager or LXD), but at least we have a workaround.

This is the example LXD profile for AlmaLinux 9.1, where networking actually works:

      - name: alma
        description: "Almalinux testing profile."
        config:
          user.user-data: |
            #cloud-config
            bootcmd:
              - nmcli c up eth0
              - nmcli d mod eth0 ipv4.gateway 169.254.0.1
          user.network-config: |
            version: 2
            ethernets:
              eth0:
                dhcp4: no
                dhcp6: no
                addresses:
                - [something]/32
                nameservers:
                  addresses:
                  - 8.8.8.8
                  - 8.8.4.4
                  search: []
                routes:
                - to: 0.0.0.0/0
                  via: 169.254.0.1
                  on-link: true
        devices:
          eth0:
            type: nic
            ipv4.address: [something]
            nictype: routed
            parent: enp1s0f0
            host_name: veth-alma
          root:
            type: disk
            path: /
            pool: default
            size: 20GB
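
One way to check whether bootcmd is only masking the problem on every boot is to wrap the same two commands in cloud-init-per, so that they run once per instance instead of on each boot. This is only a sketch of the user-data part of the profile above, not a confirmed fix. The names eth0-up and eth0-gateway are arbitrary labels chosen for illustration, and the rest of the profile is assumed to be unchanged.

```yaml
#cloud-config
bootcmd:
  # cloud-init-per <frequency> <name> <command...> runs the command only
  # at the given frequency (once, instance, or always).
  # "eth0-up" and "eth0-gateway" are hypothetical names for this sketch.
  - [ cloud-init-per, instance, eth0-up, nmcli, c, up, eth0 ]
  - [ cloud-init-per, instance, eth0-gateway, nmcli, d, mod, eth0, ipv4.gateway, 169.254.0.1 ]
```

If eth0 still comes up after a reboot with this variant, the first-boot run was enough and the theory above looks plausible. If the interface is down again, the plain bootcmd version was most likely just re-applying the fix on each boot.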