debops / ansible-ifupdown

Manage network interface configuration in /etc/network/interfaces
GNU General Public License v3.0

ifupdown not able to reconfigure interfaces #75

Open metalwanderer opened 3 years ago

metalwanderer commented 3 years ago

Hi! I really appreciate the debops project! :-) I've done my best to rule out errors on my part, but as far as I can tell, the ifupdown-reconfigure-interfaces script can never successfully reconfigure interfaces on my Debian Buster system.

The scenario: a baremetal server with some LXC containers running on it; interfaces are configured statically. The ifupdown role generates the interface configs in /etc/network/interfaces.config.d/, but the task to apply the new configs always fails.

From looking at the script, it seems the issue revolves around the assumption that systemd-based systems always have ifup@${if}.service units to control their network interfaces... my system is systemd-based, but has no such units, either on the host or in the containers.

For the sake of sparing the baremetal host a lot of mucking around with the network, I've done most of my testing with a test container, but the behavior is the same when applied against the host:

(.ansible_venv) ansible@mgmt1:~/midgard/prod$ debops -l test1 --tags "role::ifupdown"
Running Ansible playbooks:
/opt/ansible/midgard/prod/ansible/playbooks/site.yml

...

PLAY [Manage network configuration using ifupdown] ********************************************************************

TASK [Gathering Facts] ************************************************************************************************
ok: [test1]

TASK [ifupdown : Prepare configuration of dependent Ansible roles] ****************************************************
ok: [test1]

TASK [sysctl : Pre hooks] *********************************************************************************************

TASK [sysctl : Post hooks] ********************************************************************************************

TASK [ifupdown : Make sure that Ansible local facts directory exists] *************************************************
ok: [test1]

TASK [ifupdown : Save ifupdown local facts] ***************************************************************************
ok: [test1]

TASK [ifupdown : Install required packages] ***************************************************************************
ok: [test1]

TASK [ifupdown : Purge conflicting packages] **************************************************************************
ok: [test1]

TASK [ifupdown : Check systemd version] *******************************************************************************
ok: [test1]

TASK [ifupdown : Install custom ifupdown services] ********************************************************************
ok: [test1] => (item=iface@.service)
ok: [test1] => (item=ifup-wait-all-auto.service)
ok: [test1] => (item=ifup-allow-boot.service)

TASK [ifupdown : Test if Ansible is running in check mode] ************************************************************
ok: [test1]

TASK [ifupdown : Enable custom ifupdown services] *********************************************************************
ok: [test1] => (item=ifup-wait-all-auto)
ok: [test1] => (item=ifup-allow-boot)

TASK [ifupdown : Create configuration directories] ********************************************************************
ok: [test1] => (item=/etc/network/interfaces.d)
ok: [test1] => (item=/etc/network/interfaces.config.d)

TASK [ifupdown : Divert original /etc/network/interfaces] *************************************************************
ok: [test1]

TASK [ifupdown : Create /etc/network/interfaces] **********************************************************************
changed: [test1]

TASK [ifupdown : Ensure that runtime directory exists] ****************************************************************
ok: [test1]

TASK [ifupdown : Request entire network reconfiguration] **************************************************************
changed: [test1]

TASK [ifupdown : Generate network interface configuration] ************************************************************
changed: [test1] => (item={'key': 'eth0', 'value': {'address': '10.100.201.201/16', 'allow': 'hotplug', 'auto': True, 'dns_nameservers': ['10.100.0.2', '10.100.0.1'], 'dns_search': ['midgard.metalwanderer.net'], 'gateway': '10.100.254.254', 'iface': 'eth0', 'inet': 'static', 'inet6': 'auto', 'type': 'ether'}})

TASK [ifupdown : Remove unknown interface configuration] **************************************************************
changed: [test1] => (item={'diff': [], 'dest': '/etc/network/interfaces.config.d/020_iface_eth0', 'src': '/opt/ansible/.ansible/tmp/ansible-tmp-1630618748.460853-1555-254863054817577/source', 'md5sum': '8b79c694c47bae5507c79247f6867de8', 'checksum': '4274a513ad2e2092b86e06428ec96f5a4d2f7d49', 'changed': True, 'uid': 0, 'gid': 0, 'owner': 'root', 'group': 'root', 'mode': '0644', 'state': 'file', 'size': 330, 'invocation': {'module_args': {'src': '/opt/ansible/.ansible/tmp/ansible-tmp-1630618748.460853-1555-254863054817577/source', 'dest': '/etc/network/interfaces.config.d/020_iface_eth0', 'owner': 'root', 'group': 'root', 'mode': '0644', 'follow': False, '_original_basename': 'iface.j2', 'checksum': '4274a513ad2e2092b86e06428ec96f5a4d2f7d49', 'backup': False, 'force': True, 'unsafe_writes': False, 'content': None, 'validate': None, 'directory_mode': None, 'remote_src': None, 'local_follow': None, 'seuser': None, 'serole': None, 'selevel': None, 'setype': None, 'attributes': None}}, 'failed': False, 'item': {'key': 'eth0', 'value': {'address': '10.100.201.201/16', 'allow': 'hotplug', 'auto': True, 'dns_nameservers': ['10.100.0.2', '10.100.0.1'], 'dns_search': ['midgard.metalwanderer.net'], 'gateway': '10.100.254.254', 'iface': 'eth0', 'inet': 'static', 'inet6': 'auto', 'type': 'ether'}}, 'ansible_loop_var': 'item'})

TASK [ifupdown : Mark modified interfaces for processing] *************************************************************
changed: [test1] => (item={'diff': [], 'dest': '/etc/network/interfaces.config.d/020_iface_eth0', 'src': '/opt/ansible/.ansible/tmp/ansible-tmp-1630618748.460853-1555-254863054817577/source', 'md5sum': '8b79c694c47bae5507c79247f6867de8', 'checksum': '4274a513ad2e2092b86e06428ec96f5a4d2f7d49', 'changed': True, 'uid': 0, 'gid': 0, 'owner': 'root', 'group': 'root', 'mode': '0644', 'state': 'file', 'size': 330, 'invocation': {'module_args': {'src': '/opt/ansible/.ansible/tmp/ansible-tmp-1630618748.460853-1555-254863054817577/source', 'dest': '/etc/network/interfaces.config.d/020_iface_eth0', 'owner': 'root', 'group': 'root', 'mode': '0644', 'follow': False, '_original_basename': 'iface.j2', 'checksum': '4274a513ad2e2092b86e06428ec96f5a4d2f7d49', 'backup': False, 'force': True, 'unsafe_writes': False, 'content': None, 'validate': None, 'directory_mode': None, 'remote_src': None, 'local_follow': None, 'seuser': None, 'serole': None, 'selevel': None, 'setype': None, 'attributes': None}}, 'failed': False, 'item': {'key': 'eth0', 'value': {'address': '10.100.201.201/16', 'allow': 'hotplug', 'auto': True, 'dns_nameservers': ['10.100.0.2', '10.100.0.1'], 'dns_search': ['midgard.metalwanderer.net'], 'gateway': '10.100.254.254', 'iface': 'eth0', 'inet': 'static', 'inet6': 'auto', 'type': 'ether'}}, 'ansible_loop_var': 'item'})

TASK [ifupdown : Install custom ifupdown hooks] ***********************************************************************
ok: [test1] => (item={'name': 'filter-dhcp-options', 'hook': 'etc/dhcp/dhclient-enter-hooks.d/filter-dhcp-options', 'mode': '0644', 'state': 'present'})

TASK [ifupdown : Save role version information] ***********************************************************************
ok: [test1]

RUNNING HANDLER [ifupdown : Apply ifupdown configuration] *************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: NoneType: None
fatal: [test1]: FAILED! => changed=true 
  msg: non-zero return code
  rc: 1
  stderr: |-
    Shared connection to test1.midgard.metalwanderer.net closed.
  stderr_lines: <omitted>
  stdout: |-
    Detected interfaces to reconfigure: eth0
    Bringing down 'eth0' interface
    Error: Script was working on 'eth0' network interface when it lost knowledge about the network interface state. The '/etc/network/interfaces.d/' might be desynchronized. Exiting to avoid loss of connectivity, investigate the issue.
  stdout_lines: <omitted>

NO MORE HOSTS LEFT ****************************************************************************************************

PLAY RECAP ************************************************************************************************************
test1                      : ok=25   changed=5    unreachable=0    failed=1    skipped=9    rescued=0    ignored=0

Purge conflicting packages ---------------------------------------------- 4.21s
Prepare configuration of dependent Ansible roles ------------------------ 4.04s
Install custom ifupdown services ---------------------------------------- 2.56s
Gathering Facts --------------------------------------------------------- 2.52s
Generate network interface configuration -------------------------------- 2.16s
Install required packages ----------------------------------------------- 2.11s
Install custom ifupdown hooks ------------------------------------------- 1.95s
Enable custom ifupdown services ----------------------------------------- 1.68s
Remove network interface configuration ---------------------------------- 1.17s
Save ifupdown local facts ----------------------------------------------- 1.15s

Relevant hostvars:

---
ifupdown__interface_layout: static
ifupdown__host_interfaces:
  "eth0":
    inet: static
    allow: hotplug
    address: >
      {{ [ ansible_facts.eth0.ipv4.address, ansible_facts.eth0.ipv4.netmask ] |
      join('/') | ansible.netcommon.ipv4 }}
    gateway: "{{ gateways_prod[0] }}"
    dns_nameservers: "{{ nameservers_prod }}"
    dns_search: "{{ ( [netbase__domain] + [site_domain] ) | unique }}"
    auto: True

Filesystem state after run:

root@test1:~# find /etc/network
/etc/network
/etc/network/interfaces
/etc/network/interfaces.dpkg-divert
/etc/network/if-up.d
/etc/network/if-up.d/000resolvconf
/etc/network/if-up.d/chrony
/etc/network/interfaces.d
/etc/network/if-post-down.d
/etc/network/if-post-down.d/chrony
/etc/network/interfaces.config.d
/etc/network/interfaces.config.d/020_iface_eth0
/etc/network/if-pre-up.d
/etc/network/if-down.d
/etc/network/if-down.d/resolvconf

To attempt to apply the configuration manually, I restore the /etc/network/interfaces file and run: debops -l test1 --tags "role::ifupdown" --extra-vars "ifupdown__reconfigure_auto=False"

After the run, on the container:

root@test1:~# find /run/network
/run/network
/run/network/debops-ifupdown-reconfigure,020,eth0
/run/network/debops-ifupdown-reconfigure.networking
/run/network/ifstate
/run/network/ifstate.eth0
/run/network/.ifstate.lock
/run/network/ifstate.lo

# I've added 'set -x' at the beginning of the script to get debug output
root@test1:~# /usr/local/lib/ifupdown-reconfigure-interfaces
+ set -o nounset -o pipefail -o errexit
+ pid=10696
++ basename /usr/local/lib/ifupdown-reconfigure-interfaces
+ script=ifupdown-reconfigure-interfaces
+ interface_request_path=/run/network
+ interface_request_prefix=debops-ifupdown-reconfigure
+ interface_reconfigure_all=networking
+ trap on_exit EXIT
+ '[' -d /run/network ']'
+ declare -a changed_interfaces
+ declare -a changed_interface_names
+ declare -a created_interfaces
+ declare -a removed_interfaces
+ mapfile -t changed_interfaces
++ find /run/network -type f -name 'debops-ifupdown-reconfigure,*'
++ sed -e 's#^/run/network/debops-ifupdown-reconfigure,##'
++ sort -n
+ mapfile -t changed_interface_names
++ find /run/network -type f -name 'debops-ifupdown-reconfigure,*'
++ sed -e 's#^/run/network/debops-ifupdown-reconfigure,##'
++ sort -n
++ sed -e 's/^.*\,//'
+ '[' 1 -gt 0 ']'
++ join_by , eth0
++ local IFS=,
++ shift
++ echo eth0
+ log_message -m 'Detected interfaces to reconfigure: eth0'
+ local -A args
+ local OPTIND optchar
+ local optspec=:mflth-:
+ getopts :mflth-: optchar
+ case "${optchar}" in
+ args["message"]='Detected interfaces to reconfigure: eth0'
+ OPTIND=3
+ getopts :mflth-: optchar
+ key_exists args message
+ eval '[ ${args[$2]+test_of_existence} ]'
++ '[' test_of_existence ']'
+ tty -s
+ echo 'Detected interfaces to reconfigure: eth0'
Detected interfaces to reconfigure: eth0
+ type logger
+ logger -t 'ifupdown-reconfigure-interfaces[10696]' 'Detected interfaces to reconfigure: eth0'
+ declare -a systemd_ifup_instances
+ declare -a systemd_iface_instances
+ declare -a systemd_interface_instances
+ is_systemd
+ '[' -d /run/systemd/system ']'
+ return 0
+ mapfile -t systemd_ifup_instances
++ systemctl list-units --no-legend --state=active 'ifup@*.service'
++ awk '{print $1}'
++ sed -e 's/^ifup\@//' -e 's/\.service$//'
+ mapfile -t systemd_iface_instances
++ systemctl list-units --no-legend --state=active 'iface@*.service'
++ awk '{print $1}'
++ sed -e 's/^iface\@//' -e 's/\.service$//'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ systemd_interface_instances=("${systemd_ifup_instances[@]:-}" "${systemd_iface_instances[@]:-}")
+ containsElement networking eth0
+ local e
+ for e in "${@:2}"
+ [[ eth0 == \n\e\t\w\o\r\k\i\n\g ]]
+ return 1
+ (( i=1-1 )) 
+ (( i>=0 ))  
+ iface=020,eth0
++ echo 020,eth0
++ sed -e 's/^.*\,//'
+ iface_name=eth0
+ '[' 020,eth0 '!=' networking ']'
+ '[' changed '!=' created ']'
+ ifquery --state eth0
+ log_message -m 'Bringing down '\''eth0'\'' interface'
+ local -A args
+ local OPTIND optchar
+ local optspec=:mflth-:
+ getopts :mflth-: optchar
+ case "${optchar}" in
+ args["message"]='Bringing down '\''eth0'\'' interface'
+ OPTIND=3
+ getopts :mflth-: optchar
+ key_exists args message
+ eval '[ ${args[$2]+test_of_existence} ]'
++ '[' test_of_existence ']'
+ tty -s
+ echo 'Bringing down '\''eth0'\'' interface'
Bringing down 'eth0' interface
+ type logger 
+ logger -t 'ifupdown-reconfigure-interfaces[10696]' 'Bringing down '\''eth0'\'' interface'
+ interface_down eth0
+ is_systemd  
+ '[' -d /run/systemd/system ']'
+ return 0
+ local -a if_hotplug_interfaces
+ local -a if_boot_interfaces
+ local -a systemd_ifup_instances
+ mapfile -t if_hotplug_interfaces
++ ifquery --list --allow=hotplug
+ mapfile -t if_boot_interfaces
++ ifquery --list --allow=boot
+ mapfile -t systemd_ifup_instances
++ systemctl list-units --no-legend --state=active 'ifup@*.service'
++ awk '{print $1}'
++ sed -e 's/^ifup\@//' -e 's/\.service$//'
+ '[' 0 -gt 0 ']'
+ '[' 0 -gt 0 ']'
+ ifquery --state eth0
+ containsElement eth0 '' ''
+ local e
+ for e in "${@:2}"
+ [[ '' == \e\t\h\0 ]]
+ for e in "${@:2}"
+ [[ '' == \e\t\h\0 ]]
+ return 1
+ log_message -m 'Error: Script was working on '\''eth0'\'' network interface when it lost knowledge about the network interface state. The '\''/etc/network/interfaces.d/'\'' might be desynchronized. Exiting to avoid loss of connectivity, investigate the issue.'
+ local -A args
+ local OPTIND optchar
+ local optspec=:mflth-:
+ getopts :mflth-: optchar
+ case "${optchar}" in
+ args["message"]='Error: Script was working on '\''eth0'\'' network interface when it lost knowledge about the network interface state. The '\''/etc/network/interfaces.d/'\'' might be desynchronized. Exiting to avoid loss of connectivity, investigate the issue.'
+ OPTIND=3
+ getopts :mflth-: optchar
+ key_exists args message
+ eval '[ ${args[$2]+test_of_existence} ]'
++ '[' test_of_existence ']'
+ tty -s
+ echo 'Error: Script was working on '\''eth0'\'' network interface when it lost knowledge about the network interface state. The '\''/etc/network/interfaces.d/'\'' might be desynchronized. Exiting to avoid loss of connectivity, investigate the issue.'
Error: Script was working on 'eth0' network interface when it lost knowledge about the network interface state. The '/etc/network/interfaces.d/' might be desynchronized. Exiting to avoid loss of connectivity, investigate the issue.
+ type logger 
+ logger -t 'ifupdown-reconfigure-interfaces[10696]' 'Error: Script was working on '\''eth0'\'' network interface when it lost knowledge about the network interface state. The '\''/etc/network/interfaces.d/'\'' might be desynchronized. Exiting to avoid loss of connectivity, investigate the issue.'
+ exit 1
+ on_exit
+ rm -f /run/network/debops-ifupdown-reconfigure,020,eth0

This is a systemd-based system:

root@test1:~# ls -d /run/systemd/system
/run/systemd/system
root@test1:~# dpkg -s systemd | grep Version:
Version: 241-7~deb10u8

But there are no ifup@${if}.service units:

root@test1:~# systemctl list-units --no-legend --state=active 'ifup@*.service'
root@test1:~# echo $?
0

So it seems to me that interface_down() will always fail to stop any interfaces.

There is nothing particularly unusual about my install that I can think of which would make networking behave radically differently. Am I missing something obvious?

Here's some output from the baremetal host:

root@thor:~# ifdown br_mgmt
root@thor:~# systemctl start ifup@br_mgmt.service
A dependency job for ifup@br_mgmt.service failed. See 'journalctl -xe' for details.
root@thor:~# journalctl -xe
Sep 02 22:34:05 thor audit[10842]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/sys/devices/
Sep 02 22:34:05 thor kernel: audit: type=1400 audit(1630614845.624:24283): apparmor="ALLOWED" operation="open" profile=
Sep 02 22:35:58 thor systemd[1]: sys-subsystem-net-devices-br_mgmt.device: Job sys-subsystem-net-devices-br_mgmt.device
Sep 02 22:35:58 thor systemd[1]: Timed out waiting for device /sys/subsystem/net/devices/br_mgmt.
-- Subject: A start job for unit sys-subsystem-net-devices-br_mgmt.device has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit sys-subsystem-net-devices-br_mgmt.device has finished with a failure.
--
-- The job identifier is 13458 and the job result is timeout.
Sep 02 22:35:58 thor systemd[1]: Dependency failed for ifup for br_mgmt.
-- Subject: A start job for unit ifup@br_mgmt.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit ifup@br_mgmt.service has finished with a failure.
--
-- The job identifier is 13455 and the job result is dependency.
Sep 02 22:35:58 thor systemd[1]: ifup@br_mgmt.service: Job ifup@br_mgmt.service/start failed with result 'dependency'.
Sep 02 22:35:58 thor systemd[1]: sys-subsystem-net-devices-br_mgmt.device: Job sys-subsystem-net-devices-br_mgmt.device
Sep 02 22:36:01 thor CRON[23138]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 02 22:36:01 thor CRON[23139]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 02 22:36:01 thor CRON[23138]: pam_unix(cron:session): session closed for user root
root@thor:~# systemctl list-units --no-legend --state=active 'ifup@*.service'
root@thor:~# ifup br_mgmt

Waiting for br_mgmt to get ready (MAXWAIT is 32 seconds).
root@thor:~# ip a show dev br_mgmt
138: br_mgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
    inet 10.254.10.10/16 brd 10.254.255.255 scope global br_mgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::2eea:7fff:fef9:c164/64 scope link
       valid_lft forever preferred_lft forever
root@thor:~# find /etc/network
/etc/network
/etc/network/if-up.d
/etc/network/if-up.d/chrony
/etc/network/if-up.d/ip
/etc/network/if-up.d/ifenslave
/etc/network/if-up.d/000resolvconf
/etc/network/interfaces
/etc/network/interfaces.config.d
/etc/network/interfaces.d
/etc/network/interfaces.d/020_iface_eno2
/etc/network/interfaces.d/010_iface_bond0
/etc/network/interfaces.d/060_iface_br_local
/etc/network/interfaces.d/060_iface_br_guest
/etc/network/interfaces.d/020_iface_eno1
/etc/network/interfaces.d/020_iface_ens1f1
/etc/network/interfaces.d/020_iface_ens1f0
/etc/network/interfaces.d/060_iface_br_mgmt
/etc/network/interfaces.d/015_iface_bond0.1
/etc/network/interfaces.d/015_iface_bond0.1200
/etc/network/if-down.d
/etc/network/if-down.d/resolvconf
/etc/network/interfaces.dpkg-divert
/etc/network/interfaces.old
/etc/network/if-post-down.d
/etc/network/if-post-down.d/chrony
/etc/network/if-post-down.d/bridge
/etc/network/if-post-down.d/ifenslave
/etc/network/if-post-down.d/vlan
/etc/network/if-pre-up.d
/etc/network/if-pre-up.d/bridge
/etc/network/if-pre-up.d/ifenslave
/etc/network/if-pre-up.d/vlan

Note: I have manually moved the interface configs from interfaces.config.d/ to interfaces.d/ to activate them.

What's so strange about my setup that it breaks the script assumptions?

drybjed commented 3 years ago

Hello, sorry for the late reply. Next time you might want to consider creating an issue in the main debops/debops repository; the separate role repositories are not maintained anymore.

As for your issue - it seems that you either didn't show the full configuration (just the eth0 interface), or you have shown details from two different hosts. The generated configuration seems to be for a host with the Predictable Network Interface Naming Scheme enabled (eno1 instead of eth0, etc.). The ifupdown role should detect such a case and use the Ethernet network interfaces that are present on a given host, but without seeing the interface state before and after the configuration is applied it's hard to guess what might have happened to break the script.

So, can you show the ip addr list output before you apply the ifupdown role, as well as your desired configuration for all of the network interfaces that you want to create?

metalwanderer commented 3 years ago

Hi @drybjed - thanks for your reply. I didn't realize this was the wrong place to open an issue... I'll make sure to use the main repo next time.

Sorry for the slightly confusing initial report - let me see if I can be a little clearer. Yes, I did show output from two hosts... ultimately, I'd like to manage the interfaces for all of my hosts (baremetal + container), but in the interest of simplicity, I'll restrict to my baremetal server here:

root@thor:~# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:61 brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.2/16 brd 169.254.255.255 scope global idrac
       valid_lft forever preferred_lft forever
    inet6 fe80::2eea:7fff:fef9:c161/64 scope link
       valid_lft forever preferred_lft forever
3: ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2c:ea:7f:6a:1d:c3 brd ff:ff:ff:ff:ff:ff
4: ens1f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2c:ea:7f:6a:1d:c4 brd ff:ff:ff:ff:ff:ff
5: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
6: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
72: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 scope global lxcbr0
       valid_lft forever preferred_lft forever
148: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
    inet 10.254.10.10/16 brd 10.254.255.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::2eea:7fff:fef9:c164/64 scope link
       valid_lft forever preferred_lft forever
149: bond0.1@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
150: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2c:ea:7f:f9:c1:64 brd ff:ff:ff:ff:ff:ff
    inet 10.100.0.10/16 brd 10.100.255.255 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::2eea:7fff:fef9:c164/64 scope link
       valid_lft forever preferred_lft forever
154: vethVHW4Q0@if153: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP group default qlen 1000
    link/ether fe:ca:6d:9a:bd:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::fcca:6dff:fe9a:bd52/64 scope link
       valid_lft forever preferred_lft forever
156: veth18FPNW@if155: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
    link/ether fe:95:73:99:33:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::fc95:73ff:fe99:33f3/64 scope link
       valid_lft forever preferred_lft forever
158: vethLN0OMY@if157: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP group default qlen 1000
    link/ether fe:dc:1e:7d:87:3c brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::fcdc:1eff:fe7d:873c/64 scope link
       valid_lft forever preferred_lft forever
160: vethJ191X8@if159: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP group default qlen 1000
    link/ether fe:e7:c9:58:2f:92 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::fce7:c9ff:fe58:2f92/64 scope link
       valid_lft forever preferred_lft forever
162: vethB888VF@if161: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br0 state LOWERLAYERDOWN group default qlen 1000
    link/ether fe:bf:39:24:fc:a5 brd ff:ff:ff:ff:ff:ff link-netnsid 3

(bridges have been renamed br_local->br0 and br_mgmt->br1 since the first comment)

For the time being, I'm just trying to get debops.ifupdown to manage the previously manual configuration:

---
ifupdown__interface_layout: static
ifupdown__host_interfaces:
  'eno1':
    allow: hotplug

  'eno2':
    allow: hotplug

  'bond0':
    inet: manual
    slaves:
      - eno1
      - eno2
    bond_mode: 802.3ad
    auto: True

  'br0':
    inet: static
    allow: hotplug
    bridge_ports: bond0
    bridge_stp: 'no'
    bridge_fd: 0
    bridge_maxwait: 0
    address: >
      {{ [ ansible_facts.br0.ipv4.address, ansible_facts.br0.ipv4.netmask ] |
      join('/') | ansible.netcommon.ipv4 }}
    gateway: "{{ gateways_prod[0] }}"
    dns_nameservers: "{{ nameservers_prod }}"
    dns_search: "{{ ( [netbase__domain] + [site_domain] ) | unique }}"
    auto: True

  'bond0.1':
    inet: manual
    vlan_device: bond0

  'br1':
    inet: static
    allow: hotplug
    bridge_ports: bond0.1
    bridge_stp: 'no'
    bridge_fd: 0
    bridge_maxwait: 0
    address: >
      {{ [ ansible_facts.br1.ipv4.address, ansible_facts.br1.ipv4.netmask ] |
      join('/') | ansible.netcommon.ipv4 }}
    options: >-
      up ip route add 
      {{ [ ansible_facts.br1.ipv4.network, ansible_facts.br1.ipv4.netmask ] | 
      join('/') | ansible.netcommon.ipv4 }} dev br1 proto kernel scope 
      link src {{ ansible_facts.br1.ipv4.address }} table mgmt

      up ip route add default via {{ gateways_mgmt[0] }} dev br1 table mgmt

      up ip rule add from {{ ansible_facts.br1.ipv4.address }}/32 table mgmt

      up ip rule add to {{ ansible_facts.br1.ipv4.address }}/32 table mgmt
    auto: True

  'bond0.1200':
    inet: manual
    vlan_device: bond0

  'br2':
    inet: manual
    bridge_ports: bond0.1200
    bridge_stp: 'no'
    bridge_fd: 0
    bridge_maxwait: 0

The configuration is correctly generated by ifupdown - no issues there. The configs are placed in /etc/network/interfaces.config.d/, but this is where my issue centers: the task to apply the new configs fails, and the correctly generated config files are not moved to /etc/network/interfaces.d/. I can't get it to work either as part of the playbook run or when manually running /usr/local/lib/ifupdown-reconfigure-interfaces.

Looking at the ifupdown-reconfigure-interfaces script, the failure appears to be here:

# Check what init we are using and bring interface down correctly
interface_down () {

    if is_systemd ; then

        local -a if_hotplug_interfaces
        local -a if_boot_interfaces
        local -a systemd_ifup_instances
        mapfile -t if_hotplug_interfaces < <(ifquery --list --allow=hotplug)
        mapfile -t if_boot_interfaces < <(ifquery --list --allow=boot)
        mapfile -t systemd_ifup_instances < <(systemctl list-units --no-legend --state=active 'ifup@*.service' | awk '{print $1}' | sed -e 's/^ifup\@//' -e 's/\.service$//')

        if [ ${#if_hotplug_interfaces[@]} -gt 0 ] && containsElement "${1}" "${if_hotplug_interfaces[@]}" ; then
            systemctl stop "ifup@${1}.service"
        elif [ ${#if_boot_interfaces[@]} -gt 0 ] && containsElement "${1}" "${if_boot_interfaces[@]}" ; then
            if [ ${#if_hotplug_interfaces[@]} -gt 0 ] && [ ${#systemd_ifup_instances[@]} -gt 0 ] && containsElement "${1}" "${systemd_ifup_instances[@]}" ; then
                systemctl stop "ifup@${1}.service"
            else
                systemctl stop "iface@${1}.service"
            fi
        fi
    else
        ifdown "${1}"
    fi

}

is_systemd evaluates as true, but there are no active ifup@*.service units on the system:

root@thor:~# systemctl list-units --all 'ifup@*.service'
0 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.

So the script always fails when bringing down the interfaces.

Bringing the interfaces up and down with ifup and ifdown works fine, but ifupdown-reconfigure-interfaces will only try that on non-systemd systems. I'm guessing I'm supposed to have those systemd units active, but they aren't.
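
One possible direction (just a sketch on my part, not how the role actually does it) would be for interface_down() to fall back to plain ifdown whenever systemd has no ifup@ or iface@ instance tracking the interface but ifupdown itself still considers it configured:

# Hypothetical fallback, not the current upstream behaviour: if no systemd
# template instance tracks the interface, but ifquery --state still reports
# it as configured, bring it down with plain ifdown instead of bailing out.
interface_down () {

    if is_systemd && systemctl is-active --quiet "ifup@${1}.service" ; then
        systemctl stop "ifup@${1}.service"
    elif is_systemd && systemctl is-active --quiet "iface@${1}.service" ; then
        systemctl stop "iface@${1}.service"
    elif ifquery --state "${1}" > /dev/null 2>&1 ; then
        ifdown "${1}"
    fi

}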

drybjed commented 3 years ago

If there are no ifup@*.service units, it probably means that this host's network is managed by NetworkManager or another service. It would be a good idea to find out what manages the interfaces before we mess things up further.

How did you install it: via the Debian Installer directly, or is it some kind of VPS deployed from a prepared image? With the Debian Installer and just a base install I get a host managed by ifupdown with corresponding ifup@.service units. If the installation method was different (or even if it's, say, a Vagrant box), your host might be managed by systemd-networkd. In that case switching to ifupdown might need to be more involved.

We can of course fix this in the ifupdown-reconfigure-interfaces script, but it needs to know how to correctly stop the existing interfaces first. Knowing what manages them will help.
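
For example, a few read-only checks (standard tools, nothing DebOps-specific) that should narrow down what, if anything, manages the interfaces:

# Which of the usual network management services are present/active?
systemctl is-active networking.service systemd-networkd.service NetworkManager.service
# Which of the related packages are installed?
dpkg -l ifupdown network-manager netplan.io 2>/dev/null | grep '^ii'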

metalwanderer commented 3 years ago

This is a pretty default install. It's a Dell PowerEdge R7515, installed via the netinst ISO mounted as virtual media over iDRAC.

I installed manually and didn't select any tasks other than web-server and ssh-server. I wouldn't knowingly install NetworkManager on a server, and there's no netplan either:

root@thor:~# dpkg -s network-manager
dpkg-query: package 'network-manager' is not installed and no information is available
Use dpkg --info (= dpkg-deb --info) to examine archive files.
root@thor:~# dpkg -s netplan.io
dpkg-query: package 'netplan.io' is not installed and no information is available
Use dpkg --info (= dpkg-deb --info) to examine archive files.
root@thor:~# dpkg -s networkd-dispatcher
dpkg-query: package 'networkd-dispatcher' is not installed and no information is available
Use dpkg --info (= dpkg-deb --info) to examine archive files.
root@thor:~# systemctl status systemd-networkd
● systemd-networkd.service - Network Service
   Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:systemd-networkd.service(8)
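
Presumably it's the classic ifupdown networking.service (ifup -a at boot) that brings these interfaces up; for completeness, these read-only commands should confirm that:

# Is the classic ifupdown unit enabled, and what did it log this boot?
systemctl status networking.service
journalctl -b -u networking.service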

One of my goals has been to keep the host OS as "stock" as possible and to isolate all application-specific stuff in containers. I don't even install Docker directly on the host; it runs nested within LXC containers.

I have just noticed something interesting, although I don't know what to make of it.

After a fresh reboot:

root@thor:~# ip l | grep UP
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
5: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
6: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
9: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
10: bond0.1@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP mode DEFAULT group default qlen 1000
11: br_guest: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
12: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
14: vethM6PHY8@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
16: vethXGJQ6S@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP mode DEFAULT group default qlen 1000
18: vethD622GR@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP mode DEFAULT group default qlen 1000
20: vethOIOUNF@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UP mode DEFAULT group default qlen 1000
root@thor:~# systemctl list-units --all 'ifup@*.service'
0 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.

All interfaces are up, but no ifup@ units loaded.

But if I tell systemd to bring up an already-active interface:

root@thor:~# systemctl start 'ifup@bond0'
root@thor:~# echo $?
0
root@thor:~# systemctl list-units --all 'ifup@*.service'
UNIT               LOAD   ACTIVE SUB    DESCRIPTION                                                                   
ifup@bond0.service loaded active exited ifup for bond0

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.

Now systemd is aware of that interface.

So I guess the question becomes: why isn't this happening automatically?
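
One thing that might be worth checking: on a stock Buster install I'd expect allow-hotplug interfaces to be brought up by a udev hook that starts ifup@<iface>.service for them, so if that path is never taken here it would explain why no instances are active. The paths below are my assumption of where the stock hook lives, so verify locally:

# Does the stock ifupdown udev hook exist, and has any ifup@ instance been
# started this boot? (Read-only checks.)
ls -l /lib/udev/rules.d/80-ifupdown.rules /lib/udev/ifupdown-hotplug
journalctl -b -u 'ifup@*.service'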

Note: systemd doesn't like something about some of the interfaces - bond0.1 is working, but manually starting its ifup@ unit doesn't succeed:

root@thor:~# systemctl start 'ifup@bond0.1'
root@thor:~# systemctl list-units --all 'ifup@*.service'
  UNIT                 LOAD   ACTIVE SUB    DESCRIPTION                                                               
● ifup@bond0.1.service loaded failed failed ifup for bond0.1
  ifup@bond0.service   loaded active exited ifup for bond0

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
root@thor:~# systemctl status 'ifup@bond0.1'
● ifup@bond0.1.service - ifup for bond0.1
   Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2021-09-10 15:16:09 CEST; 10min ago
  Process: 7329 ExecStart=/bin/sh -ec ifup --allow=hotplug bond0.1; ifquery --state bond0.1 (code=exited, status=1/FAIL
 Main PID: 7329 (code=exited, status=1/FAILURE)

Sep 10 15:16:09 thor systemd[1]: Started ifup for bond0.1.
Sep 10 15:16:09 thor systemd[1]: ifup@bond0.1.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 15:16:09 thor systemd[1]: ifup@bond0.1.service: Failed with result 'exit-code'.

From journalctl -xe:

Sep 10 15:16:09 thor systemd[1]: Started ifup for bond0.1.
-- Subject: A start job for unit ifup@bond0.1.service has finished successfully
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit ifup@bond0.1.service has finished successfully.
-- 
-- The job identifier is 826.
Sep 10 15:16:09 thor systemd[1]: ifup@bond0.1.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- An ExecStart= process belonging to unit ifup@bond0.1.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 1.
Sep 10 15:16:09 thor systemd[1]: ifup@bond0.1.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit ifup@bond0.1.service has entered the 'failed' state with result 'exit-code'.

Here are the original (pre-debops) /etc/network/interfaces sections for those interfaces:

# Bonding slaves
allow-hotplug eno1
allow-hotplug eno2

# The primary network interface
auto bond0
iface bond0 inet manual
       slaves eno1 eno2
       bond-mode 802.3ad

iface bond0.1 inet manual
       vlan-raw-device bond0

If those ifup@ units activated automatically, then the logic in ifupdown-reconfigure-interfaces would work just fine.