cockpit-project / cockpit

Cockpit is a web-based graphical interface for servers.
http://www.cockpit-project.org/
GNU Lesser General Public License v2.1

Ubuntu 20.04: Installing network-manager breaks networkd interface #15972

Open MaximKing1 opened 3 years ago

MaximKing1 commented 3 years ago

I installed Cockpit on my Ubuntu 20.04 machine, and after I restarted it, it's refusing to come online. It's all plugged in (I checked with the datacenter); it just won't show as online. What does Cockpit change? I will now have to drive over to the datacenter and get into it that way.

All I did was run apt-get install cockpit, then I typed Y to install the other add-ons that were needed. It said I needed to restart, so I ran sudo reboot, and now it won't start. I've never had this issue before. Does Cockpit change any networking settings that could cause the machine not to come online?

KKoukiou commented 3 years ago

Hey @MaximKing1, Cockpit does not change any configuration or do any kind of magic that might render a server unbootable. Maybe an unrelated package update took effect after the reboot, resulting in the issue you see. In any case, once you have more information from checking the journal, can you update the issue with details?

garrett commented 3 years ago

Dependencies

Cockpit pulls in cockpit-networkmanager, which pulls in network-manager. From a fresh install of Ubuntu 20.04 Server that I upgraded before installing cockpit:

$ sudo apt install cockpit
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  cockpit-bridge cockpit-dashboard cockpit-networkmanager cockpit-packagekit
  cockpit-storaged cockpit-system cockpit-ws cracklib-runtime dns-root-data
  dnsmasq-base libatasmart4 libblockdev-crypto2 libblockdev-fs2
  libblockdev-loop2 libblockdev-mdraid2 libblockdev-part-err2
  libblockdev-part2 libblockdev-swap2 libblockdev-utils2 libblockdev2
  libbluetooth3 libbytesize1 libcrack2 libidn11 libjansson4 libmbim-glib4
  libmbim-proxy libmm-glib0 libndp0 libnl-route-3-200 libnm0 libnspr4 libnss3
  libparted-fs-resize0 libpcsclite1 libpwquality-common libpwquality-tools
  libpwquality1 libqmi-glib5 libqmi-proxy libteamdctl0 libudisks2-0
  libvolume-key1 modemmanager network-manager network-manager-pptp ppp
  pptp-linux udisks2 usb-modeswitch usb-modeswitch-data wamerican
  wpasupplicant
Suggested packages:
  cockpit-doc cockpit-pcp cockpit-machines xdg-utils sssd-dbus libparted-dev
  pcscd avahi-autoipd libteam-utils exfat-utils f2fs-tools nilfs-tools
  reiserfsprogs udftools udisks2-bcache udisks2-btrfs udisks2-lvm2 udisks2-vdo
  udisks2-zram comgt wvdial wpagui libengine-pkcs11-openssl
The following NEW packages will be installed:
  cockpit cockpit-bridge cockpit-dashboard cockpit-networkmanager
  cockpit-packagekit cockpit-storaged cockpit-system cockpit-ws
  cracklib-runtime dns-root-data dnsmasq-base libatasmart4 libblockdev-crypto2
  libblockdev-fs2 libblockdev-loop2 libblockdev-mdraid2 libblockdev-part-err2
  libblockdev-part2 libblockdev-swap2 libblockdev-utils2 libblockdev2
  libbluetooth3 libbytesize1 libcrack2 libidn11 libjansson4 libmbim-glib4
  libmbim-proxy libmm-glib0 libndp0 libnl-route-3-200 libnm0 libnspr4 libnss3
  libparted-fs-resize0 libpcsclite1 libpwquality-common libpwquality-tools
  libpwquality1 libqmi-glib5 libqmi-proxy libteamdctl0 libudisks2-0
  libvolume-key1 modemmanager network-manager network-manager-pptp ppp
  pptp-linux udisks2 usb-modeswitch usb-modeswitch-data wamerican
  wpasupplicant
0 upgraded, 54 newly installed, 0 to remove and 0 not upgraded.
Need to get 13.8 MB of archives.
After this operation, 41.1 MB of additional disk space will be used.
Do you want to continue? [Y/n]

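Installing Cockpit pulls in or suggests these Cockpit packages (cockpit-doc, cockpit-machines, and cockpit-pcp are only suggested, not installed):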

cockpit
cockpit-bridge
cockpit-dashboard
cockpit-doc
cockpit-machines
cockpit-networkmanager
cockpit-packagekit
cockpit-pcp
cockpit-storaged
cockpit-system
cockpit-ws

Explanation

Ubuntu 20.04 LTS Server does not use NetworkManager by default. If you set up the networking without using NetworkManager and then install NetworkManager (which enables and then starts it on a reboot), then your network interface may change. I think this is what has happened.

Ubuntu 20.04 LTS Workstation, however, does use NetworkManager to manage the network interface.

Workarounds

Smaller set of packages

You could, as a workaround, install individual packages:

Minimal install:

sudo apt install cockpit-ws cockpit-system

Everything except networking:

sudo apt install cockpit-ws cockpit-system cockpit-machines cockpit-packagekit cockpit-storaged cockpit-pcp
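
Before running either command, a dry run can show whether network-manager would be pulled in at all. The following is a minimal sketch, not part of Cockpit: the helper name pulls_in_network_manager is hypothetical, and it assumes the "Inst <package> ..." lines that apt-get prints in simulation mode (-s).

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `apt-get -s install ...` on stdin
# and succeeds (exit 0) if the network-manager package itself would be installed.
# The trailing space in the pattern avoids matching network-manager-pptp.
pulls_in_network_manager() {
    grep -q '^Inst network-manager '
}

# Usage (nothing is installed, -s only simulates):
#   if apt-get -s install cockpit | pulls_in_network_manager; then
#       echo "this install would add network-manager"
#   fi
```

If the check reports that network-manager would be added, the smaller package sets above avoid it.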

Normal install and disable NetworkManager

Or you could install cockpit in the same way and forcefully disable NetworkManager with a mask:

sudo apt update && sudo apt install cockpit
sudo systemctl mask --now NetworkManager.service

Masking a service forbids systemd from running it. This allows you to use networking without NetworkManager, even if it is installed.

Solution

I think we should probably document one (or both?) of these methods on the installation part of the website.

garrett commented 3 years ago

BTW, here's what happens when you want to install Cockpit on an Ubuntu 20.04 LTS Workstation installation:

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  cockpit-bridge cockpit-dashboard cockpit-networkmanager cockpit-packagekit
  cockpit-storaged cockpit-system cockpit-ws finalrd libblockdev-mdraid2
  libbytesize1 libpwquality-tools mdadm
Suggested packages:
  cockpit-doc cockpit-pcp cockpit-machines sssd-dbus default-mta
  | mail-transport-agent dracut-core
The following NEW packages will be installed:
  cockpit cockpit-bridge cockpit-dashboard cockpit-networkmanager
  cockpit-packagekit cockpit-storaged cockpit-system cockpit-ws finalrd
  libblockdev-mdraid2 libbytesize1 libpwquality-tools mdadm
0 upgraded, 13 newly installed, 0 to remove and 0 not upgraded.
Need to get 5.750 kB of archives.
After this operation, 8.753 kB of additional disk space will be used.
Do you want to continue? [Y/n]

Note how a lot of dependencies are already there, such as NetworkManager? Of course, Cockpit's intended for servers, not workstations, so installing it on this flavor doesn't make as much sense.

garrett commented 3 years ago

@KKoukiou, @martinpitt: What do you think about the workarounds? Do you think masking NetworkManager on an Ubuntu (and possibly Debian) server makes sense, if someone doesn't want to use NetworkManager?

martinpitt commented 3 years ago

Merely installing network-manager normally doesn't change the network config -- Ubuntu uses netplan which by default manages devices with systemd-networkd on servers, as Garrett already mentioned; see /etc/netplan/*.yaml. It's of course possible to change this. So to investigate this further, we'd need the output of nmcli d, nmcli c, and /etc/netplan/*.yaml, and journalctl -b would also be helpful.

Nevertheless, that is certainly a plausible culprit, and systemctl disable --now network-manager is a good first thing to try. If that's not it, then there might have been a pending update or config change which only took effect because of the reboot. "apt-get install cockpit" requesting you to restart the machine actually makes me rather suspicious -- merely installing cockpit doesn't do that. This usually happens when there is a pending kernel update, which may also explain the boot failure. Without further logs I'm afraid there is not much more that we can diagnose at this point.

@MaximKing1 : Check if your provider offers a serial console window -- pretty much all colo providers that I've used in the past do, and all public cloud providers have this anyway. Then you can watch it boot, and log into a VT without being dependent on the network.

@garrett: Let's please not document a systemctl mask command -- a mere disable is sufficient.

garrett commented 3 years ago

@garrett: Let's please not document a systemctl mask command -- a mere disable is sufficient.

Whoops! OK. I was researching this and saw someone say that a disable didn't work and they suggested using a mask instead. I guess they were wrong. I wondered about that too. Good point.

garrett commented 3 years ago

@MaximKing1: Can you please provide the output of the following?

martinpitt commented 3 years ago

Any updates here? Right now I'm afraid this isn't actionable.

garrett commented 3 years ago

Pinging @MaximKing1 for a response on how Cockpit's working on their machine. (We're waiting on a response to see what's up and if there's a bug we need to fix.)

CaptainMidnight commented 3 years ago

Hi All

I can definitely add some input here. Although Cockpit was installed on a technically different system (a Debian-based Raspberry Pi4B), exactly the same effect has been experienced.

Installing Cockpit went fine; it pulled in all its dependencies and was working. But on the first reboot after the install, the Pi4B was no longer visible on the network with its original network configuration. It no longer responds to ping, ssh, etc., but otherwise appears to be working. That is not good for a headless test unit.

I don't use the standard network configuration method for a Pi; I use systemd-networkd. Reading the above, am I right to assume the install of network-manager is the likely culprit here?

Similar issues are reported in the comments section here: Opensource comments

CaptainMidnight commented 3 years ago

As an update, I now understand just exactly what is going on with this issue when network-manager is enabled in the install of Cockpit.

With network-manager disabled as described in the previous comments, everything works as expected, except of course the Networking tab in Cockpit. Rebooting the Pi4B causes no issues, and its IP is configured and accessible as per the original configuration.

When network-manager is enabled on a system that is currently configured via systemd-networkd, the system reverts its network configuration to DHCP on reboot. After finding this out, I sorted it out as follows:

  1. Connect to Cockpit via the now new DHCP address.
  2. Disable DHCP configuration on the interface and configure the static IP address to be as per DHCP assigned.
  3. Create an additional IP interface, configured to the previous static IP address and apply it.
  4. Manually configure the DNS settings and apply it.
  5. Logout and re-login to Cockpit via the statically configured IP address setup by (3) i.e. the one you really want.
  6. Delete the 1st static IP address in (2) above, to leave only the wanted live IP address.

I'm sure there's a file I could have pre-configured to prevent this automatic DHCP configuration, but without any warning that this would happen, I didn't have that option until I had diagnosed what was going on in a headless environment.
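
One candidate for such a file is a NetworkManager drop-in that marks the interface as unmanaged, using the unmanaged-devices key in the [keyfile] section of NetworkManager.conf. The sketch below only generates the file content. The helper name nm_unmanaged_conf, the interface name eth0, and the drop-in file name are assumptions, and nobody in this thread has verified that this prevents the DHCP fallback on the Pi4B setup described here.

```shell
#!/bin/sh
# Hypothetical helper: print a NetworkManager drop-in that tells
# NetworkManager to leave one interface alone.
# $1 is the interface name (assumption: replace eth0 with your own device).
nm_unmanaged_conf() {
    printf '[keyfile]\nunmanaged-devices=interface-name:%s\n' "$1"
}

# Usage (as root, ideally before the first reboot; the file name is an assumption):
#   nm_unmanaged_conf eth0 > /etc/NetworkManager/conf.d/99-unmanaged-eth0.conf
#   systemctl reload NetworkManager
```

With the interface unmanaged, Cockpit's Networking page would not be able to configure it, which is the same trade-off as disabling NetworkManager entirely.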

CaptainMidnight commented 3 years ago

Subsequent installs on additional headless Pi4B units have been successful, provided that after the initial install and before the first reboot I opened the Networking section in Cockpit and applied the static configuration it displayed, which was already the expected one.

No further issues with Cockpit installs reverting to DHCP configuration.

martinpitt commented 3 years ago

I tested this on an Ubuntu 20.04 cloud VM (should be very close to server). Its primary network config is through eth0, managed by netplan/systemd-networkd:

# networkctl
IDX LINK       TYPE     OPERATIONAL SETUP     
  1 lo         loopback carrier     unmanaged 
  2 eth0       ether    routable    configured
  3 eth1       ether    off         unmanaged 
  4 virbr0     bridge   no-carrier  unmanaged 
  5 virbr0-nic ether    off         unmanaged 

# networkctl status eth0
● 2: eth0                                                              
             Link File: /run/systemd/network/10-netplan-eth0.link      
          Network File: /run/systemd/network/10-netplan-eth0.network   
                  Type: ether                                          
                 State: routable (configured)
                  Path: pci-0000:00:0e.0                               
                Driver: virtio_net                                     
                Vendor: Red Hat, Inc.                                  
                 Model: Virtio network device                          
            HW Address: 52:54:00:12:34:56                              
                   MTU: 1500 (min: 68, max: 65535)                     
  Queue Length (Tx/Rx): 1/1                                            
      Auto negotiation: no                                             
                 Speed: n/a                                            
               Address: 172.27.0.15 (DHCP4)                            
                        fec0::5054:ff:fe12:3456                        
                        fe80::5054:ff:fe12:3456                        
               Gateway: 172.27.0.2                                     
                        fe80::2                                        
                   DNS: 172.27.0.3       

Now I did apt install network-manager. This automatically starts NetworkManager (as systemctl status NetworkManager shows), but it correctly recognizes that it does not manage eth0, and leaves it alone:

# nmcli d
DEVICE      TYPE      STATE         CONNECTION 
virbr0      bridge    connected     virbr0     
eth1        ethernet  disconnected  --         
eth0        ethernet  unmanaged     --         
lo          loopback  unmanaged     --         
virbr0-nic  tun       unmanaged     --         

So I'm afraid I still need more information on how to reproduce this: What do /etc/netplan/*, networkctl and networkctl status YOURETHDEVICE look like in the original working configuration? How does that change after installing network-manager, and what do nmcli c and nmcli d show?
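
The outputs requested above can be gathered into a single file so they are easy to attach to the issue. This is a minimal sketch: the function name collect_net_report, the default report file name, and the default device name eth0 are assumptions. Tools that are not installed are noted and skipped.

```shell
#!/bin/sh
# Hypothetical helper: write networkctl, nmcli and netplan details to one report file.
# $1 = output file (default: net-report.txt), $2 = ethernet device (default: eth0).
collect_net_report() {
    out="${1:-net-report.txt}"
    iface="${2:-eth0}"
    : > "$out"
    for cmd in "networkctl" "networkctl status $iface" "nmcli c" "nmcli d"; do
        tool=${cmd%% *}
        printf '### %s\n' "$cmd" >> "$out"
        if command -v "$tool" >/dev/null 2>&1; then
            # Word splitting of $cmd is intentional here.
            $cmd >> "$out" 2>&1 || printf '(exit status %s)\n' "$?" >> "$out"
        else
            printf '(%s is not installed)\n' "$tool" >> "$out"
        fi
    done
    # Append every netplan file that exists and is readable.
    for f in /etc/netplan/*; do
        [ -f "$f" ] || continue
        printf '### %s\n' "$f" >> "$out"
        cat "$f" >> "$out" 2>&1
    done
}

# Usage (run as root so the netplan files are readable):
#   collect_net_report before.txt eth0   # in the working configuration
#   collect_net_report after.txt eth0    # after installing network-manager
```

Running it once before and once after installing network-manager, then comparing the two files with diff, would show exactly what changed.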

Thanks!

CaptainMidnight commented 3 years ago

@martinpitt Hi, the issue I came across is fixed if, after installing Cockpit, the valid IP config that Cockpit displays on first run is saved via Cockpit before the first reboot.

Reading the initial issue, I would have thought @MaximKing1 may be running into something similar, i.e. the original IP configuration switches to DHCP or otherwise changes, so they lose remote access and the unit appears to be no longer online at the originally configured IP address. It could be something completely different, but I was surprised when that happened to the 3x Pi4B units I have. They all run PiOS 64bit and only have networking configured via systemd-networkd.

I'm currently working away until next Tuesday (17th), but will recheck wrt PiOS 64bit installation of Cockpit on a fresh backup image to see if any further investigation can be identified.

I must confirm that my PiOS Pi4B network setup using systemd-networkd is not standard for a Pi, which by default, I think, uses network-manager or something else. I'll confirm that when I'm back in the lab. So my installation experience with Cockpit will probably not be typical for a Pi installation.

martinpitt commented 3 years ago

@CaptainMidnight : Thanks for your follow-up! So you are saying in your case you actually changed something in cockpit's Network page to break networking? The original issue description did not mention that, it just said "install" -- and the only plausible part of that is that installing the network-manager package somehow broke the networkd config.

eagerestwolf commented 2 years ago

Actually, the problem here lies with Netplan, Ubiquity, and cloud_init. When the Ubiquity installer invokes cloud_init to configure Netplan, it sets a sneaky little line near the top of /etc/netplan/00-cloud-init.yml: renderer: networkd. That is the problem, because at that point, any other network configuration utility will conflict with networkd, preventing it from working. The only solution I've found that works and still allows me to configure the network (or at least monitor it) using Cockpit is to change that line to renderer: NetworkManager. That said, maybe in the future the Cockpit Project could provide support for netplan and/or networkd, but given the project has so much going on at the moment, I think that's probably gonna be left on the back burner for now.
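
To see which renderer a netplan file currently requests, a small check like the following can help. It is only a sketch: the function name netplan_renderer is hypothetical, and it does a plain textual match for the first renderer: line rather than parsing the YAML, so per-device renderer overrides are not distinguished.

```shell
#!/bin/sh
# Hypothetical helper: print the first renderer value found in a netplan file,
# or "unset" if the file has no renderer: line.
netplan_renderer() {
    r=$(sed -n 's/^[[:space:]]*renderer:[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$1" | head -n 1)
    printf '%s\n' "${r:-unset}"
}

# Usage (the netplan files may only be readable by root):
#   for f in /etc/netplan/*; do
#       printf '%s: %s\n' "$f" "$(netplan_renderer "$f")"
#   done
```

Checking this before installing Cockpit shows up front whether the machine is set to networkd or to NetworkManager.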

martinpitt commented 2 years ago

@eagerestwolf : We thought about it, but it's annoyingly difficult. netplan is much more like configuration management than a dynamic API to talk to, and even networkd does not have a "proper" API. There were discussions upstream to add a D-Bus API, but it was never done. This is conceptually incompatible with cockpit always showing the current state and reacting to changes. This isn't to say that it's impossible of course, and the Networking page should at least do something more sensible in that case.

As to the bug here, that still doesn't clarify it for me -- setting renderer: networkd is what I had expected all along, as that's the default for Ubuntu server. In that case, installing NM should be a no-op, as it won't manage any current interface at least (depending on the interface matches in 00-cloud-init.yml, but I figure it's either a * or the primary ethernet only).

eagerestwolf commented 2 years ago

Agreed, Netplan is great in concept from a user/administrator standpoint, but from a programming standpoint...it leaves quite a bit to be desired. I can understand not wanting to reinvent the wheel and there's only so much you can do when every major distro uses its own things. The Linux space is so polluted with different implementations of the same thing (look no further than package managers), it's difficult to create a one size fits all tool. For the time being though, this bug is definitely on Ubuntu's end, though I believe it was addressed in 21.04, although how I don't know for sure. For the time being, for whatever reason, installing NetworkManager on Ubuntu Server breaks networking, but it only seems to affect people with non-standard configurations (i.e. static IP or bonded connections).

eagerestwolf commented 2 years ago

As for the default configuration created by cloud_init, it will always, as far as I remember, create a separate configuration item for each network interface, but usually it just sets DHCP4 and DHCP6, unless you make modifications. I'd be willing to wager that the underlying issue has something to do with Ubuntu's implementation of networkd, seeing as networkd was created by Red Hat, which takes the Linux standards very seriously, and adopted by Ubuntu, which is less strict about standards for the sake of usability.

PabloGrok commented 2 years ago

I solved this by using this configuration of my /etc/netplan/*.yaml file:

# This is the network config written by 'subiquity'
# Added NetworkManager renderer to allow for cockpit network management.
network:
  renderer: NetworkManager
  ethernets:
    eno1:
      dhcp4: true
    eno2:
      dhcp4: true
    ens6:
      dhcp4: true
    ens6d1:
      dhcp4: true
  version: 2

I just added the renderer: NetworkManager line to the file generated by subiquity.

On my Ubuntu 20.04.3 default install there was an error on every boot where the system would hang for about three minutes waiting for network configuration (systemd-networkd-wait-online). Updating the above file also solved this issue. It looks like Ubuntu 20.04.3 is making a mess between systemd-networkd and NetworkManager.

In case you still have systemd-networkd doing stuff and breaking things, you can also issue these commands:

sudo systemctl stop systemd-networkd.service
sudo systemctl disable systemd-networkd.service
sudo systemctl mask systemd-networkd.service
sudo systemctl unmask NetworkManager
sudo systemctl enable NetworkManager
sudo systemctl start NetworkManager

This will fully replace systemd-networkd.service with NetworkManager. Cockpit will work fine, the network will work fine and there'll be no three-minute wait during boot.

Hope this helps.

JustinBenedick commented 2 years ago

Just confirmed this issue also affects raspbian-lite buster.

alexxroche commented 2 years ago

Just confirmed this issue also affects raspbian-lite buster.

I ran sudo apt install cockpit on Raspbian GNU/Linux 10 and lost connection to my server. I managed to get back in over the VPN and found that it had lost IPv6 entirely. Took me a while to work out that cockpit had installed network-manager and that had broken my network interfaces.

After sudo apt remove cockpit; sudo apt purge -y cockpit; sudo reboot, wpa_supplicant + dhcp are happy again and all of my services have returned.

I'm still sure that I do want to use cockpit, but I'm going to have to build it for my platform requirements. (The cockpit team seem to be doing good work, which is why I bothered to mention the issue that I ran into.)

BAProductions commented 1 year ago

Set the DNS to 1.1.1.1 & it will work NP

emaayan commented 1 month ago

I solved this by using this configuration of my /etc/netplan/*.yaml file:

# This is the network config written by 'subiquity'
# Added NetworkManager renderer to allow for cockpit network management.
network:
  renderer: NetworkManager
  ethernets:
    eno1:
      dhcp4: true
    eno2:
      dhcp4: true
    ens6:
      dhcp4: true
    ens6d1:
      dhcp4: true
  version: 2

I just added the renderer: NetworkManager line to the file generated by subiquity.

On my Ubuntu 20.04.3 default install there was an error on every boot where the system would hang for about three minutes waiting for network configuration (systemd-networkd-wait-online). Updating the above file also solved this issue. It looks like Ubuntu 20.04.3 is making a mess between systemd-networkd and NetworkManager.

In case you still have systemd-networkd doing stuff and breaking things, you can also issue these commands:

sudo systemctl stop systemd-networkd.service
sudo systemctl disable systemd-networkd.service
sudo systemctl mask systemd-networkd.service
sudo systemctl unmask NetworkManager
sudo systemctl enable NetworkManager
sudo systemctl start NetworkManager

This will fully replace systemd-networkd.service with NetworkManager. Cockpit will work fine, the network will work fine and there'll be no three-minute wait during boot.

Hope this helps.

Why did you have to specifically set dhcp true for all those interfaces? Isn't that the default already?