MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

DietPi-Update | SSH connection fails during APT update #6861

Open codemiha opened 10 months ago

codemiha commented 10 months ago

Creating a bug report/issue

Required Information

Additional Information (if applicable)

macbookpro:~ macuser$ ssh 192.168.0.28 -l username
username@192.168.0.28's password:

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
 ─────────────────────────────────────────────────────
 DietPi v8.23.3 : Update available
 ─────────────────────────────────────────────────────
 - Device model : NanoPi NEO Air (armv7l)
 - CPU temp : 29 °C / 84 °F : Who put me in the freezer!
 - LAN IP : 192.168.0.28 (wlan0)
 - MOTD : Stay safe and have a good start into 2024
 ─────────────────────────────────────────────────────

 DietPi Team     : https://github.com/MichaIng/DietPi#the-dietpi-project-team
 Patreon Legends : Chris Gelatt, ADSB.im
 Website         : https://dietpi.com/ | https://twitter.com/DietPi_
 Contribute      : https://dietpi.com/contribute.html
 Web Hosting by  : https://myvirtualserver.com

 dietpi-update   : Run now to update DietPi from v8.23.3 to v8.25.1

 dietpi-launcher : All the DietPi programs in one place
 dietpi-config   : Feature rich configuration tool for your device
 dietpi-software : Select optimised software for installation
 htop            : Resource monitor
 cpu             : Shows CPU information and stats

username@NanopiNeoAir:~$ su -
Password:
 ─────────────────────────────────────────────────────
 DietPi v8.23.3 : Update available
 ─────────────────────────────────────────────────────
 - Device model : NanoPi NEO Air (armv7l)
 - CPU temp : 29 °C / 84 °F : Who put me in the freezer!
 - LAN IP : 192.168.0.28 (wlan0)
 - MOTD : Stay safe and have a good start into 2024
 ─────────────────────────────────────────────────────

 DietPi Team     : https://github.com/MichaIng/DietPi#the-dietpi-project-team
 Patreon Legends : Chris Gelatt, ADSB.im
 Website         : https://dietpi.com/ | https://twitter.com/DietPi_
 Contribute      : https://dietpi.com/contribute.html
 Web Hosting by  : https://myvirtualserver.com

 dietpi-update   : Run now to update DietPi from v8.23.3 to v8.25.1

 dietpi-launcher : All the DietPi programs in one place
 dietpi-config   : Feature rich configuration tool for your device
 dietpi-software : Select optimised software for installation
 htop            : Resource monitor
 cpu             : Shows CPU information and stats

root@NanopiNeoAir:~# dietpi-update

 DietPi-Update
─────────────────────────────────────────────────────
 Phase: Checking for available DietPi update

[  OK  ] DietPi-Update | Checking IPv4 network connectivity
[  OK  ] DietPi-Update | Checking DNS resolver
[ INFO ] DietPi-Update | Getting latest version from: https://raw.githubusercontent.com/MichaIng/DietPi/master/.update/version
[  OK  ] DietPi-Update | Got valid latest version: 8.25.1
[  OK  ] DietPi-Update | Update available:
[ INFO ] DietPi-Update | Current version : v8.23.3
[ INFO ] DietPi-Update | Latest version  : v8.25.1

 DietPi-Update
─────────────────────────────────────────────────────
 Phase: Checking for update pre-requirements

[  OK  ] DietPi-Update | DietPi-Userdata validation: /mnt/dietpi_userdata
[  OK  ] DietPi-Update | Free space check: path=/ | available=24471 MiB | required=100 MiB
[  OK  ] DietPi-Update | curl -sSfLO https://raw.githubusercontent.com/MichaIng/DietPi/master/CHANGELOG.txt
[ SUB1 ] DietPi-Services > stop
[  OK  ] DietPi-Services | stop : cron
[  OK  ] DietPi-Services | stop : docker
[  OK  ] DietPi-Services | stop : influxdb
[  OK  ] DietPi-Services | stop : mariadb

 DietPi-Update
─────────────────────────────────────────────────────
 Phase: Applying pre-patches

[  OK  ] DietPi-Update | Downloading pre-patches
[  OK  ] DietPi-Update | Applying execute permission
[  OK  ] DietPi-Update | Successfully applied pre-patches

 DietPi-Update
─────────────────────────────────────────────────────
 Phase: Upgrading APT packages

[ INFO ] DietPi-Update | APT update, please wait...
Get:1 https://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 https://download.docker.com/linux/debian bookworm InRelease [43.3 kB]
Get:3 https://repos.influxdata.com/debian stable InRelease [6901 B]
Get:5 https://repos.influxdata.com/debian stable/main armhf Packages [4201 B]
Get:6 https://deb.debian.org/debian bookworm-updates InRelease [52.1 kB]
Get:7 https://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:8 https://download.docker.com/linux/debian bookworm/stable armhf Packages [13.4 kB]
Get:4 https://stpete-mirror.armbian.com/apt bookworm InRelease [53.3 kB]
Get:9 https://deb.debian.org/debian bookworm-backports InRelease [56.5 kB]
Get:10 https://deb.debian.org/debian bookworm/non-free armhf Packages [55.9 kB]
Get:11 https://deb.debian.org/debian bookworm/main armhf Packages [8497 kB]
Get:12 https://stpete-mirror.armbian.com/apt bookworm/main armhf Packages [132 kB]
Get:13 https://stpete-mirror.armbian.com/apt bookworm/main all Packages [19.8 kB]
client_loop: send disconnect: Broken pipe
macbookpro:~ macuser$ ssh 192.168.0.28 -l username
ssh: connect to host 192.168.0.28 port 22: Operation timed out
macbookpro:~ macuser$

Expected behaviour

OS upgrade to v8.25.1

Actual behaviour

Connection gets disconnected -> upgrade fails

Extra details

Aftermath: I manually restarted the stopped services. Would it be a good idea to run the upgrade on screen? Please note that I DON'T have console access.

MichaIng commented 10 months ago

Can you reconnect to SSH after the connection was lost? If so, let's check the SSH server logs:

journalctl -u dropbear

The update did not even reach a stage where any packages are actually upgraded. Also, even an upgraded and restarted SSH server won't terminate any SSH connection. So as long as the network connection between client and server is not somehow known to be flaky, this looks more like the system itself crashed, power/voltage issue being the reason in most such cases.

Would it be a good idea to run the upgrade on screen?

If the network connection is known to be flaky, yes. Another nice alternative is GNU Screen: https://www.gnu.org/software/screen/

apt install screen
screen

This way you can re-attach to the shell session of the aborted SSH connection, which simply continues to run in the background.
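The re-attach workflow described above can be sketched roughly as follows. This is only an illustration, not part of DietPi: the session name `dietpi-update`, the `DRY_RUN` switch and the helper function names are assumptions made for this example. The `screen` options used are `-dmS` (start a named session detached) and `-r` (re-attach).

```shell
#!/bin/sh
# Sketch: run a long command inside a named GNU Screen session so that it
# survives a dropped SSH connection.
# SESSION, DRY_RUN and the function names are illustrative assumptions.

SESSION="${SESSION:-dietpi-update}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start the given command detached (-d -m) in a session named $SESSION (-S).
start_session() {
    run screen -dmS "$SESSION" "$@"
}

# Re-attach to the session after logging in again (-r).
attach_session() {
    run screen -r "$SESSION"
}
```

For example, `start_session dietpi-update` followed by `attach_session` brings up the running update; Ctrl+A then D detaches again without stopping it. Note that a session started this way ends when the command exits, whereas a plain `screen` as shown above keeps an interactive shell open.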

If the system itself crashed, then of course this needs to be investigated first.

codemiha commented 10 months ago

journalctl -u dropbear returned only: -- No entries --

Uptime is 1 day, 1:20 (the NanoPi reboots frequently via cron), so the upgrade did not crash the OS. However, the watchdog wrote "No response from the GW" and tried to boot (but it failed due to the bug...)

So... should I apply the renice command to the upgrade while it is running on screen? :)

MichaIng commented 10 months ago

Not sure if I understand correctly.

However, the watchdog wrote "No response from the GW" and tried to boot (but it failed due to the bug...)

Do you see this as a kernel error on the NanoPi? And what do you mean by "tried to boot", and which bug?

journalctl -u dropbear returned only: -- No entries --

Did you switch to the OpenSSH server? In that case:

journalctl -u ssh

Do you see any other kernel errors?

dmesg -l 0,1,2,3

So... should I apply the renice command to the upgrade while it is running on screen? :)

No, renice is never needed for anything if the system runs stable, and it does not help but can make things worse when the system runs unstable. We need to find the cause for system and/or network to be unstable.

codemiha commented 10 months ago

About the watchdog, systemctl status watchdog showed the mentioned "No response from the GW". From your reply I got an idea: the network IS slow, 0.5Mb DL and 5MB UL. This is an "IoT setup" where the needed bandwidth is minimal.

The output of "dmesg -l 0,1,2,3" is empty. As we know, the HW-resources with NanoPi Neo Air are limited and from that I got an idea to use renice. The output of "journalctl -u ssh" shows only my typos when typing the password :D

No OOM is visible with dmesg -T; the only notable output of that command is:

[Sun Jan 14 17:03:08 2024] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.friendlyarm,nanopi-neo-air.bin failed with error -2
[Sun Jan 14 17:03:08 2024] brcmfmac mmc1:0001:1: Falling back to sysfs fallback for: brcm/brcmfmac43430-sdio.friendlyarm,nanopi-neo-air.bin

MichaIng commented 10 months ago

About the watchdog, systemctl status watchdog showed the mentioned "No response from the GW"

Ah okay. Then I am actually quite sure that it is a network issue: not so much the WAN bandwidth, but the connection between the SSH client and the NanoPi as well as between the NanoPi and the gateway (the watchdog error), hence LAN-internal. I am not sure about the quality/range of the onboard WiFi adapter of the NanoPi NEO Air. Perhaps you can move it to a better position or attach an antenna for a better/more stable signal.
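One way to test the LAN-internal link is to ping the default gateway from the NanoPi and look at the packet loss. The following is a minimal sketch under stated assumptions: the helper names are made up for this example, and the parsing assumes the usual output format of `ip -4 route show default` and of the iputils `ping` summary line.

```shell
#!/bin/sh
# Sketch: find the default gateway and measure packet loss towards it.
# Function names are illustrative; the output formats are assumptions.

# Read "ip -4 route show default" output on stdin, print the gateway address.
# Expected line shape: "default via <gateway> dev <interface> ..."
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Read ping output on stdin, print the packet loss percentage.
# Expected summary shape: "... N received, X% packet loss, time ..."
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Ping the default gateway (default: 20 packets) and report the loss.
check_gateway() {
    count="${1:-20}"
    gw="$(ip -4 route show default | default_gateway)"
    [ -n "$gw" ] || { echo "no default gateway found" >&2; return 1; }
    loss="$(ping -c "$count" "$gw" | packet_loss)"
    echo "gateway=$gw loss=${loss}%"
}
```

Running `check_gateway 50` once while the system is idle and once during an APT download would show whether the link to the gateway degrades under load.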

As we know, the HW-resources with NanoPi Neo Air are limited and from that I got an idea to use renice.

There are, however, much more important processes than a foreground system setup, like the init system, system logging, udev and the like, and of course the SSH server you are connecting through. Shifting resources via nice/priority to the foreground shell/script could cause more issues. In this case, as can be seen from the missing kernel and SSH server errors, the issue is most likely the WiFi connection, hence the nice/priority of the foreground process has no effect on it.

When system resources are exhausted, such foreground setup/install processes can run slower, but they should never become unstable, unless there are other/indirect issues like voltage or temperature. Renice can be reasonable when you have time-critical/RT processes or high-quality audio processing. Other than that, it can make sense to lower the priority of background/cron jobs if you feel that they disturb other processes.
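As an illustration of that last case, lowering the priority of a background job could look like the sketch below. The wrapper name `run_low_priority` and the `NICE_INCREMENT` variable are assumptions for this example; `nice -n` itself is the standard way to start a command with raised niceness (lower priority).

```shell
#!/bin/sh
# Sketch: start a background/cron job at reduced CPU priority.
# run_low_priority and NICE_INCREMENT are illustrative names.

# Run the given command with its niceness raised by NICE_INCREMENT (default 10).
run_low_priority() {
    nice -n "${NICE_INCREMENT:-10}" "$@"
}

# Hypothetical crontab entry using nice directly (the script path is made up):
#   0 3 * * * nice -n 10 /path/to/backup.sh
```

Calling `run_low_priority nice` prints the niceness the wrapped command actually runs with, which is a quick way to confirm the effect.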

codemiha commented 10 months ago

The upgrade completed. The trick was to utilise screen and run dietpi-update there. I also disabled and stopped the watchdog. The upgrade took almost 2h to complete (due to the 0.5MB downlink). The NanoPi NEO Air's WiFi signal strength is excellent: the device is located about 50cm from the 4G router and there is nothing between the router and the NanoPi NEO Air to block the signal. Thank you :)

MichaIng commented 10 months ago

Do you connect remotely (via the Internet) with your SSH client? Perhaps this "No response from the GW" error can also show up when the router/gateway takes too long to answer a request. SSH is pretty cheap on bandwidth, but when doing APT updates/upgrades, the bandwidth is probably exhausted, breaking SSH as well.

I was looking for a way to prioritise SSH via something like QoS, and indeed there is a way: https://debian-handbook.info/browse/stable/sect.quality-of-service.html

Would be interesting whether this helps:

apt install wondershaper iptables
wondershaper wlan0 4000 40000
iptables -t mangle -A PREROUTING -p tcp --sport 22 -j DSCP --set-dscp 4
iptables -t mangle -A PREROUTING -p tcp --dport 22 -j DSCP --set-dscp 4

If this indeed helps to keep up the SSH connection during APT upgrades/downloads etc, these can be added to network configs to be applied automatically at boot/when the network is brought up.
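If the rules do help, one possible way to apply them when the network comes up is an ifupdown hook. The sketch below assumes the system uses ifupdown, which runs scripts in `/etc/network/if-up.d/` with the interface name in `IFACE`. The file name `ssh-qos`, the `QOS_IFACE` and `DRY_RUN` variables and the helper functions are made up for this example; the `wondershaper` and `iptables` arguments are copied from the commands above, and `iptables -C` is used to avoid appending the same rule twice.

```shell
#!/bin/sh
# Sketch of a hook such as /etc/network/if-up.d/ssh-qos (hypothetical name).
# QOS_IFACE, DRY_RUN and the function names are illustrative assumptions.

QOS_IFACE="${QOS_IFACE:-wlan0}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Append a mangle-table rule unless an identical rule already exists (-C).
add_rule_once() {
    if [ "${DRY_RUN:-0}" != "1" ] && iptables -t mangle -C "$@" 2>/dev/null; then
        return 0
    fi
    run iptables -t mangle -A "$@"
}

# Apply shaping and DSCP marking only when the configured interface comes up.
apply_qos() {
    [ "${IFACE:-}" = "$QOS_IFACE" ] || return 0
    run wondershaper "$QOS_IFACE" 4000 40000
    add_rule_once PREROUTING -p tcp --sport 22 -j DSCP --set-dscp 4
    add_rule_once PREROUTING -p tcp --dport 22 -j DSCP --set-dscp 4
}

apply_qos
```

Running it by hand with `IFACE=wlan0 DRY_RUN=1` prints the commands without changing anything, which is a safe check before making the script executable in the hook directory.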

codemiha commented 9 months ago

FYI:

 ─────────────────────────────────────────────────────
 DietPi v9.0.2 : 14:53 - Wed 01/24/24
 ─────────────────────────────────────────────────────

Successfully upgraded to v9.0.2 :)

MichaIng commented 9 months ago

Did you apply the suggested bandwidth shaping and/or DSCP bits, or did it finally work without those?

codemiha commented 9 months ago

Hi. I logged in via reverse SSH, and dietpi-update was executed in screen.

MichaIng commented 9 months ago

EDIT: Ah whoops, I mixed up the issues. It would still be good to know whether the above steps help to keep a non-reverse SSH session active.