Open codemiha opened 10 months ago
Can you reconnect to SSH after the connection was lost? If so, let's check the SSH server logs:
journalctl -u dropbear
The update did not even reach a stage where any packages are actually upgraded. Also, even an upgraded and restarted SSH server won't terminate any SSH connection. So as long as the network connection between client and server is not known to be flaky, this looks more like the system itself crashed, with a power/voltage issue being the cause in most such cases.
Would it be a good idea to run upgrade on screen?
If the network connection is known to be flaky, yes. Another nice alternative is GNU Screen: https://www.gnu.org/software/screen/
apt install screen
screen
So you can re-attach to the shell session of the aborted SSH connection, which just continues to run in background.
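As a minimal sketch of the same idea without an interactive first step, the upgrade can be started in a detached, named session and re-attached later. The helper name start_detached and the session name "upgrade" are made up for illustration, and this assumes the command does not need input before you attach:

```shell
# Start a command in a detached GNU Screen session so it keeps running
# when the SSH connection drops.
# start_detached is a hypothetical helper; "upgrade" is just an example name.
start_detached() {
    session=$1
    shift
    # -d -m: start the session detached, -S: give it a name
    screen -dmS "$session" "$@"
}

# Usage (not executed here):
#   start_detached upgrade dietpi-update
#   screen -ls           # list running sessions
#   screen -r upgrade    # re-attach after reconnecting via SSH
```

Inside an attached session, Ctrl+A followed by D detaches again without stopping the command.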
If the system itself crashed, then of course this needs to be investigated first.
-- No entries -- when executing journalctl -u dropbear
Uptime is "up 1 day, 1:20" (the NanoPi boots via cron frequently), so the upgrade did not make the OS crash. However, the watchdog wrote "No response from the GW" and tried to boot (but it failed due to the bug..)
So...should I apply renice-command for the upgrade, while it is running on screen? :)
Not sure if I understand correctly.
However, the watchdog wrote "No response from the GW" and tried to boot (but it failed due to the bug..)
Do you see this as a kernel error on the NanoPi? And what do you mean by "tried to boot", and which bug?
-- No entries -- when executing journalctl -u dropbear
Did you switch to the OpenSSH server? If so:
journalctl -u ssh
Do you see any other kernel errors?
dmesg -l 0,1,2,3
So...should I apply renice-command for the upgrade, while it is running on screen? :)
No, renice is never needed for anything if the system runs stable, and it does not help but can make things worse when the system runs unstable. We need to find the cause for system and/or network to be unstable.
About the watchdog, systemctl status watchdog showed the mentioned "No response from the GW". From your reply I got an idea: The network IS slow; 0.5Mb DL and 5MB UL. This is an "IoT setup" where the needed bandwidth is minimal.
Output of "dmesg -l 0,1,2,3" is empty.
As we know, the HW-resources with NanoPi Neo Air are limited and from that I got an idea to use renice.
Output of "journalctl -u ssh" shows only my typos when typing the password :D
No OOM visible with dmesg -T; with that command the only notable output is:
[Sun Jan 14 17:03:08 2024] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.friendlyarm,nanopi-neo-air.bin failed with error -2
[Sun Jan 14 17:03:08 2024] brcmfmac mmc1:0001:1: Falling back to sysfs fallback for: brcm/brcmfmac43430-sdio.friendlyarm,nanopi-neo-air.bin
About the watchdog, systemctl status watchdog showed the mentioned "No response from the GW"
Ah okay. Then I am actually quite sure that it is a network issue, not so much the WAN bandwidth, but the connection between SSH client and NanoPi as well as between NanoPi and gateway (the watchdog error), hence LAN-internal. Not sure about the quality/range of the onboard WiFi adapter of the NanoPi NEO Air. Probably you can put it in a better position or attach an antenna for a better/more stable signal.
As we know, the HW-resources with NanoPi Neo Air are limited and from that I got an idea to use renice.
There are however much more important processes than a foreground system setup, like the init system, system logging, udev and stuff like that, and of course the SSH server you are connecting through. Shifting resources via nice/priority to the foreground shell/script could cause more issues. In this case, as can be seen from the missing kernel and SSH server errors, the issue is most likely the WiFi connection, hence the nice/priority of the foreground process has no effect on this. When system resources are exhausted, such foreground setup/install processes can run slower, but they should never become unstable, unless there are other/indirect issues like voltage or temperature. Renice can be reasonable when you have time-critical/RT processes or high quality audio processing. Other than that, it can make sense to lower the priority of background/cron jobs if you feel that they disturb other processes.
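For that last case, a small sketch of lowering the priority of a background job rather than raising the one of the foreground upgrade. The helper name run_low_prio, the script path and the PID are hypothetical:

```shell
# Run a command with lowered CPU priority: niceness is raised by 10
# relative to the calling shell (higher niceness = lower priority).
# run_low_prio is a hypothetical helper name.
run_low_prio() {
    nice -n 10 "$@"
}

# Usage (not executed here), e.g. in a cron job; the path is made up:
#   run_low_prio /usr/local/bin/nightly-backup.sh
# Lower the priority of an already running process by PID (1234 is made up):
#   renice -n 10 -p 1234
```

Calling nice without arguments prints the current niceness, which is a quick way to check what a wrapped command actually runs with.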
The upgrade completed. The trick was to utilise screen and run dietpi-upgrade there. I also disabled and stopped watchdog. The upgrade took almost 2h to complete (due to the 0.5MB downlink). NanoPi NEO Air wifi signal strength is excellent, the device is located about 50cm from the 4G-router and there is nothing between the router and NanoPi Neo Air to block the signal. Thank you :)
Do you connect remotely (via Internet) with your SSH client? Probably this "No response from the GW" error can also show up when the router/gateway takes too long to answer a request. SSH is pretty cheap on bandwidth, but probably when doing APT updates/upgrades, it is exhausted, breaking SSH as well.
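To check whether the gateway really answers slowly or drops packets while an upgrade is downloading, something like the following could be run in a second session. This is only a sketch: default_gw is a hypothetical helper, and it assumes the usual "default via ADDRESS dev IFACE" format of the ip route output.

```shell
# Print the gateway address from "ip route" output read on stdin.
# default_gw is a hypothetical helper name.
default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage on the NanoPi (not executed here):
#   gw=$(ip -4 route show default | default_gw)
#   ping -c 100 -i 1 "$gw" | tail -n 2    # packet loss and rtt summary
```

High packet loss or strongly rising round-trip times during the download would point at the link to the gateway being saturated.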
I was looking for a way to prioritise SSH via something like QoS, and indeed there is a way: https://debian-handbook.info/browse/stable/sect.quality-of-service.html
Would be interesting whether this helps:
apt install wondershaper iptables
wondershaper wlan0 4000 40000
iptables -t mangle -A PREROUTING -p tcp --sport 22 -j DSCP --set-dscp 4
iptables -t mangle -A PREROUTING -p tcp --dport 22 -j DSCP --set-dscp 4
With wondershaper, this limits traffic to 4000 kbps download and 40,000 kbps upload, a little below your theoretical bandwidth? If not, adjust those values to be a little below the available bandwidth. And I indeed took DL as "download to the server" and UL as "upload from the server", while usually the download bandwidth is larger than the upload bandwidth, so maybe you meant this from the client side? In that case, swap the two values: https://manpages.debian.org/wondershaper
With the iptables rules (I hope I translated them correctly from the nftables rules of the above guide), we set the DSCP bit for SSH packets for low latency/high reliability/real-time interactive applications: https://en.wikipedia.org/wiki/Differentiated_services#Class_Selector, https://linuxreviews.org/Type_of_Service_(ToS)_and_DSCP_Values#Iptables_.26_ToS_.26_DSCP_Values
If this indeed helps to keep up the SSH connection during APT upgrades/downloads etc, these can be added to network configs to be applied automatically at boot/when the network is brought up.
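Since this is an experiment, it may help to wrap the commands so that they can be reverted the same way they were applied. The following is only a sketch: ssh_qos is a hypothetical helper name, the values are the ones from above, and the "wondershaper clear IFACE" form is what I remember from the Debian man page, so please verify it there before relying on it.

```shell
# Apply or revert the traffic shaping and DSCP rules for one interface.
# ssh_qos is a hypothetical helper; run as root.
ssh_qos() {
    action=$1
    iface=$2
    case $action in
        apply)
            flag=-A    # append the rules
            wondershaper "$iface" 4000 40000
            ;;
        clear)
            flag=-D    # delete the same rules again
            # Assumed syntax of the Debian wondershaper package
            wondershaper clear "$iface"
            ;;
        *)
            echo "usage: ssh_qos apply|clear IFACE" >&2
            return 1
            ;;
    esac
    iptables -t mangle "$flag" PREROUTING -p tcp --sport 22 -j DSCP --set-dscp 4
    iptables -t mangle "$flag" PREROUTING -p tcp --dport 22 -j DSCP --set-dscp 4
}

# Usage (not executed here):
#   ssh_qos apply wlan0
#   ssh_qos clear wlan0
```

The same function could later be called from a network hook script once it is clear that the rules actually help.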
FYI:
─────────────────────────────────────────────────────
DietPi v9.0.2 : 14:53 - Wed 01/24/24
─────────────────────────────────────────────────────
Successfully upgraded to v9.0.2 :)
Did you apply the suggested bandwidth shaping and/or DSCP bits, or did it finally work without those?
Hi. I logged in via reverse-ssh and dietpi-update was executed in screen.
EDIT: Ah whoops, I mixed up the issues. Would be still good to know whether the above steps help to keep a non-reverse SSH session active.
Creating a bug report/issue
Required Information
DietPi version:
G_DIETPI_VERSION_CORE=8
G_DIETPI_VERSION_SUB=23
G_DIETPI_VERSION_RC=3
G_GITBRANCH='master'
G_GITOWNER='MichaIng'
G_LIVE_PATCH_STATUS[0]='applied'
G_LIVE_PATCH_STATUS[1]='not applied'
G_LIVE_PATCH_STATUS[2]='not applicable'
Distro version: bookworm
Kernel version:
Linux NanoPi Neo Air 6.1.53-current-sunxi #3 SMP Wed Sep 13 07:43:05 UTC 2023 armv7l GNU/Linux
SBC model: NanoPi NEO Air (armv7l)
Power supply used: 5V 1.5A
SD card used: EMMC
Additional Information (if applicable)
Expected behaviour
OS upgrade to v8.25.1
Actual behaviour
Connection gets disconnected -> upgrade fails
Extra details
Aftermath: Started the stopped services manually. Would it be a good idea to run upgrade on screen? Please note that I DON'T have console access.