RPi-Distro / repo

Issue tracking for the archive.raspberrypi.org repo

watchdog does not detect traffic on wlan0 #237

Closed sandrobordacchini closed 1 year ago

sandrobordacchini commented 3 years ago

Hi. I just realized that if I enable the hardware watchdog on my Raspberry Pi Zero W (with Raspbian 10) and configure watchdog to monitor the wlan0 interface, I get errors saying it cannot detect any traffic on wlan0: "device wlan0 did not receive anything since last check". However, /proc/net/dev shows traffic.

I performed some tests that you can see in the attached file. raspbian10-watchdog-wlan.txt

julien-vancouver commented 3 years ago

Just to confirm - I have the same issue enabling a watchdog for wlan0 on an rPiW0

Jul 21 15:20:37 raspberrypi watchdog[434]: device wlan0 did not receive anything since last check

pi@raspberrypi:/lib/firmware/brcm $ uname -a
Linux raspberrypi 5.10.17+ #1421 Thu May 27 13:58:02 BST 2021 armv6l GNU/Linux

jerabaul29 commented 2 years ago

Same issue.

XECDesign commented 2 years ago

Is this also happening on Bullseye?

Just tested on a pi 4 and it seems okay. I'll see if I can reproduce it on a zero tomorrow.

jerabaul29 commented 2 years ago

Tested on Raspbian from my side, on a RPi 4.

XECDesign commented 2 years ago

Raspbian Bullseye or Buster (Legacy/oldstable)?

Just tried on a Zero W (verbose = 1):

$ sudo journalctl -u watchdog -b0 -f
-- Journal begins at Wed 2022-01-19 01:29:43 GMT. --
Feb 02 16:01:55 serge-testpi watchdog[418]: still alive after 49 interval(s)
Feb 02 16:01:55 serge-testpi watchdog[418]: device wlan0 received 32816 bytes
Feb 02 16:01:56 serge-testpi watchdog[418]: still alive after 50 interval(s)
Feb 02 16:01:56 serge-testpi watchdog[418]: device wlan0 received 33298 bytes
Feb 02 16:01:57 serge-testpi watchdog[418]: still alive after 51 interval(s)
Feb 02 16:01:57 serge-testpi watchdog[418]: device wlan0 received 33376 bytes
Feb 02 16:01:58 serge-testpi watchdog[418]: still alive after 52 interval(s)
Feb 02 16:01:58 serge-testpi watchdog[418]: device wlan0 received 37940 bytes
Feb 02 16:01:59 serge-testpi watchdog[418]: still alive after 53 interval(s)
Feb 02 16:01:59 serge-testpi watchdog[418]: device wlan0 received 46692 bytes
Feb 02 16:02:00 serge-testpi watchdog[418]: still alive after 54 interval(s)
Feb 02 16:02:00 serge-testpi watchdog[418]: device wlan0 received 50904 bytes
Feb 02 16:02:01 serge-testpi watchdog[418]: still alive after 55 interval(s)
Feb 02 16:02:01 serge-testpi watchdog[418]: device wlan0 received 55702 bytes
Feb 02 16:02:02 serge-testpi watchdog[418]: still alive after 56 interval(s)
Feb 02 16:02:02 serge-testpi watchdog[418]: device wlan0 received 57970 bytes
jerabaul29 commented 2 years ago

I got the problem when I installed it a few days ago, I think it was the same version that is still available in rpi-imager, i.e. Raspberry Pi OS (sorry, they changed the name from Raspbian, my bad), port of Debian Bullseye, released 2022-01-28 :) .

jerabaul29 commented 2 years ago

Is it possible that I / we made a mistake in setting this up? How did you set this up, and what exactly does your config look like when it works on the RPi 4? :)

XECDesign commented 2 years ago

I didn't do anything special. Just connected to the access point, ssh'ed in, installed 'watchdog', added the interface and verbose lines to the config file then rebooted.

no-response[bot] commented 2 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

stephen-mw commented 2 years ago

Still an issue as of the latest RPi image, at least on my Raspberry Pi Zero W.

dpkg -l watchdog
ii  watchdog       5.15-2       armhf        system health checker and software/hardware watchdog handler
System Information
------------------

Raspberry Pi Zero W Rev 1.1
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

Raspberry Pi reference 2021-12-02
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, fa45ccf5a4b183ee566b36d74fb4b65bf9358bed, stage2

Linux raspberrypi 5.10.103+ #1529 Tue Mar 8 12:19:18 GMT 2022 armv6l GNU/Linux
Revision    : 9000c1
Serial      : 0000000027753e96
Model       : Raspberry Pi Zero W Rev 1.1

cat /etc/watchdog.conf | grep -Pv '^#'

realtime        = yes
priority        = 1
watchdog-device = /dev/watchdog
watchdog-timeout = 15
max-load-1 = 24
interface = wlan0

grep watch /boot/config.txt

dtparam=watchdog=on
p1r473 commented 1 year ago

I'm having this same issue (Pi 4 8GB, latest kernel, latest watchdog), but only after I restore from raspiBackup. After that, watchdog somehow starts getting its detection wrong, even though I'm SSHed in through eth0 AND it even says it received data. How can it both receive data and not receive data?

cat /etc/watchdog.conf | grep -Pv '^#'

admin                   = root
interval                = 10
logtick                 = 1
log-dir                 = /var/log/watchdog
watchdog-device         = /dev/watchdog
max-temperature         = 85
watchdog-timeout        = 15
temperature-sensor      = /sys/class/thermal/thermal_zone0/temp
realtime                = yes
priority                = 1
interface               = eth0
ping-count              = 10
ping                    = 192.168.1.1
verbose                 = 2     

grep watch /boot/config.txt

dtparam=watchdog=on
XECDesign commented 1 year ago

SSHed through eth0

Are you also connected to WiFi on the same subnet?

it even says it received data

It says how much it has received in total. Note that the number isn't changing. It's saying it hasn't received any new data since it last checked.
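
The check described here can be sketched as a comparison of a cumulative counter between two consecutive checks. This is only an illustration of the idea, not watchdog's actual code: the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface state; the names are illustrative only. */
struct iface_state {
    unsigned long long last_rx_bytes; /* cumulative counter seen at the previous check */
};

/* Returns true when the cumulative RX counter moved since the previous
 * check, and remembers the new value for the next call. */
bool iface_saw_traffic(struct iface_state *st, unsigned long long rx_bytes_now)
{
    assert(st != NULL);
    bool changed = (rx_bytes_now != st->last_rx_bytes);
    st->last_rx_bytes = rx_bytes_now;
    return changed;
}
```

With this logic, a logged total that stays the same from one interval to the next is reported as "did not receive anything since last check", no matter how large the total is.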

p1r473 commented 1 year ago

Are you also connected to WiFi on the same subnet?

WiFi is disconnected on my Pis. Only connected on eth0.

It says how much it has received in total. Note that the number isn't changing. It's saying it hasn't received any new data since it last checked.

/proc/net/dev is incrementing. I am tailing it and watching it increment AS watchdog says it's not receiving anything.

Basically, watchdog thinks eth0 is not receiving anything, but it's fine: I'm serving DNS, HTTP and HTTPS over it, I can ping it, I can SSH into it, the interface works and the byte count is incrementing. But watchdog thinks it's not receiving anything :( This only occurs for me when I perform an external SD card restore with the raspiBackup program: https://github.com/framps/raspiBackup/issues/641

icamaster commented 1 year ago

Hi, I have the same problem. WLAN0 disabled, only using ETH0. Watchdog thinks it is not receiving anything, but ping works ok.

Kernel version: 6.1.19-v8+
Watchdog version: 5.16-1

p1r473 commented 1 year ago

@XECDesign can we somehow reopen this issue?

icamaster commented 1 year ago

Hi, I have found more information on this. Not sure if it is a problem with the watchdog package or how the system is configured, but the problem is that there is an overflow of the "bytes" variable in the "iface.c" source file for the watchdog.

This is the line that is causing the issues: unsigned long bytes = strtoul(line + i + strlen(dev->name) + 1, NULL, 10);

p1r473 commented 1 year ago

@icamaster I've also opened a bug report with the Debian maintainers of the watchdog package and pasted your message: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033716

icamaster commented 1 year ago

@p1r473 Thanks. Still looking into this, but wondering if it's because our system is 64bit, but we have the 32bit version of the watchdog package or similar?

If you run 'uname -m', do you get "aarch64" ?

p1r473 commented 1 year ago

@icamaster Yep, aarch64.

icamaster commented 1 year ago

I think the problem is that the kernel prints the counters in '/proc/net/dev' as 'unsigned long long' (see here: https://github.com/raspberrypi/linux/blame/rpi-6.1.y/net/core/net-procfs.c#L82), but the watchdog 'iface.c' code parses them with 'strtoul' into an 'unsigned long'. On a 32-bit userland, once the counter no longer fits in 32 bits, 'bytes' is always 4294967295 (0xFFFFFFFF, i.e. ULONG_MAX). There is no check that the 'strtoul' operation has been successful.
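
To illustrate the saturation: where 'unsigned long' is 32 bits wide, 'strtoul' returns ULONG_MAX and sets errno to ERANGE for any value that does not fit. The sketch below mimics that behaviour so it can be tried on any machine. The helper name is hypothetical and is not part of watchdog.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: mimics what strtoul() yields where unsigned long is
 * 32 bits wide, by parsing in 64 bits and clamping at UINT32_MAX. */
uint32_t parse_like_32bit_strtoul(const char *text)
{
    assert(text != NULL);
    errno = 0;
    unsigned long long wide = strtoull(text, NULL, 10);
    if (wide > UINT32_MAX) {
        errno = ERANGE;     /* a real 32-bit strtoul() reports overflow this way */
        return UINT32_MAX;  /* and returns ULONG_MAX */
    }
    return (uint32_t)wide;
}
```

Two different readings above 4294967295, for example "5000000000" and "5000001234", both come back as 4294967295, so an equality comparison against the previous reading concludes that nothing was received.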

Recompiling the watchdog using 'unsigned long long' for 'bytes' and changing 'strtoul' to 'strtoull' apparently solved the problem for me after doing a quick test. See changes below:

diff --git a/iface.c.orig b/iface.c
index 5db4e55..7b5eba6 100644
--- a/iface.c.orig
+++ b/iface.c
@@ -41,11 +41,11 @@ int check_iface(struct list *dev)

                        for (; line[i] == ' ' || line[i] == '\t'; i++) ;
                        if (strncmp(line + i, dev->name, strlen(dev->name)) == 0) {
-                               unsigned long bytes = strtoul(line + i + strlen(dev->name) + 1, NULL, 10);
+                               unsigned long long bytes = strtoull(line + i + strlen(dev->name) + 1, NULL, 10);

                                /* do verbose logging */
                                if (verbose && logtick && ticker == 1)
-                                       log_message(LOG_DEBUG, "device %s received %lu bytes", dev->name, bytes);
+                                       log_message(LOG_DEBUG, "device %s received %llu bytes", dev->name, bytes);

                                if (dev->parameter.iface.bytes == bytes) {
                                        fclose(file);
diff --git a/extern.h b/extern.h.orig
index 2eccf0b..81bc620 100644
--- a/extern.h
+++ b/extern.h.orig
@@ -30,7 +30,7 @@ struct filemode {
 };

 struct ifmode {
-       unsigned long long bytes;
+       unsigned long bytes;
 };

 struct tempmode {

Let me know what you think.

I would also add error checks around "strtoull", so that at least a warning is logged if this happens again in the future. It might be even better to use 'u64' instead of 'unsigned long long', as that is the type of "rx_bytes" in "struct rtnl_link_stats64", where the value we read is stored. However, the values are printed with "%llu", so maybe it's better to keep it as is?
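
A minimal sketch of what such checks could look like, assuming a standalone helper rather than the actual 'iface.c' code. The function name and return convention are hypothetical. It rejects input with no digits, a leading minus sign, and values that overflow 'unsigned long long'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal counter, returning 0 on success
 * and -1 if no digits were found or the value is out of range. */
int parse_counter(const char *text, unsigned long long *out)
{
    assert(text != NULL && out != NULL);
    char *end = NULL;

    /* skip leading whitespace ourselves so a minus sign can be rejected */
    while (isspace((unsigned char)*text))
        text++;
    if (*text == '-')
        return -1;

    errno = 0;
    unsigned long long value = strtoull(text, &end, 10);
    if (end == text)        /* no digits consumed */
        return -1;
    if (errno == ERANGE)    /* value exceeded ULLONG_MAX */
        return -1;

    *out = value;
    return 0;
}
```

A caller could log a warning whenever this returns -1 instead of silently comparing a saturated value against the previous reading.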

p1r473 commented 1 year ago

I'll add this finding to the Debian bug report. Can you share your recompiled watchdog so I can test?

icamaster commented 1 year ago

I think I accidentally added the information to the wrong Debian bug report (the duplicate one).

I've attached the binary, but use it at your own risk.

What I did is:

sudo systemctl stop watchdog
sudo mv /sbin/watchdog /sbin/watchdog.old
sudo cp /path/to/new/watchdog /sbin/watchdog
sudo systemctl start watchdog

This way, you can revert back to the other watchdog after you have tested this one.

watchdog.zip

p1r473 commented 1 year ago

I accidentally created the two tickets because I didn't realize there was a multi-hour lag between submitting and getting a confirmation. Oops, sorry! It was my first time using Debian reportbug.

Can add it to both to be safe.

icamaster commented 1 year ago

However, I can't remember why I have a 64-bit Kernel. Have you used the 'rpi-update' command to update to the latest kernel as well?

I'm confused as to why the watchdog package is compiled as a 32-bit application when I think it should have been 64-bit, and we wouldn't see this problem as both 'unsigned long' and 'unsigned long long' are 64-bit wide. Maybe my OS was 32-bit (and apt sources were and remained at 32-bit), but then my kernel got updated to 64-bit?

LE: Nothing I mentioned above in this message matters, as '/proc/net/dev' is printed with '%llu', which is 64-bit regardless of the kernel/userland version, so I think the watchdog supervisor should be updated to account for this. But to answer my question, here is why my kernel is 64-bit, but userland is 32-bit: https://forums.raspberrypi.com/viewtopic.php?t=349070. (TLDR: Pi 4 and similar, i.e. CM4 and Pi 400, will switch to 64-bit kernels when doing an update.)
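
Since the counter is printed as a 64-bit decimal in every case, the reading side can be sketched as below. This is an illustration under assumptions, not the patched 'iface.c': the helper name is hypothetical, and unlike the original prefix match it requires a ':' right after the interface name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if `line` is the /proc/net/dev row for `ifname`,
 * store its first field (cumulative RX bytes) in *out and return 0.
 * Returns -1 for any other line. */
int rx_bytes_from_line(const char *line, const char *ifname,
                       unsigned long long *out)
{
    assert(line != NULL && ifname != NULL && out != NULL);
    size_t n = strlen(ifname);

    /* interface names are right-aligned, so skip leading blanks */
    while (*line == ' ' || *line == '\t')
        line++;

    /* require "name:" so that "eth0" does not also match "eth01" */
    if (strncmp(line, ifname, n) != 0 || line[n] != ':')
        return -1;

    char *end = NULL;
    const char *field = line + n + 1;
    unsigned long long value = strtoull(field, &end, 10);
    if (end == field)       /* no digits after the colon */
        return -1;

    *out = value;
    return 0;
}
```

Because the value is held in an 'unsigned long long', a counter above 4294967295 is read back unchanged on both 32-bit and 64-bit userlands.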

p1r473 commented 1 year ago

I have used rpi-update in the past. Not recently

p1r473 commented 1 year ago

Confirmed that your recompiled Watchdog fixed my issue. Now we just need to get it patched in the repo.

I had #arm_64bit=1 commented out in my config.txt. I'm adding arm_64bit=0 and hoping it rolls me back.

p1r473 commented 1 year ago

Now uname -m shows armv7l, so I think I've successfully rolled back to 32-bit.

p1r473 commented 1 year ago

I rolled a second Pi back to 32-bit and tried watchdog (the vanilla one, without your changes), and the issue wasn't observed. So you are 100% right.

XECDesign commented 1 year ago

Many thanks for looking into it. I've pulled in your patch, so hopefully it should be resolved now.

icamaster commented 1 year ago
diff --git a/extern.h b/extern.h.orig
index 2eccf0b..81bc620 100644
--- a/extern.h
+++ b/extern.h.orig
@@ -30,7 +30,7 @@ struct filemode {
 };

 struct ifmode {
-       unsigned long long bytes;
+       unsigned long bytes;
 };

 struct tempmode {

Thanks for looking into this.

Sorry, but I noticed that in the 'extern.h' header diff, I accidentally swapped the original with the edited file. It should have been:

diff --git a/extern.h.orig b/extern.h
index 81bc620..2eccf0b 100644
--- a/extern.h.orig
+++ b/extern.h
@@ -30,7 +30,7 @@ struct filemode {
 };

 struct ifmode {
-       unsigned long bytes;
+       unsigned long long bytes;
 };

 struct tempmode {
XECDesign commented 1 year ago

And I failed to notice extern.h was patched as well. Will push another update a little later today.

p1r473 commented 1 year ago

Many thanks for looking into it. I've pulled in your patch, so hopefully it should be resolved now.

Will this be fixed for all of Debian now or just Raspbian?

XECDesign commented 1 year ago

Only Raspberry Pi OS. I am not a Debian or Raspbian dev, so I can't do much about those. But now that there's an open bug report against the Debian version of the package, there's a chance they'll pick up the fix and it will naturally flow into Raspbian too.

p1r473 commented 1 year ago

Raspberry Pi OS

Aren't Raspberry Pi OS and Raspbian the same? The wiki page says "Raspberry Pi OS (formerly Raspbian)".

XECDesign commented 1 year ago

Raspbian is a community project which is a recompile of Debian for armv6hf. Raspberry Pi OS is based on either Raspbian or Debian with additional changes by Raspberry Pi engineers.

icamaster commented 1 year ago

I have found that the project is maintained at Sourceforge, and I wanted to raise a pull request for the change so that it is added to Debian as well, but it looks like the problem has been already fixed since a few weeks ago: https://sourceforge.net/p/watchdog/code/ci/5ad1322bca008be85313bca08fb7a87c0a85997e/

The person who did this probably ran into the same problem.

p1r473 commented 1 year ago

I just did an apt upgrade, which upgraded watchdog. @icamaster, are you still on 64-bit? Can you check whether this is fixed there, as I'm back on 32-bit now?

icamaster commented 1 year ago

I'm still on aarch64 and everything is good for me after updating watchdog to 5.16-2~rpt2.

p1r473 commented 1 year ago

Yay, good work all!

ZOMGVTEK commented 7 months ago

I've been having this issue on Pis for some time now.

"device wlan0 did not receive anything since last check" wlan0 is showing packets, a program is running that's pulling data off the internets.

5.16-2~rpt2, Pi Zero2W and Pi3B+. Linux version 6.1.21-v8+, 64 bit.

Anyone have suggestions?

ZOMGVTEK commented 7 months ago

Downgrading to 5.16-1 solved it for me.

hobbieman commented 3 months ago

This problem is still present in the 32-bit version of Bookworm on a Raspberry Pi Zero W Rev 1.1.

I am connected to the raspberry pi on ssh via the WiFi, using the shell and watchdog says it has not received any traffic on wlan0 :-)

I am using the latest Raspberry Pi OS Lite image (downloaded it and installed it 2 days ago).

BTW, I also tried using icamaster's watchdog executable, but that made no difference.

$ journalctl -f | grep wlan0
May 16 20:30:47 pi watchdog[585]: device wlan0 did not receive anything since last check
May 16 20:30:48 pi watchdog[585]: device wlan0 did not receive anything since last check
May 16 20:30:49 pi watchdog[585]: device wlan0 did not receive anything since last check
May 16 20:30:50 pi watchdog[585]: device wlan0 did not receive anything since last check

$ uname -a
Linux pi 6.6.20+rpt-rpi-v6 #1 Raspbian 1:6.6.20-1+rpt1 (2024-03-07) armv6l GNU/Linux

$ grep watchdog /boot/firmware/config.txt
dtparam=watchdog=on

$ grep -Pv '^#' /etc/watchdog.conf | grep -v "^$"
realtime = yes
priority = 1
watchdog-device = /dev/watchdog
watchdog-timeout = 15
max-load-1 = 24
interface = wlan0

$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 12 (bookworm)"
NAME="Raspbian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

$ dpkg -l | grep watchdog
ii watchdog 5.16-2~rpt2 armhf system health checker and software/hardware watchdog handler

$ cat /sys/firmware/devicetree/base/model && echo
Raspberry Pi Zero W Rev 1.1

XECDesign commented 3 months ago

The current package still has the patch that fixed the issue before, so I don't think it's the exact same issue. However, your report is detailed enough, so I'll try reproducing it.

Is the pi also connected to ethernet, or is it 100% wifi-only?

hobbieman commented 3 months ago

It is 100% wifi-only. Let me know if you'd like me to load patches/do tests.

Green-on-Black commented 3 months ago

I'm experiencing the same issue with eth0

journalctl -f | grep eth0:

May 22 15:31:28 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:31:33 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:31:42 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:31:57 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:32:12 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:32:17 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:32:27 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:32:28 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check
May 22 15:32:38 raspberrypi watchdog[1327]: device eth0 did not receive anything since last check

uname -a:

Linux raspberrypi 6.6.20+rpt-rpi-v6 #1 Raspbian 1:6.6.20-1+rpt1 (2024-03-07) armv6l GNU/Linux

grep -Pv '^#' /etc/watchdog.conf | grep -v "^$":

watchdog-device     = /dev/watchdog
realtime        = yes
priority        = 1
interface       = eth0
max-load-1      = 24

cat /etc/os-release:

PRETTY_NAME="Raspbian GNU/Linux 12 (bookworm)"
NAME="Raspbian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

dpkg -l | grep watchdog:

ii  watchdog                             5.16-2~rpt2                      armhf        system health checker and software/hardware watchdog handler

cat /sys/firmware/devicetree/base/model && echo:

Raspberry Pi Model B Rev 2

It's ancient. Don't judge me 😁

icamaster commented 3 months ago

For those having problems, what do you get if you try cat /proc/net/dev | grep eth0 and cat /usr/include/stdlib.h | grep strtoul ?

hobbieman commented 3 months ago

# cat /proc/net/dev | grep wlan0
wlan0: 15923231 82006 0 71 0 0 0 76822 255579 2787 0 0 0 0 0 0

# cat /usr/include/stdlib.h | grep strtoul
extern unsigned long int strtoul (const char *__restrict __nptr,
extern unsigned long long int strtoull (const char *__restrict __nptr,
extern unsigned long int strtoul_l (const char *__restrict __nptr,
extern unsigned long long int strtoull_l (const char *__restrict __nptr,

hobbieman commented 3 weeks ago

Can this issue be reopened to address the problems on the older Raspberry Pis? Or should I open a new issue for that?

XECDesign commented 3 weeks ago

This particular issue has been reported as fixed and as far as I know the necessary patch is still present. The issue you're having seems to be at least a different flavour of it, so I think it needs to be a separate issue, maybe referencing this one.