Closed BlueWings172 closed 2 years ago
Let's work backwards. I'm going to assume a Bullseye 64-bit installation using PiBuilder. Please let me know if your setup is different.
The first thing to check is:
$ tail -5 /etc/dhcpcd.conf
# patch needed for IOTstack - stops RPi freezing during boot.
# see https://github.com/SensorsIot/IOTstack/issues/219
# see https://github.com/SensorsIot/IOTstack/issues/253
allowinterfaces eth*,wlan*
That tells the DHCP client running on the Pi to consider only Ethernet and WiFi interfaces as candidates for dynamic address allocation. Without it, all the veth interfaces that Docker sets up when you bring up your stack also try to participate in something Docker is already doing. That can have knock-on effects at boot time (system stalls) or during network transients.
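A minimal sketch of how that check could be scripted, assuming the patch lives in /etc/dhcpcd.conf as shown above. The function name allowed_interfaces is made up for illustration; it is not part of dhcpcd or PiBuilder.

```bash
# Print the interface patterns from every active "allowinterfaces" line,
# one pattern per line. Commented-out lines are ignored.
# allowed_interfaces is a hypothetical helper name, not a dhcpcd tool.
allowed_interfaces() {
    conf="${1:-/etc/dhcpcd.conf}"
    grep -E '^[[:space:]]*allowinterfaces[[:space:]]' "$conf" |
        sed -E 's/^[[:space:]]*allowinterfaces[[:space:]]+//' |
        tr -s ', ' '\n'
}

# Usage on the Pi:
#   allowed_interfaces
# No output at all means the restriction is not in place.
```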
The second thing to check is:
$ grep "isc-dhcp-fix.sh" /etc/rc.local
/usr/bin/isc-dhcp-fix.sh eth0 wlan0 &
Notice the "eth0" and "wlan0" arguments. They reflect the fact that my Pi has both of those interfaces active, so both are being "kept alive" by the isc-dhcp-fix.sh script. It's important to realise that this line is set up by PiBuilder at install time. It isn't dynamic so if, for example, Ethernet wasn't "there" at PiBuilder time but WiFi was, then that line will only have "wlan0" in it.
In other words, you should make sure that the interfaces provided as arguments on that line reflect what you actually need in practice.
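One way to compare that line against what the machine actually has is sketched below. It assumes the rc.local layout shown above and that interfaces appear under /sys/class/net (standard on Linux). Both function names are hypothetical.

```bash
# List the interfaces handed to isc-dhcp-fix.sh in rc.local, one per line.
# Commented-out lines and the trailing "&" are ignored.
rc_local_interfaces() {
    awk '!/^[ \t]*#/ && /isc-dhcp-fix\.sh/ {
             seen = 0
             for (i = 1; i <= NF; i++) {
                 word = $i
                 if (seen) { sub(/&$/, "", word); if (word != "") print word }
                 if (word ~ /isc-dhcp-fix\.sh$/) seen = 1
             }
             exit
         }' "${1:-/etc/rc.local}"
}

# Report whether each listed interface exists on this machine.
# The second argument (sysfs directory) is only there to make testing easy.
check_rc_local_interfaces() {
    rc_local_interfaces "${1:-/etc/rc.local}" | while read -r ifc; do
        if [ -e "${2:-/sys/class/net}/$ifc" ]; then
            echo "$ifc: present"
        else
            echo "$ifc: not found"
        fi
    done
}
```

Running check_rc_local_interfaces with no arguments checks the real files. An interface you rely on that is missing from the output would need adding to the rc.local line by hand.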
The third thing to check is whether isc-dhcp-fix.sh is firing. The easiest way to do that is:
$ grep -a "isc-dhcp-fix" /var/log/syslog
Remember that syslog rotates. For Buster and earlier that was every 24 hours. For Bullseye and later, it's every week (but there's a PiBuilder tutorial on how to put it back to 24 hours). The point is that getting silence from that grep can mean either:
- isc-dhcp-fix.sh is present and firing but no interfaces have bounced since the last log rotation; or
- isc-dhcp-fix.sh is either not installed, or is installed but not firing.
To make sure it is there:
pi@sec-dev:~$ cat /usr/bin/isc-dhcp-fix.sh
#!/bin/bash
logger "isc-dhcp-fix launched"
while [ $# -gt 0 ] ; do
    for CARD in $@ ; do
        ifconfig "$CARD" | grep -Po '(?<=inet )[\d.]+' &> /dev/null
        if [ $? != 0 ]; then
            logger "isc-dhcp-fix resetting $CARD"
            ifconfig "$CARD" up
            sleep 5
        fi
        sleep 1
    done
    sleep 1
done
To make sure it can be run:
$ sudo /usr/bin/isc-dhcp-fix.sh eth0 wlan0
Unless you get an error (e.g. no execute permission) you'll probably get silence. Wait about 10 seconds, hit control+C, then run the grep again. You should at least get the "launched" message.
After any reboot, I generally get this triple:
Jun 11 14:25:58 sec-dev root: isc-dhcp-fix launched
Jun 11 14:25:58 sec-dev root: isc-dhcp-fix resetting eth0
Jun 11 14:26:04 sec-dev root: isc-dhcp-fix resetting wlan0
Now I'll make a different assumption: that the interface arguments in /etc/rc.local are correct, that isc-dhcp-fix.sh is firing properly, and that a grep of the log produces a ton of "resetting" messages.
If that's the case then I'd start to divide and conquer. If I had Ethernet but WiFi was being problematic, I'd probably disable WiFi for a while and see if Ethernet was more robust.
If Ethernet was flaky then I'd change my Ethernet cable to see if it influenced the behaviour. Then I'd try a different switch port or perhaps a totally different switch.
If I came to suspect the Raspberry Pi Ethernet port itself, I'd invert the problem by enabling WiFi and disabling the Ethernet port. If the problem persisted I'd then start to worry about what else might be going on inside the Pi.
For the record, I've got four Pi4s (one Buster, three Bullseye), all 4GB RAM running on SSDs, all dual Ethernet and WiFi, all built with PiBuilder. I do see occasional "resetting wlan0" messages (maybe one a week across all four machines) but, except at boot time or when there's an obvious explanation (like pulling out an Ethernet cable) it's very very rare for me to see a "resetting eth0" message.
To make the claim in the previous paragraph more "evidence" than "anecdote", I just grabbed all the logs from all four machines. That's the last 7 days plus the current day on each machine: 32 log files in total. I found 24 hits, all from my test Pi, and all of them were the "triples" characteristic of a reboot (example above). The test machine is something I reboot all the time so that makes sense. The other three I rarely reboot so the only reasonable inference is that isc-dhcp-fix.sh isn't firing because Ethernet and WiFi aren't bouncing.
While n=4 doesn't really prove all that much, it does at least demonstrate that the combination of Pi 4 + PiBuilder doesn't always result in network interface problems.
The last point I'll make is that, if you're seeing "resetting" messages when you run grep, those are just evidence of the script sensing that an interface has gone down. They are written to the log with logger so that you can open the log and search for them, then look to see what else is writing messages into the log that might guide you to the underlying problem. You also get a timestamp so you can cross-correlate with other files in the log directory, or logs kept by other devices on your network.
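If the grep produces more lines than you can eyeball, a per-day tally makes the pattern easier to see. This is a rough sketch that assumes the traditional "Mon DD HH:MM:SS" syslog timestamps seen in the examples above; summarise_resets is a made-up name.

```bash
# Count "resetting" messages per day and interface.
# Reads syslog lines on stdin and prints "Mon DD interface count".
# Output is sorted as plain text, so it is not strictly chronological.
summarise_resets() {
    awk '/isc-dhcp-fix resetting/ { count[$1 " " $2 " " $NF]++ }
         END { for (key in count) print key, count[key] }' |
        LC_ALL=C sort
}

# Usage on the Pi:
#   grep -a "isc-dhcp-fix" /var/log/syslog | summarise_resets
```

A day with a handful of hits clustered around a reboot is normal. Thousands of hits on one interface suggests that interface never gets an address.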
Hope something in all of the above helps you track down and nail this problem.
@Paraphraser
Your assumption is correct; I'm running a Bullseye 64-bit installation using PiBuilder.
1- The output of tail -5 /etc/dhcpcd.conf is identical to yours.
pi@raspberrypi:~ $ tail -5 /etc/dhcpcd.conf
# patch needed for IOTstack - stops RPi freezing during boot.
# see https://github.com/SensorsIot/IOTstack/issues/219
# see https://github.com/SensorsIot/IOTstack/issues/253
allowinterfaces eth*,wlan*
2- The output of grep "isc-dhcp-fix.sh" /etc/rc.local is identical to yours.
pi@raspberrypi:~ $ grep "isc-dhcp-fix.sh" /etc/rc.local
/usr/bin/isc-dhcp-fix.sh eth0 wlan0 &
3- The line grep -a "isc-dhcp-fix" /var/log/syslog just flooded my screen with events. I actually had to output to a txt file to be able to view it. For some days there is an event exactly every 8 seconds (10,742 per day). For other days, where the Pi spent much of the time frozen, there are far fewer events. Exactly 99.87% of these events are for eth0 and the rest for wlan0 (maybe because at the beginning of the week I was using WiFi only?). That said, when the Pi is inaccessible, neither interface responds to ping, SSH, or browser access. Here are 2 samples:
Jun 5 00:00:06 raspberrypi root: isc-dhcp-fix resetting eth0
Jun 5 00:00:14 raspberrypi root: isc-dhcp-fix resetting eth0
Jun 5 00:00:22 raspberrypi root: isc-dhcp-fix resetting eth0
Jun 9 10:38:25 raspberrypi root: isc-dhcp-fix launched
Jun 9 10:38:25 raspberrypi root: isc-dhcp-fix resetting eth0
Jun 9 10:38:31 raspberrypi root: isc-dhcp-fix resetting wlan0
Jun 9 11:37:49 raspberrypi root: isc-dhcp-fix launched
Jun 9 11:37:49 raspberrypi root: isc-dhcp-fix resetting eth0
Note: out of the 40k lines, only 0.07% said 'launched' and the rest were 'resetting'.
4- Obviously isc-dhcp-fix.sh is installed and works, and cat /usr/bin/isc-dhcp-fix.sh outputs the same content you included.
5- The line sudo /usr/bin/isc-dhcp-fix.sh eth0 wlan0 did not output anything. Running the grep command again did indeed output the launched event.
6- After reboot and running the grep command, I did get the below lines as expected:
Jun 11 17:39:14 raspberrypi root: isc-dhcp-fix launched
Jun 11 17:39:15 raspberrypi root: isc-dhcp-fix resetting eth0
Jun 11 17:39:21 raspberrypi root: isc-dhcp-fix resetting wlan0
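The 8-second spacing reported in item 3 matches what the loop in isc-dhcp-fix.sh would produce if eth0 never obtained an IPv4 address while wlan0 kept one: 5 seconds after the reset, 1 second per interface, and 1 second at the end of each pass. A rough sketch of that arithmetic follows. The function name cycle_seconds is made up for illustration, and the run time of ifconfig, grep and logger is ignored.

```bash
# Seconds for one full pass of the loop in isc-dhcp-fix.sh.
#   $1 = number of interfaces given on the command line
#   $2 = how many of them currently lack an IPv4 address
# Ignores the (small) run time of the commands themselves.
cycle_seconds() {
    echo $(( $2 * 5 + $1 * 1 + 1 ))
}

# Usage:
#   cycle_seconds 2 1                            # prints 8
#   echo $(( 86400 / $(cycle_seconds 2 1) ))     # prints 10800
```

The 10,800 resets per day this predicts is an upper bound, which is in line with the 10,742 actually counted. The same sums explain the 6-second gap between the eth0 and wlan0 messages in the boot-time triple.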
Your assumptions are correct regarding rc.local, isc-dhcp-fix.sh, and grep. When I started experiencing issues with WiFi, I connected the Ethernet cable. That did not help at all. Both get disconnected at the same time. The Ethernet cable is definitely working and high quality, as I'm using it for another machine.
I would buy another Pi if they were available. You have more chance of winning the lottery than of finding a reasonably priced Pi now.
As I mentioned, I previously had issues with IOTstack and WiFi. I was hoping that the combination of Ethernet, Bullseye and PiBuilder would eliminate the problem. But this turned out to be a huge waste of time on unreliable toys.
I really appreciate Windows now.
Thanks for all the effort and time you spent helping.
I've been chasing a problem with a Raspberry Pi 3B+ for some time. I had a collection of notes which I've just turned into this gist. It might turn out to be relevant to your situation.
Thanks for all the valuable info. My power supply is rated at 3A. I have tried several phone chargers, USB adapters and power strips with USB outlets: nine different ways of powering the Pi in total, all of which should provide 2.4A or more, but my Pi experienced these network freezes (both Ethernet and WiFi) with all nine of them. I have also been monitoring the Pi's power consumption using one of these.
The highest draw I've seen is 1.5A but I noticed that with some phone adapters, the Pi doesn't get more than 0.9A. That said, my tests were not long enough to be conclusive.
I have ordered a new 3A power supply which should be higher quality than the one I have. It should arrive tomorrow so I will test and report back.
Thanks again brother.
I'm a bit worried that you might have missed the point of the gist and, accordingly, be disappointed by your new power supply.
At the risk of telling you things you already know - aka "teaching my grandmother to suck eggs" - and apologising in advance if that's what I'm doing...
Think of it like this:
I'm not sure that 5…7 are correct but you get the general idea.
I've taken two measurements. The first is by disconnecting the power cable where it plugs into 4 and connecting it to a controlled load. This is the "can the wall wart deliver to specification?" test. All supplies I have pass this test.
The second measurement is by inserting an inline monitor between 3 and 4, in the same way you have. This is the "what is the Pi actually drawing from the wall wart as it operates?" test. The only real difference between our approaches is mine also logs its observations over Bluetooth so I can capture the data and graph it.
The on-board regulator puts out 1.8V, 3.3V and 5V. The 3.3V and 5V rails appear at header pins and I've watched those using an oscilloscope. Nothing particularly revealing. Both voltages are quite stable even when the Pi is under heavy compute load.
I haven't tried attaching sensors or other forms of load across the header pins.
That leaves the 1.8V power rail and I assume (don't know for a certain fact) that the ARM "compute guts" run on 1.8V. I'm basing this guess on the numbers in voltage reports from vcgencmd always being less than 1.8.
Every time I've been lucky enough to catch a "currently under-voltage" from vcgencmd_power_report, the "core" measurement on the Pi 3B+ has been around 1.2V. In normal operation, it seems to be 1.3V on the Pi 3B+.
The Pi 4Bs all seem to have a much lower "core" around 0.85V.
It's that drop to 1.2V on the 3B+ which seems to be the trigger and it suggests (at least to me) that either the on-board regulator is flaky and incapable of sustaining the necessary Watts at 1.8V when the system is under load, or the measurement circuitry is flaky and giving false readings which are triggering "currently under-voltage".
All other things being equal, if the measurement circuitry was flaky (so these were false positives) I would not expect that to have a consequential impact on things like network interface ports. The fact that I do see wonky network interface behaviour makes me think it's more likely that the on-board regulator is the culprit. None of this is proof, of course. It may be that some part of the IP stack also watches the "currently under-voltage" or "currently throttled" conditions and drops the interfaces to conserve power.
The material point, however, is that it really wouldn't matter if I handed the Pi 3B+ a power supply capable of sustaining 100 Amps at 5.1V. The input side of the on-board regulator still wouldn't draw more than 1 amp while the 1.8V output side still would be incapable of delivering what the Pi 3B+ ARM SoC needed to do its work.
In short, if your Pi 4B has a similar "wonky" on-board regulator then a new PSU might not achieve much.
My question to you is, have you run vcgencmd_power_report to see whether the Pi is moaning about power problems?
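If the script from the gist isn't to hand, the firmware's own "vcgencmd get_throttled" reports the same kind of flags as a hexadecimal bit field. The sketch below decodes it using the bit meanings from the Raspberry Pi documentation. decode_throttled is a made-up name, and I'm assuming (not asserting) that vcgencmd_power_report draws on the same source.

```bash
# Decode the bit field printed by "vcgencmd get_throttled".
# Accepts either "throttled=0x50005" or a bare "0x50005".
decode_throttled() {
    value=$(( ${1#throttled=} ))
    if [ "$value" -eq 0 ]; then
        echo "no flags set"
        return 0
    fi
    for bit in 0 1 2 3 16 17 18 19; do
        [ $(( (value >> bit) & 1 )) -eq 1 ] || continue
        case $bit in
            0)  echo "currently under-voltage" ;;
            1)  echo "ARM frequency currently capped" ;;
            2)  echo "currently throttled" ;;
            3)  echo "soft temperature limit active" ;;
            16) echo "under-voltage has occurred" ;;
            17) echo "ARM frequency capping has occurred" ;;
            18) echo "throttling has occurred" ;;
            19) echo "soft temperature limit has occurred" ;;
        esac
    done
}

# Usage on the Pi:
#   decode_throttled "$(vcgencmd get_throttled)"
```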
If it is then my guess is you'll eventually have to reach the same conclusion about the Pi 4B that I've reached about my 3B+ : useless for anything practical, need a new one, shame about the chip shortage.
If it isn't then we could be barking up the wrong tree entirely. You said you rebuilt with PiBuilder. I've got a collection of 4Bs all built with PiBuilder which are rock solid, interface-wise, plus a lack of issues reporting problems similar to yours. I have no idea how many people use PiBuilder. The only indication of any wider interest is 11 forks and 26 stars and that might not be enough to generalise from, reliably.
If we assume not power and not PiBuilder then the next cab off the rank would be the network interfaces themselves. I'd be considering a USB dongle or hub that included an Ethernet interface. Disabling the built-in interfaces in favour of a dongle/hub interface might let you eliminate the built-in interfaces.
A couple of other thoughts. I spent a good portion of my career wearing a "comms guru" hat. My knowledge is a bit out-of-date but I learned enough to trust absolutely nothing when it comes to comms. I don't trust Ethernet cables and I don't trust switches or switch ports. If at all possible, vary everything you can to see if the problem follows the cable/port/switch.
I'd also be looking for any unexplained oddities like: I have a gigabit switch but the Pi is only running at 100BaseT. Why? `sudo ethtool eth0` reports the negotiated speed.
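A small sketch for pulling just the negotiated speed and duplex out of the ethtool report, assuming the usual "Speed:" and "Duplex:" lines in its output. link_summary is a made-up name.

```bash
# Extract the negotiated speed and duplex from "ethtool IFACE" output on stdin.
link_summary() {
    awk -F': ' '/^[ \t]*Speed:/  { speed = $2 }
                /^[ \t]*Duplex:/ { duplex = $2 }
                END { print "speed=" speed, "duplex=" duplex }'
}

# Usage on the Pi:
#   sudo ethtool eth0 | link_summary
```

Anything below 1000Mb/s on a gigabit switch port, or a half-duplex link, would be worth chasing through the cable and switch before blaming the Pi.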
Another thing to do is to make sure you aren't a victim of a duplicate MAC address. These are rare but far from unknown and they play merry hell with everything.
I view the modern fad of random WiFi MACs with deep misgivings. I turn that off as soon as I see it. Seems to me the designers of this nonsense should at least have considered that "home" and "work" networks are places where you don't need this to be enabled. The only time it makes sense is when you're out of range of your normal networks. Still, that's just me. I'm no longer paid the Big Bucks to foist this nonsense on everyone else.
Something like this should get the job done:
On another device, run:
$ sudo tcpdump ether host «MAC»
where «MAC» is the MAC address of the Pi's Ethernet interface. Or use Wireshark instead of tcpdump.
If something else has the same MAC, it will eventually broadcast and you'll see it.
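To save copying the address by hand, the Pi can print the command for you. This sketch assumes the interface's MAC is readable from /sys/class/net, as it is on standard Linux systems. mac_watch_command is a made-up name, and the -e and -n flags (show link-level headers, skip name lookups) are my additions to the command above.

```bash
# Print the tcpdump command another device can run to watch for this Pi's MAC.
# The second argument (sysfs directory) is only there to make testing easy.
mac_watch_command() {
    mac=$(cat "${2:-/sys/class/net}/${1:-eth0}/address") || return 1
    echo "sudo tcpdump -e -n ether host $mac"
}

# Usage on the Pi:
#   mac_watch_command eth0
```

Any frames that show up on the other device while the Pi itself is unplugged would point at a second device using the same address.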
Hi Again,
Now that I had sometime to conduct some tests, here is the gist:
1- The new power supply is similar to the one I have (model DSM-0530) and it is absolute garbage. It seems to be a common model and this is a warning for anyone who thinks about buying it. You're better off running your Pi on hamster power.
2- After I figured out how to export power usage data to Excel from my UM25C (doesn't work on Windows or Android but worked on my iPad), I ran tests consisting of a reboot followed by 15 minutes of stress test, then continued collecting data for several hours. The 3 scripts you provided were instrumental in seeing immediately when the Pi starts struggling. I also collected CPU temperature, to test the theory that a more capable power supply will allow the CPU to reach higher temperatures during the stress test.
3- I chose 3 of the USB power sources that I have which I felt were more capable than the others. I ran the test several times over several days and the results were consistent.
I used the below lines to install and run the stress test:
sudo apt-get install stress
while true; do vcgencmd measure_clock arm; vcgencmd measure_temp; sleep 10; done& stress -c 4 -t 900s
4- Results:
| PSU | DSM-0530 | Samsung USB C Model EP-TA800 | Power Strip |
| -- | -- | -- | -- |
| Notes | Unofficial Raspberry Pi Charger | Phone Charger | Power Strip with USB ports |
| Amp Specs | 3A | 3A | 3A |
| Charging Mode as Shown in UM25C | "Unknown" | "Unknown" | Apple 2.1 |
| Undervoltage Warning in Desktop | Yes | Yes | No |
| Reporting Undervoltage Now | Yes | Yes | No during test, but occasionally and briefly during normal operations |
| Highest Current (A) | 0.94 | 1.28 | 1.24 |
| Highest Temp Under Stress (°C) | 48.7 | 51.1 | 62.3 |
| Max Voltage (V) | 4.95 | 4.93 | 5.164 |
| Max Power (W) | 4.518 | 6.27 | 6.43 |
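For anyone repeating the test, the peak temperature can be pulled out of the output of the monitoring loop above instead of being read off the screen. This is a sketch that assumes the loop's output was saved to a file (stress.log is a hypothetical name) and that vcgencmd measure_temp prints lines of the form temp=48.7'C. max_temp is a made-up name.

```bash
# Print the highest reading among "temp=NN.N'C" lines on stdin.
# Other lines (e.g. the measure_clock output) are ignored.
max_temp() {
    sed -n "s/^temp=\([0-9.]*\)'C\$/\1/p" | LC_ALL=C sort -n | tail -n 1
}

# Usage:
#   max_temp < stress.log
```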
Hello
My Raspberry Pi 4B randomly becomes inaccessible on the LAN.
I posted this issue months ago on the IOTstack page and it was suggested that I use PiBuilder.
So I finally found the time to back my data up and give PiBuilder a try. The first few days were uneventful but then, unfortunately, the Pi started disappearing from the network and becoming inaccessible several times a day, which renders my setup completely useless. This happens regardless of whether the Pi is connected via WiFi or Ethernet. This is a real bummer because I've spent so much time trying to learn and implement new things. I'm using a brand new SD card.
I would appreciate any help.
Thanks