RPi-Distro / repo

Issue tracking for the archive.raspberrypi.org repo
37 stars 1 forks source link

Apt upgrade kills wifi on RPi 3 #345

Open andrimarjonsson opened 1 year ago

andrimarjonsson commented 1 year ago

Took two of my old RPi3s out of storage for a project. Download the rpi-imager, select latest 32bit lite raspbian with wifi and user config. First boot is fine, start with the standard apt update+upgrade combo. Wifi dies. All I get is a 169.x.x.x address.

At first I did not realize this could be it on the first one, thinking I may have just been unlucky and the hardware died. But starting the second one I did start by doing some network tests and all was fine, so I started setting up the software. apt update && apt upgrade and boom. Same symptoms as the first one.

Here is the apt upgrade log: term.log

As you can see, 33 packages, many of them firmware-* updates. I already did try and downgrade the brcm80211 firmware to the prev version in the repo, which does not match what it was updgrading from or I'm reading the apt log incorrectly, but that did not help.

Thinking all those firmwares hang together I tried calling rpi-update with a hash pointing to the date(20221012) in the log but I got a 404 on https://builds.raspberrypi.com

Wiping the card with a clean image using rpi-imager also does not work, as it seems the new firmware is causing some shenanigans.

One thing that is interesting is that all the firmware-* updates in the log denote they are updating from 20221012, a date much later than these two Pi's where last in use.

But yeah, I now essentially have two useless bricks, please advice, and/or fire away if you have questions or need clarifications.

XECDesign commented 1 year ago

@pelwell can you think of any way any sort of official update could damage a pi 3 like that?

The 169 address occurs when it can't get an IP address from the dhcp server. The firmware is not stored anywhere on the device, it gets loaded from the SD card. So, if you're flashing a clean image, which has previously worked for you, and it is now not working, it seems to a problem with the state of the local network.

If you're familiar with tools like wireshark, I would check what's happening on the access point side - is it seeing the dhcp requests? Is it replying to them? Then on the pi side, does it think it has sent the dhcp request? Is it getting a response back? If so, is it giving any error messages?

pelwell commented 1 year ago

Not that much should have happened since the last 32-bit Lite image (May 3rd 2023).

  1. Do you have a country=XX line in your /etc/wpa_supplicant/wpa_supplicant.conf?
  2. What does dmesg | brcmfmac report?
  3. What does dmesg -l warn,err report?
pelwell commented 1 year ago

FYI, I just took the latest 32-bit Lite image and upgraded it, and I still have WiFi:

pi@raspberrypi:~$ uname -a
Linux raspberrypi 6.1.21-v7+ #1642 SMP Mon Apr  3 17:20:52 BST 2023 armv7l GNU/Linux
pi@raspberrypi:~$ tail -1 /proc/cpuinfo
Model           : Raspberry Pi 3 Model B Rev 1.2
pi@raspberrypi:~$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)"
NAME="Raspbian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@raspberrypi:~$ sudo apt update
Hit:1 http://archive.raspberrypi.org/debian bullseye InRelease
Hit:2 http://raspbian.raspberrypi.org/raspbian bullseye InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
pi@raspberrypi:~$ sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
pi@raspberrypi:~$ ifconfig wlan0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.3.15.233  netmask 255.255.255.0  broadcast 10.3.15.255
...
andrimarjonsson commented 1 year ago

Thank you guys for quick responses

The 169 address occurs when it can't get an IP address from the dhcp server. The firmware is not stored anywhere on the device, it gets loaded from the SD card. So, if you're flashing a clean image, which has previously worked for you, and it is now not working, it seems to a problem with the state of the local network.

Yes I'm familiar with 169 addresses and I vaguely remembered something along the lines of the firmware being loaded from the sd card each time. Which is why my first attempt was to simply re-flash and try again. Checked the local network, restarted both my APs(Unifi LR) and router(pfSense), manually cleaned up dhcp licenses but to no avail. Also worth mentioning, those two PIs are quite literally the only two devices that have issues connecting to my home network, which includes devices from all major brands, old and new. Not so much as a hiccup on any of them.

If you're familiar with tools like wireshark, I would check what's happening on the access point side - is it seeing the dhcp requests? Is it replying to them? Then on the pi side, does it think it has sent the dhcp request? Is it getting a response back? If so, is it giving any error messages?

Not gone so far as bringing out Wireshark yet, but I did scour the journal and seeing dhcp offers coming in for wlan0 on the same IP as it initially connected on before the upgrade. Few seconds later, I see entries along the lines of "setting up routes for 169.... on wlan0"

Do you have a country=XX line in your /etc/wpa_supplicant/wpa_supplicant.conf?

Yes

What does dmesg | brcmfmac report?

[   10.442857] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[   10.459351] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[   10.464530] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.raspberrypi,3-model-b.bin failed with error -2
[   10.493613] usbcore: registered new interface driver brcmfmac
[   10.769259] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Feb 16 2020 22:39:24 version 7.45.98.97 (r724416 CY) FWID 01-bf41ed64
[   12.467080] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled

Something I did not notice yesterday is the following brcmfmac: mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.raspberrypi,3-model-b.bin failed with error -2 I'll be examining this after posting.

What does dmesg -l warn,err report?

[    0.131354] usb_phy_generic phy: supply vcc not found, using dummy regulator
[    0.131552] usb_phy_generic phy: dummy supplies not allowed for exclusive requests
[    2.938806] Core Release: 2.80a
[    2.938902] Setting default values for core params
[    2.939039] Finished setting default values for core params
[    3.139425] Using Buffer DMA mode
[    3.139515] Periodic Transfer Interrupt Enhancement - disabled
[    3.139656] Multiprocessor Interrupt Enhancement - disabled
[    3.139789] OTG VER PARAM: 0, OTG VER FLAG: 0
[    3.139953] Dedicated Tx FIFOs mode

[    3.140697] WARN::dwc_otg_hcd_init:1074: FIQ DMA bounce buffers: virt = 9ed04000 dma = 0xded04000 len=9024
[    3.140992] FIQ FSM acceleration enabled for :
               Non-periodic Split Transactions
               Periodic Split Transactions
               High-Speed Isochronous Endpoints
               Interrupt/Control Split Transaction hack enabled

[    3.141590] WARN::hcd_init_fiq:457: FIQ on core 1

[    3.141762] WARN::hcd_init_fiq:458: FIQ ASM at 807c4360 length 36

[    3.141959] WARN::hcd_init_fiq:497: MPHI regs_base at bb810000
[    3.161482] Init: Port Power? op_state=1
[    3.167790] Init: Power Port (0)
[    3.358978] Indeed it is in host mode hprt0 = 00021501
[    3.553892] mmc0: host does not support reading read-only switch, assuming write-enable
[    3.579800] Indeed it is in host mode hprt0 = 00001101
[    7.717122] systemd-journald[143]: File /var/log/journal/6a59ee8c5d674d2caeefacddd5d32594/system.journal corrupted or uncleanly shut down, renaming and replacing.
[    9.022340] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
[    9.155129] snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned.
[    9.431260] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
[    9.439894] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
[    9.441408] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
[    9.489131] bcm2835_isp: module is from the staging directory, the quality is unknown, you have been warned.
[    9.498936] bcm2835_v4l2: module is from the staging directory, the quality is unknown, you have been warned.
[    9.586116] bcm2835_codec: module is from the staging directory, the quality is unknown, you have been warned.
[   10.464530] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.raspberrypi,3-model-b.bin failed with error -2

@pelwell As for the system setup, I basically have the same thing you posted.

I'll go ahead and look at that failed firmware load message and do some maintenance on my internal network just to be sure.

Again, thank you for the quick replies, I'll report back if and when I find something of note.

pelwell commented 1 year ago

The failed firmware load is an annoying red herring - what you don't get is a message saying that it has successfully loaded the firmware brcm/brcmfmac43430-sdio.bin.

The other messages are also harmless and expected.

andrimarjonsson commented 1 year ago

The failed firmware load is an annoying red herring - what you don't get is a message saying that it has successfully loaded the firmware brcm/brcmfmac43430-sdio.bin.

Yeah, found that out pretty quickly through basic googling.

Used the opportunity to do some long overdue maintenance on my router, switches and APs, on the very off chance that some combination of AP firmware + RPi was giving me issues. As I kind of expected, that was not the case, but now at least I have freshly cleaned and updated internal network.

I have a few things to try out after talking to the in-house IT at work and as before, post here if I find something, questions arise or I hit a dead end.

andrimarjonsson commented 12 months ago

Finally this seems solved.

Writing this down here in the hopes that the google gods will be kind and lead people here with similar issue.

Two days ago I got some more time to debug and decided to move a few of my 5GHz devices to my 2.4Ghz SSID. They immediately showed signs of a flaky connection, not as bad as the RPI, but bad nonetheless.

My home wifi is basically split up as follows

There are also 5Ghz varieties of those so 6 SSIDs in total. Going over the config in the Unifi controller I found that the 2.4_VPN was set to use both 2.4 and 5 Ghz with band steering enabled. As soon as I turned this setting off, as it should have been all along, all connections on my 2.4 network have been rock solid for two whole days now.

I have no idea when this happened on my Unifi setup as I hadn't really run any 2.4GHz devices for months until I took my RPI 3s out of storage for a project. Most devices would take long to get an ip, work for 5 minutes(enough time to do apt update && apt upgrade) or so before becoming unusable.

Again, thanks everyone for the help! If anything in this explanation is unclear, just ask and I'll try and clarify a bit better.

pelwell commented 12 months ago

Thanks for reporting back, and well done for finding and solving the problem.

TL;DR = band steering is bad

andrimarjonsson commented 12 months ago

Spoke too soon, forgot to reboot the PI after the upgrade.

The band steering issue was an issue, just an unrelated one, it just affected older devices with dual(2.4 and 5 GHz) capabilities.

Now I have a bunch of devices on the 2.4 network running flawlessly, all except my two RPIs.

Sorry about this, just what you get with 3 young kids and limited late night debugging time :)

Also, did manage to "reset" the PI's so they work again on the stock RPI image flashed by rpi-imager. Not sure how exactly but I suspect it had to do with the break I took with this, effectively allowing all dhcp leases and such expire by doing so. Should be easy enough to test. I'll see if I get to it it the next couple of days.

Otherwise, back to the drawing board, and again, sorry for the inconvenience.

andrimarjonsson commented 12 months ago

Managed to get some tests in yesterday evening with a friend. He noted when we ran iwconfig that the bitrate was very low(sub 11Mb/s) for a full link quality(70/70). Some googling about rpi3 and bitrate led me to this post https://forums.raspberrypi.com/viewtopic.php?t=162044 there it states this is an old problem and the driver just reports it being on still but that is pending a fix. This later post seems to reinforce that https://raspberrypi.stackexchange.com/questions/96606/make-iw-wlan0-set-power-save-off-permanent Even there it seems the false reporting that was pending a fix is now fixed.

Yet when I run journalctl | grep brcmfmac I get

Oct 09 22:38:04 raspberrypi kernel: brcmfmac: F1 signature read @0x18000000=0x1541a9a6
Oct 09 22:38:04 raspberrypi kernel: brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
Oct 09 22:38:04 raspberrypi kernel: usbcore: registered new interface driver brcmfmac
Oct 09 22:38:04 raspberrypi kernel: brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Jul 19 2021 03:24:18 version 7.45.98 (TOB) (56df937 CY) FWID 01-8e14b897
Oct 09 22:38:06 raspberrypi kernel: brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled

running sudo iw wlan0 set power_save off and my wifi springs back to life with 72.2Mb/s bitrate.

Reboot, power save is back on and bitrates are down to sub 11Mb/s

Now sure, I could patch it up by doing what the second linked page suggests via systemd service or, like many other places suggest, just put the iw command in rc.local and be done with it. But this is something that was seemingly fixed +5 years ago, yet I'm currently sitting in front of a system that still reports this being enabled in the journal at boot time and it seems it is actively causing problems.

@pelwell What do you get when running journalctl | grep brcmfmac on your RPI3?

Note: this is on my RPI that I upgraded and then "downgraded" by flashing a new SD card with the Raspbian Lite image from May 2023. I have done nothing to this system after flashing.

pelwell commented 11 months ago

dmesg | grep brcmfmac (I only care about the kernel log from this boot) gives:

[   10.849418] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[   10.854820] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[   10.855126] usbcore: registered new interface driver brcmfmac
[   11.108885] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Jul 19 2021 03:24:18 version 7.45.98 (TOB) (56df937 CY) FWID 01-8e14b897
[   13.009936] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled

Disabling power save doesn't affect WLAN speed on my Pi 3, but I'm only getting around 38Mb/s down and 20Mb/s up where I am.

andrimarjonsson commented 11 months ago

Interesting. For me pi is in the same room as the AP so it should not really affect anything. But observed behavior is:

Did check for low power warnings using vcgencmd, none have been reported.

I'll see if I can find time tonight to do update & upgrade and see if manually turning off power save has the same effect.

andrimarjonsson commented 11 months ago

This is the gift that keeps on giving, wanted to do the update test tonight, pi is not reachable and the router does not have a dhcp entry for it anymore. Connect it to screen and keyboard and run ip addr, link is shown as dormant. Power save is still off.

I'm really at a loss with this one at this point, this is the only device on my entire network that seems to be completely unable to maintain a stable wifi connection for more than 12 hours.