freifunk-gluon / gluon

a modular framework for creating OpenWrt-based firmwares for wireless mesh nodes
https://gluon.readthedocs.io

mt7621 - Netgear: correct the PCIe port number for upcoming Kernel 5.15 #2789

Closed Djfe closed 1 year ago

Djfe commented 1 year ago

Bug report

What is the problem? This is the original commit, from before its author removed the PCI changes; it was eventually committed without the changes to Netgear devices that are likely required. https://github.com/openwrt/openwrt/commit/29447ea2c58a423031ec4ed827bffcd574596ade

MT7621 uses a new PCIe driver in the 5.15+ kernel. Allocating wrong PCIe port will cause the PCIe NIC to not work properly. This commit fixes these wrong ports.

According to DragonBluep, the Netgear R6220, WAC104 and WNDR3700 v5 are likely affected. He/she couldn't find anyone to test these devices on kernel 5.15 while it was still in testing. https://github.com/openwrt/openwrt/pull/11220#issuecomment-1435435140

Ramips was just moved to 5.15 in OpenWrt master for the upcoming full release at the end of March. All three devices are supported by Gluon, so we should try to find people to test OpenWrt master on these three devices to prevent 2.4GHz from breaking on Gluon23. We don't have any of these devices running at Freifunk Aachen.

Djfe commented 1 year ago

@blocktrron I'm mentioning you, since you added the R6220. Are you available for testing the device at some point? :) @mbaumga Are you available for testing the WAC104? :) The WNDR3700 v5 was added as untested (and still is https://github.com/freifunk-gluon/gluon/blob/master/targets/ramips-mt7621#L44)

mbaumga commented 1 year ago

@Djfe: So, just to verify what to test: I just have to put on a current OpenWrt master build and see if 2.4GHz Wifi appears? @blocktrron: I can take over testing the R6220, so you don't have to. :)

Djfe commented 1 year ago

I think so, yes. Please also post the output of logread here :) https://firmware-selector.openwrt.org/?version=SNAPSHOT&target=ramips%2Fmt7621&id=netgear_wac104

As far as I understand it, it should still boot.

You can adjust the installed packages beforehand in the firmware selector and request a sysupgrade file, or install them manually via opkg afterwards. You might want to configure and activate wifi via LuCI, which is not pre-installed on snapshots. https://openwrt.org/docs/guide-user/luci/luci.essentials

Djfe commented 1 year ago

the bootlog will contain these kinds of messages if the pcie numbers are different now and need to be adjusted

[4.197658] mt7621-pci 1e140000.pcie: pcie0 no card, disable it (RST & CLK)
[4.204609] mt7621-pci 1e140000.pcie: PCIE1 enabled
[4.209476] mt7621-pci 1e140000.pcie: PCIE2 enabled
...
[4.307988] pci 0000:01:00.0: [14c3:7662] type 00 class 0x028000
[4.367206] pci 0000:02:00.0: [14c3:7603] type 00 class 0x028000

Please capture logread within the first minute after booting and then test both 2.4 and 5GHz wifi (both could actually be broken, maybe they don't even show up in LuCI ^^)
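To make the check above less manual, here is a minimal sketch that extracts the per-port status from a saved log. It only assumes the two `mt7621-pci` message shapes quoted in this thread; the function name `pcie_port_status` is made up for illustration.

```python
import re

# Matches the two mt7621-pci message shapes quoted in this thread
# ("pcieN no card, ..." and "PCIEN enabled"); other shapes are ignored.
PORT_RE = re.compile(
    r"mt7621-pci \S+: pcie(\d) (no card|enabled)", re.IGNORECASE
)


def pcie_port_status(log_text):
    """Return {port_number: True if a card was detected, False otherwise}."""
    status = {}
    for match in PORT_RE.finditer(log_text):
        port = int(match.group(1))
        status[port] = match.group(2).lower() == "enabled"
    return status


sample = """\
[4.197658] mt7621-pci 1e140000.pcie: pcie0 no card, disable it (RST & CLK)
[4.204609] mt7621-pci 1e140000.pcie: PCIE1 enabled
[4.209476] mt7621-pci 1e140000.pcie: PCIE2 enabled
"""

print(pcie_port_status(sample))  # {0: False, 1: True, 2: True}
```

The resulting port numbers can then be compared by hand with the PCIe ports referenced in the device tree of the affected device.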

mbaumga commented 1 year ago

I prefer using the Imagebuilder and generate images for all my devices with a bash script. Using UI tools would be far too much hassle to do so for 20+ different devices. :)

Tested with OpenWrt SNAPSHOT, r22104-01262c921c from Feb 18 20:11:36 2023

Output of WAC104: https://paste.tecff.de/?eb1a9492f3b47592#Sln5geBBmdHCZyEFUdM4n4lbuKjJ3xrdEmw0yGT6OIU=

Sat Feb 18 20:22:00 2023 kern.err kernel: [ 1.066669] mt7621-pci 1e140000.pcie: pcie1 no card, disable it (RST & CLK)
Sat Feb 18 20:22:00 2023 kern.info kernel: [ 1.080425] mt7621-pci 1e140000.pcie: PCIE0 enabled
Sat Feb 18 20:22:00 2023 kern.info kernel: [ 1.090085] mt7621-pci 1e140000.pcie: PCIE2 enabled

Output of R6220: https://paste.tecff.de/?78517da5aae54a65#UX9CvXrHAiK8GOUPSg6KsHuPEZIa0ZHFWGR2ZB1v+gQ=

Sat Feb 18 20:12:16 2023 kern.err kernel: [ 1.067562] mt7621-pci 1e140000.pcie: pcie1 no card, disable it (RST & CLK)
Sat Feb 18 20:12:16 2023 kern.info kernel: [ 1.081322] mt7621-pci 1e140000.pcie: PCIE0 enabled
Sat Feb 18 20:12:16 2023 kern.info kernel: [ 1.090980] mt7621-pci 1e140000.pcie: PCIE2 enabled

On both devices, both radios work fine and can be connected to.

I could in addition look at Netgear WAX202, Netgear R6260, Cudy WR2100 and D-Link Dir 860L B1, if there is the necessity to do so?

Djfe commented 1 year ago

Great choice to build it yourself :) I think this doesn't affect Wi-Fi 6 devices, only AC-generation devices with specific wifi chips (MT7603 and MT7612). But I'll ask over at OpenWrt when I submit the PR.

I'll give you a patch later for testing the new assignment of pcie ports :)

rotanid commented 1 year ago

@Djfe it would be nice if you could try to treat people who know some stuff (see their github history) not like complete beginners, at least two people got that impression already. thanks in advance and of course for your help with openwrt/gluon :)

Djfe commented 1 year ago

I'm kinda dying of embarrassment right now 🤦‍♂️. Especially since it happened two or three times, exactly because I didn't look at their history enough. I know my behavior probably was insulting, but I didn't mean it to be. I'm sorry this happened. Thanks for letting me know though, so I can correct my behavior.

Anyways back to topic :) Here's the patch I promised: https://github.com/Djfe/openwrt/commit/0592a12476fb9aeea42dafecc9b2ff8c28cc548d.patch

@mbaumga I already added your name and mail to the commit message since I wanted to give you credit. Unless you don't want to be credited.

DragonBluep commented 1 year ago

@mbaumga FYI, the mac address of the mt7603e has gone wrong. And I guess the tx power is also very weak now.

Sat Feb 18 20:22:00 2023 kern.info kernel: [   14.281112] mt7603e 0000:02:00.0: Invalid MAC address, using random address 66:f9:ba:b1:06:ac

Djfe commented 1 year ago

That should be fixed once mediatek,mtd-eeprom (and everything else) is applied to the correct pcie port again, right?

FYI, the mac address of the mt7603e has gone wrong. And I guess the tx power is also very weak now.

I didn't know enough about this, so thank you for clarifying how the issue actually manifests. So there are actually fallbacks in place for undeclared pcie ports and 2.4GHz still kinda works but it's definitely not good.
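Since the radio still comes up in this state, the fallback is easy to miss. A small sketch that scans a saved logread for the message quoted above; the function name is hypothetical and the pattern only covers the exact wording seen in this thread.

```python
import re

# Pattern for the fallback message quoted in this thread; other drivers may
# word it differently, so treat this as an assumption.
RANDOM_MAC_RE = re.compile(
    r"(\S+) (\S+): Invalid MAC address, using random address ([0-9a-f:]{17})"
)


def find_random_mac_fallbacks(log_text):
    """Return (driver, pci_address, random_mac) for each fallback message."""
    return [m.groups() for m in RANDOM_MAC_RE.finditer(log_text)]


line = (
    "Sat Feb 18 20:22:00 2023 kern.info kernel: [   14.281112] "
    "mt7603e 0000:02:00.0: Invalid MAC address, using random address "
    "66:f9:ba:b1:06:ac"
)
print(find_random_mac_fallbacks(line))
```

An empty result only means the message was not logged; it does not prove that the calibration data was applied to the right port.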

Are only these three devices affected? I looked it up, they are all based on the same board (and have the same FCC ID). We don't have access to a WNDR3700v5, but 2/3 might be enough in this case(?) Are there further devices that need to be looked at?

DragonBluep commented 1 year ago

That should be fixed once mediatek,mtd-eeprom (and everything else) is applied to the correct pcie port again, right?

Yep, just need this tiny fix.

Are only these three devices affected? I looked it up, they are all based on the same board (and have the same fcc id)

I'm not sure. Maybe there are more. I found them when I searched for pcie0 no card and pcie1 no card in the OpenWrt forum.

We don't have access to a WNDR3700v5 but 2/3 might be enough in this case(?)

That should be enough. There is a kernel log for the WNDR3700v5 that shows PCIE1 no card, disable it(RST&CLK): https://wikidevi.wi-cat.ru/Netgear_WNDR3700v5

Are there further devices that need to be looked at?

The 5.15 kernel is now the default. If there is a problem with other devices, we will soon get a report from the user.

BTW, there are two more fixes related to Netgear devices:
https://github.com/openwrt/openwrt/commit/c46584ab302f0dd9b472aef77c2af163f9719379 (not sure if the R6220 needs it)
https://github.com/openwrt/openwrt/commit/748f7f1b9c0ccb09840e954fbc405a3eb5187634 (can be backported to stable branches if necessary)

Djfe commented 1 year ago

The second patch is nice! A backport sounds great tbh. Maybe with 22.03.04?

About the first patch: I could use the stock rom log from the openwrt wiki for modifying the fixed partitions layout into a sercomm one.

Creating 22 MTD partitions on "MT7621-NAND":
0x000000000000-0x000000100000 : "Bootloader"
0x000000100000-0x000000200000 : "SC PID"
0x000000200000-0x000000600000 : "Kernel"
0x000000600000-0x000002200000 : "Rootfs"
0x000002200000-0x000002400000 : "English UI"
0x000002400000-0x000002600000 : "ML1"
0x000002600000-0x000002800000 : "ML2"
0x000002800000-0x000002a00000 : "ML3"
0x000002a00000-0x000002c00000 : "ML4"
0x000002c00000-0x000002e00000 : "ML5"
0x000002e00000-0x000002f00000 : "Factory"
0x000002f00000-0x000003000000 : "SC Private Data"
0x000003000000-0x000003200000 : "POT"
0x000003200000-0x000003400000 : "Traffic Meter"
0x000003400000-0x000003600000 : "DPF"
0x000003600000-0x000003800000 : "SC Nvram"
0x000003800000-0x000003a00000 : "Ralink Nvram"
0x000003a00000-0x000003c00000 : "Ralink Reserved"
0x000003c00000-0x000003e00000 : "ML6"
0x000003e00000-0x000004000000 : "Upgrade Flag"
0x000004000000-0x000004200000 : "Reserved Block3"
0x000004200000-0x000007e00000 : "Reserved Block4"
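When turning a listing like this into a partition layout, it helps to have the ranges in machine-readable form. A minimal sketch, assuming the kernel log format shown above; the helper names are made up and only the first four lines are used as sample input.

```python
import re

# One kernel MTD line looks like: 0x<start>-0x<end> : "<name>"
PART_RE = re.compile(r'0x([0-9a-f]+)-0x([0-9a-f]+) : "([^"]+)"')


def parse_mtd_partitions(log_text):
    """Return a list of (name, start, end) tuples from kernel MTD log lines."""
    return [
        (m.group(3), int(m.group(1), 16), int(m.group(2), 16))
        for m in PART_RE.finditer(log_text)
    ]


def find_gaps(partitions):
    """Return pairs of adjacent partition names whose ranges do not meet."""
    return [
        (a[0], b[0])
        for a, b in zip(partitions, partitions[1:])
        if a[2] != b[1]
    ]


sample = '''\
0x000000000000-0x000000100000 : "Bootloader"
0x000000100000-0x000000200000 : "SC PID"
0x000000200000-0x000000600000 : "Kernel"
0x000000600000-0x000002200000 : "Rootfs"
'''
parts = parse_mtd_partitions(sample)
for name, start, end in parts:
    print(f"{name:<12} {(end - start) // 1024:>6} KiB")
print(find_gaps(parts))  # [] means every partition starts where the previous one ends
```

The same parser can be pointed at the full 22-line listing to get the size of every partition before writing offsets by hand.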

https://github.com/openwrt/openwrt/commit/bd783fd60a5f9513aa405437efff55fe29cd89c2

"fixed-partitions" is used if the partition map is not found or corrupted.

I'm going to verify some things first:

It appears that Sercomm uses mtd1 for dynamic partitions. Sometimes it's called dynamic partition map, like on the Sercomm S3 https://openwrt.org/inbox/toh/sercomm/s3, or SC_PART_MAP, like on the R6260 https://openwrt.org/toh/netgear/r6260

Sercomm ayx devices seem to call the partition SC PID. Is that some kind of abbreviation for a partition table? (nope, see below why)

From the bootloader log of a R6700 (Sercomm bzv): SC_DEBUG: Nand Partition Table Magic Found at 100000. (There is no OS log in the Wiki)

Let's look at the logs of our devices next to find something similar: From the bootloader of WAC104:

Env addr : 0x100000
.*** Warning - bad CRC, using default environment

So the bootloader stores its environment in there: SC PID. A partition table becomes less likely.

From the complete log of a R6800 (Sercomm bzv):

Go to read Magic at 100000
page: 200
SC_DEBUG: NAND Partition Table Magic Fount at 100000.
page: 201
part 0,real_offset 00000000, real_length 00100000
page: 201
part 1,real_offset 00100000, real_length 00100000
page: 201
part 2,real_offset 00200000, real_length 00400000
page: 201
part 3,real_offset 00600000, real_length 02800000
page: 201
part 4,real_offset 02e00000, real_length 00200000
page: 201
part 5,real_offset 03000000, real_length 00200000
page: 201
part 6,real_offset 03200000, real_length 00200000
page: 201
part 7,real_offset 03400000, real_length 00200000
page: 201
part 8,real_offset 03600000, real_length 00200000
page: 201
part 9,real_offset 03800000, real_length 00200000
page: 201
part 10,real_offset 03a00000, real_length 00200000
page: 201
part 11,real_offset 03c00000, real_length 00200000
page: 201
part 12,real_offset 03e00000, real_length 00200000
page: 201
part 13,real_offset 04000000, real_length 00200000
page: 201
part 14,real_offset 04200000, real_length 00200000
page: 201
part 15,real_offset 04400000, real_length 00200000
page: 201
part 16,real_offset 04600000, real_length 00200000
page: 201
part 17,real_offset 04800000, real_length 00200000
page: 201
part 18,real_offset 04a00000, real_length 00200000
page: 201
part 19,real_offset 04c00000, real_length 00200000
page: 201
part 20,real_offset 04e00000, real_length 00200000
page: 201
part 21,real_offset 05000000, real_length 00200000
page: 201
part 22,real_offset 05200000, real_length 00200000
page: 201
part 23,real_offset 05400000, real_length 00200000
page: 201
part 24,real_offset 05600000, real_length 00200000
page: 201
part 25,real_offset 05800000, real_length 00200000
page: 201
part 26,real_offset 05a00000, real_length 00200000
page: 201
part 27,real_offset 05c00000, real_length 00200000
page: 201
part 28,real_offset 05e00000, real_length 02180000
Creating 29 MTD partitions on "MT7621-NAND":
0x000000000000-0x000000100000 : "Bootloader"
0x000000100000-0x000000200000 : "SC_PART_MAP"
0x000000200000-0x000000600000 : "Kernel"
0x000000600000-0x000002e00000 : "Rootfs"
0x000002e00000-0x000003000000 : "English UI"
0x000003000000-0x000003200000 : "ML1"
0x000003200000-0x000003400000 : "ML2"
0x000003400000-0x000003600000 : "ML3"
0x000003600000-0x000003800000 : "ML4"
0x000003800000-0x000003a00000 : "ML5"
0x000003a00000-0x000003c00000 : "ML6"
0x000003c00000-0x000003e00000 : "ML7"
0x000003e00000-0x000004000000 : "ML8"
0x000004000000-0x000004200000 : "ML9"
0x000004200000-0x000004400000 : "ML10"
0x000004400000-0x000004600000 : "ML11"
0x000004600000-0x000004800000 : "Factory"
0x000004800000-0x000004a00000 : "SC Private Data"
0x000004a00000-0x000004c00000 : "POT"
0x000004c00000-0x000004e00000 : "Traffic Meter"
0x000004e00000-0x000005000000 : "SC PID"
0x000005000000-0x000005200000 : "SC Nvram"
0x000005200000-0x000005400000 : "Ralink Nvram"
0x000005400000-0x000005600000 : "Reserved Block1"
0x000005600000-0x000005800000 : "Reserved Block2"
0x000005800000-0x000005a00000 : "Reserved Block3"
0x000005a00000-0x000005c00000 : "Reserved Block4"
0x000005c00000-0x000005e00000 : "Reserved Block5"
0x000005e00000-0x000007f80000 : "Reserved Block6"

Our devices don't have these sections in their stock ROM logs, so I assume that they might be too old and Sercomm partitions didn't exist yet.
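For reference, the `real_offset`/`real_length` pairs in the R6800 log translate into the same ranges the kernel prints afterwards. A small sketch of that arithmetic, using two entries from the log above; the function name is hypothetical.

```python
import re

# Format of the Sercomm partition map entries in the R6800 log above.
ENTRY_RE = re.compile(
    r"part (\d+),real_offset ([0-9a-f]+), real_length ([0-9a-f]+)"
)


def partition_ranges(log_text):
    """Map partition index to (start, end), with end = offset + length."""
    ranges = {}
    for m in ENTRY_RE.finditer(log_text):
        start = int(m.group(2), 16)
        ranges[int(m.group(1))] = (start, start + int(m.group(3), 16))
    return ranges


sample = """\
part 3,real_offset 00600000, real_length 02800000
part 28,real_offset 05e00000, real_length 02180000
"""
for index, (start, end) in partition_ranges(sample).items():
    print(f"part {index}: 0x{start:012x}-0x{end:012x}")
# part 3: 0x000000600000-0x000002e00000
# part 28: 0x000005e00000-0x000007f80000
```

Both results match the "Rootfs" and "Reserved Block6" lines of the MTD listing, which supports reading that table as the source of the R6800 layout.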

Devices not listed below either have no wiki entry or the wiki entry is missing the necessary log:
Sercomm Boot Version 1.19.0 (WAC104)
Sercomm Boot Version 1.16.0 (R6220)

Sercomm Boot Version 1.2.0.0 (R6260)
Sercomm Boot Version 1.0.1.0 (R6800)
Sercomm Boot Version 1.3.0.0 (WAC124)

As you can see, they also changed the numbering scheme of their boot stuff in between these generations. Our devices seem to be too old to be affected, so nothing needs to be done.

Djfe commented 1 year ago

I could in addition look at Netgear WAX202, Netgear R6260, Cudy WR2100 and D-Link Dir 860L B1, if there is the necessity to do so?

You should test the Netgear R6260. The Cudy and D-Link not so much, but I won't stop you. It's good to know if all of these devices still run fine on the next kernel release. But they likely do. The MT7621 was tested a lot already (unlike MT7620 and MT7628)

Concerning the partition scheme I assume the WAX202 is too new, so they might've stopped using sercomm and switched to Mediatek nmbm instead for bad block handling (it's not a partition table). Though I can't say anything for sure since there is no log in the wiki only a tech data page. The WAX206 is another target (Mediatek ARM) and I couldn't find anything in the uploaded log file.

rotanid commented 1 year ago

@DragonBluep

BTW, there are another two fixes related to Netgear series devices. openwrt/openwrt@c46584a (not sure if R6220 is needed) openwrt/openwrt@748f7f1 (can be backported to stable branches If necessary)

but in the recently merged version of those fixes, the netgear devices were excluded as far as i can see?

Djfe commented 1 year ago

Yep, this issue was opened just so we can get the second half of this commit into OpenWrt https://github.com/openwrt/openwrt/commit/29447ea2c58a423031ec4ed827bffcd574596ade

They lacked testers for the affected netgear devices. They deferred the necessary change to the future. At some point people ought to notice broken wifi on the master branch if they own the device.

I only picked up on it right before merge, that they were still looking for testers.

edit: you didn't mean those two commits dragonbluep mentioned though, right? those were pushed to master as the version he linked (no netgear devices were removed from commit https://github.com/openwrt/openwrt/commit/c46584ab302f0dd9b472aef77c2af163f9719379 )

Djfe commented 1 year ago

it's explained here actually https://github.com/openwrt/openwrt/pull/11220#issuecomment-1435435140

rotanid commented 1 year ago

edit: you didn't mean those two commits dragonbluep mentioned though, right? those were pushed to master as the version he linked (no netgear devices were removed from commit openwrt/openwrt@c46584a )

Ah, I mixed it up. OK, then it would be nice to backport it :)

mbaumga commented 1 year ago

@Djfe

Anyways back to topic :) Here's the patch I promised: https://github.com/Djfe/openwrt/commit/0592a12476fb9aeea42dafecc9b2ff8c28cc548d.patch

So. I did build OpenWrt master branch today with the patch applied and here is the log output:

WAC104: https://paste.tecff.de/?d01518df29372bc5#nSm2MnpywGezUBMW3oOwuR2C48GhavCKBYKSObW3SiY=

2.4 and 5GHz Wifi are up and can be connected to.

R6220: https://paste.tecff.de/?cb90ea77787f55cd#1iyvg8ObZFaN+VoJmB3vXQllIvxlxjhXeJHRTb2Ml1g=

2.4 and 5GHz Wifi are up and can be connected to.

I already added your name and mail to the commit message since I wanted to give you credit. Unless you don't want to be credited.

Sure. I don't mind.

@DragonBluep:

FYI, the mac address of the mt7603e has gone wrong. And I guess the tx power is also very weak now.

Sat Feb 18 20:22:00 2023 kern.info kernel: [   14.281112] mt7603e 0000:02:00.0: Invalid MAC address, using random address 66:f9:ba:b1:06:ac

This seems to have vanished too, so reassigning the pcie port worked. :)

Djfe commented 1 year ago

Great, thanks for your quick help @mbaumga in getting this fix tested :) I just submitted the pull request here

Djfe commented 1 year ago

Actually, I have to necro this thread because people upgrading from release to snapshot noticed that we didn't migrate old wifi-device paths to the new ones. https://github.com/openwrt/openwrt/pull/12198

@mbaumga may I ask you again to test another patch? This time one device is enough :)

The procedure would be: upgrade from openwrt release to master (with my patch applied) and send me /etc/config/wireless from before and after the upgrade. (so I can compare whether the migration was successful)
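The before/after comparison could also be scripted. This is only a sketch: the helper names are made up, the parser handles just the `config`/`option path` lines it needs, and the shortened sample inputs merely reuse the path format of an mt7621 device.

```python
import re


def wifi_device_paths(config_text):
    """Return {radio_name: path} for each wifi-device section."""
    paths = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        section = re.match(r"config (\S+) '([^']+)'", line)
        if section:
            # Only track wifi-device sections; wifi-iface sections are skipped.
            is_device = section.group(1) == "wifi-device"
            current = section.group(2) if is_device else None
            continue
        option = re.match(r"option path '([^']+)'", line)
        if option and current:
            paths[current] = option.group(1)
    return paths


def compare_paths(before, after):
    """Return (radios added by the upgrade, radios whose path changed)."""
    added = sorted(set(after) - set(before))
    changed = sorted(r for r in before if r in after and before[r] != after[r])
    return added, changed


release = """
config wifi-device 'radio0'
    option path '1e140000.pcie/pci0000:00/0000:00:01.0/0000:02:00.0'
config wifi-iface 'default_radio0'
    option device 'radio0'
config wifi-device 'radio1'
    option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
"""
# No migration: the stale radio0 stays and a new radio2 is detected.
unmigrated = release + """
config wifi-device 'radio2'
    option path '1e140000.pcie/pci0000:00/0000:00:02.0/0000:02:00.0'
"""
# Successful migration: radio0 keeps its name but points at the new path.
migrated = release.replace("0000:00:01.0", "0000:00:02.0")

print(compare_paths(wifi_device_paths(release), wifi_device_paths(unmigrated)))
print(compare_paths(wifi_device_paths(release), wifi_device_paths(migrated)))
```

An added radio with no changed paths suggests the old section was left behind, while a changed path on an existing radio suggests the migration ran.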

mbaumga commented 1 year ago

Done.

OpenWrt 22.03.3 after factory reset:

BusyBox v1.35.0 (2023-01-03 00:24:21 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 22.03.3, r20028-43d71ad93e
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@OpenWrt:~# cat /etc/config/wireless 

config wifi-device 'radio0'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:01.0/0000:02:00.0'
    option channel '1'
    option band '2g'
    option htmode 'HT20'
    option disabled '1'

config wifi-iface 'default_radio0'
    option device 'radio0'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

config wifi-device 'radio1'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
    option channel '36'
    option band '5g'
    option htmode 'VHT80'
    option disabled '1'

config wifi-iface 'default_radio1'
    option device 'radio1'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

OpenWrt snapshot image without patch:


BusyBox v1.36.0 (2023-03-17 20:22:54 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt SNAPSHOT, r22297-1781e7408a
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@OpenWrt:~# cat /etc/config/wireless 

config wifi-device 'radio0'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:01.0/0000:02:00.0'
    option channel '1'
    option band '2g'
    option htmode 'HT20'
    option disabled '1'

config wifi-iface 'default_radio0'
    option device 'radio0'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

config wifi-device 'radio1'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
    option channel '36'
    option band '5g'
    option htmode 'VHT80'
    option disabled '1'

config wifi-iface 'default_radio1'
    option device 'radio1'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

config wifi-device 'radio2'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:02.0/0000:02:00.0'
    option channel '1'
    option band '2g'
    option htmode 'HT20'
    option disabled '1'

config wifi-iface 'default_radio2'
    option device 'radio2'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

OpenWrt master with patch applied:


BusyBox v1.36.0 (2023-03-18 11:52:24 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt SNAPSHOT, r22302-e467856cb3
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@OpenWrt:~# cat /etc/config/wireless 

config wifi-device 'radio0'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:02.0/0000:02:00.0'
    option channel '1'
    option band '2g'
    option htmode 'HT20'
    option disabled '1'

config wifi-iface 'default_radio0'
    option device 'radio0'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

config wifi-device 'radio1'
    option type 'mac80211'
    option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
    option channel '36'
    option band '5g'
    option htmode 'VHT80'
    option disabled '1'

config wifi-iface 'default_radio1'
    option device 'radio1'
    option network 'lan'
    option mode 'ap'
    option ssid 'OpenWrt'
    option encryption 'none'

Anything else you need? :)

Djfe commented 1 year ago

Lovely! Thanks a lot :) Nope that's all, have a nice weekend.