aparcar / openwrt

Staging tree of Paul Spooren

FS#227 - VLAN support mismatch between preinit and default network config #574

Open aparcar opened 8 years ago

aparcar commented 8 years ago

acarlo:

PPPoE is broken on WRT1900ACS

Upgraded from LEDE r578 to the latest LEDE r1814 and PPPoE doesn't work anymore, although the pppd version and PPPoE version are the same:

pppd debug log:

Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]
Timeout waiting for PADO packets
Unable to complete PPPoE Discovery
Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]

While on the same hardware running LEDE r578, the PPPoE module works as expected:

Plugin rp-pppoe.so loaded.
RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
Send PPPOE Discovery V1T1 PADI session 0x0 length 4 dst ff:ff:ff:ff:ff:ff src c2:56:27:ca:d7:d4 [service-name]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 40 dst c2:56:27:ca:d7:d4 src a0:f3:e4:34:d8:21 [service-name] [AC-name acc-aln1.hac] [AC-cookie 75 58 37 a5 ba 3c e4 a5 2a 61 bb 23 92 5c 1b dc]
Send PPPOE Discovery V1T1 PADR session 0x0 length 24 dst a0:f3:e4:34:d8:21 src c2:56:27:ca:d7:d4 [service-name] [AC-cookie 75 58 37 a5 ba 3c e4 a5 2a 61 bb 23 92 5c 1b dc]
Recv PPPOE Discovery V1T1 PADS session 0x30b length 4 dst c2:56:27:ca:d7:d4 src a0:f3:e4:34:d8:21 [service-name]
PADS: Service-Name: ''
PPP session is 779
Connected to a0:f3:e4:34:d8:21 via interface eth0
using channel 2
Using interface pppoe-wan
Connect: pppoe-wan <--> eth0
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0xc6952556>]
rcvd [LCP ConfReq id=0x66 <mru 1492> <magic 0x4cc73648>]
sent [LCP ConfAck id=0x66 <mru 1492> <magic 0x4cc73648>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0xc6952556>]
sent [LCP EchoReq id=0x0 magic=0xc6952556]
rcvd [CHAP Challenge id=0x1 <7131a44524d1de8f1cd1061cac6d8c071d8bfe7351bc4ea7bd08f56684428475f229ba177a192696ebab32>, name = "acc-aln1.hac"]
sent [CHAP Response id=0x1 <4bb1a418b298790b128ad4d7ef3109ad>, name = "bthomehub@btbroadband.com"]
rcvd [LCP EchoRep id=0x0 magic=0x4cc73648]
rcvd [CHAP Success id=0x1 "CHAP authentication success"]
CHAP authentication succeeded: CHAP authentication success
CHAP authentication succeeded
peer from calling number A0:F3:E4:34:D8:21 authorized
sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
sent [IPV6CP ConfReq id=0x1 ]
rcvd [IPV6CP ConfReq id=0x7b ]
sent [IPV6CP ConfAck id=0x7b ]
rcvd [IPCP ConfReq id=0x38 <addr 172.16.12.12>]
sent [IPCP ConfAck id=0x38 <addr 172.16.12.12>]
rcvd [IPCP ConfNak id=0x1 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
sent [IPCP ConfReq id=0x2 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
rcvd [IPV6CP ConfAck id=0x1 ]
local LL address fe80::c595:37d1:3987:1929
remote LL address fe80::0221:05ff:feb4:8824
Script /lib/netifd/ppp-up started (pid 2646)
rcvd [IPCP ConfAck id=0x2 <addr 81.146.2.155> <ms-dns1 81.139.57.100> <ms-dns2 81.139.56.100>]
local IP address 81.146.2.155
remote IP address 172.16.12.12
primary DNS address 81.139.57.100
secondary DNS address 81.139.56.100
ppp.log
secondary DNS address 81.139.56.100
Script /lib/netifd/ppp-up started (pid 2653)
Script /lib/netifd/ppp-up finished (pid 2646), status = 0x9
Script /lib/netifd/ppp-up finished (pid 2653), status = 0x9

aparcar commented 8 years ago

mkresin:

Would you please attach or paste your /etc/config/network? Do you have to VLAN-tag your PPPoE traffic? Which ISP do you use?

Are you able to compile your own image? It would be helpful if you could do a git bisect to find the commit that broke PPPoE on your WRT1900ACS.

aparcar commented 8 years ago

acarlo:

This is the interface config for the PPPoE traffic:

config interface 'wan'
	option ifname 'eth0'
	option proto 'pppoe'
	option username 'bthomehub@btbroadband.com'
	option password 'bt'
	option timeout '10'

I use the same config for the working and the non-working LEDE build. The provider is BT in the UK.

Yes, I do build my own image, but I am not familiar with git bisect. I will check how to use it and come back on this point.

aparcar commented 8 years ago

mkresin:

Would you please provide your complete /etc/config/network?

aparcar commented 8 years ago

acarlo:

Full network:

root@OpenWrt:/etc/config# cat network

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd7b:f926:6250::/48'

config interface 'lan'
	option type 'bridge'
	option proto 'static'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option ipaddr '192.168.20.254'
	option igmp_snooping '1'
	option _orig_ifname 'eth1 wlan0 wlan1'
	option _orig_bridge 'true'
	option ifname 'eth1 eth2'

config interface 'wan'
	option ifname 'eth0'
	option proto 'pppoe'
	option username 'bthomehub@btbroadband.com'
	option password 'bt'
	option timeout '10'

config interface 'wan6'
	option ifname 'eth0'
	option proto 'dhcpv6'

config interface 'iptv'
	option ifname 'eth0'
	option proto 'static'
	option ipaddr '10.22.22.1'
	option netmask '255.255.255.0'

config interface 'vpn0'
	option ifname 'tun0'
	option proto 'none'
	option auto '1'

config interface 'guest'
	option _orig_ifname 'radio1.network2'
	option _orig_bridge 'false'
	option proto 'static'
	option ipaddr '192.168.99.254'
	option netmask '255.255.255.0'

root@OpenWrt:/etc/config#

aparcar commented 8 years ago

acarlo:

Just found this topic on the OpenWrt forum:

https://forum.openwrt.org/viewtopic.php?pid=335168#p335168

From the topic:

(BTW: R1297 is running ok, so must be a change of the last week)
edit 1: This seems to be the only change to the PPP package: https://git.lede-project.org/?p=source. … 344006173)
edit 2: just reverted that change and rebuild the setup, still not working so it must be collateral damage from something else.

aparcar commented 8 years ago

mkresin:

Nice find.

The version reported as working, r1297, has the git commit hash 4e8c6f340751c66a602b98b727af28b2a9004313.

The report in the forum is from 2016-08-20. The last commit of that date has the commit hash 35be9284668d19a565d354a33febb508b0e28131 (r1396).

The first step would be to test both of these commits, to make sure that r1297 works and r1396 is really broken.

$ git checkout master
$ git checkout 4e8c6f340751c66a602b98b727af28b2a9004313
$ make dirclean
$ make menuconfig
$ make

Do the same with 35be9284668d19a565d354a33febb508b0e28131.

If you have a good and a bad version, you can use git bisect (git bisect start <bad> <good>):

$ git checkout master
$ git bisect start 35be9284668d19a565d354a33febb508b0e28131 4e8c6f340751c66a602b98b727af28b2a9004313
$ make dirclean
$ make menuconfig
$ make
$ git bisect good OR git bisect bad
$ make dirclean
$ make menuconfig
$ make
$ git bisect good OR git bisect bad
...

In the end, git bisect will tell you which commit introduced the regression.

aparcar commented 8 years ago

acarlo:

here you go:

carlo@ubuntu:~/source$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c18edcec4500008a1dabf0b017322eb23b059c58] base-files: add preinit ifname detection based on board.json
carlo@ubuntu:~/source$

Important: while testing the builds, some of them built without errors but the router would not boot with them, so I marked them as bad.
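
A side note on builds that cannot be judged: git also provides `git bisect skip` for commits that can be neither confirmed good nor bad (for example an image that does not boot for an unrelated reason), so they do not have to be marked bad. The following is only a toy sketch, not the procedure used in this ticket. It creates a throwaway repository with made-up commits and lets `git bisect run` skip one of them through exit code 125. The function name, file names and commit messages are all hypothetical.

```shell
#!/bin/sh
# Toy demonstration of skipping an untestable commit during a bisect.
# Nothing here touches the LEDE tree; the repository is created in a temp dir.
demo_bisect_skip() {
    repo=$(mktemp -d) || return 1
    (
        set -e
        cd "$repo"
        git init -q .
        git config user.name "demo"
        git config user.email "demo@example.invalid"

        # Six commits: commit 3 stands in for an image that does not boot,
        # commits 5 and 6 contain the (made-up) regression.
        for i in 1 2 3 4 5 6; do
            if [ "$i" -eq 3 ]; then
                state=unbootable
            elif [ "$i" -ge 5 ]; then
                state=broken
            else
                state=ok
            fi
            echo "$state" > state
            echo "$i" > rev
            git add state rev
            git -c commit.gpgsign=false commit -q -m "commit $i"
        done

        good=$(git rev-list --max-parents=0 HEAD)   # first commit is known good
        git bisect start HEAD "$good" >/dev/null    # HEAD is known bad

        # Exit codes for "git bisect run": 0 = good, 125 = skip, other 1..127 = bad.
        git bisect run sh -c '
            case "$(cat state)" in
                ok)         exit 0 ;;
                unbootable) exit 125 ;;
                *)          exit 1 ;;
            esac' >/dev/null

        # refs/bisect/bad now points at the first bad commit.
        git log -1 --format=%s refs/bisect/bad
        git bisect reset >/dev/null 2>&1
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

In a manual bisect like the one above, the equivalent step would be to answer `git bisect skip` instead of `git bisect bad` for an image that builds but does not boot.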

aparcar commented 8 years ago

mkresin:

Good job!

Would you please apply the attached patch on top of the latest git and check if the issue is gone? The patch is not a fix! It's just to confirm that c18edcec4500008a1dabf0b017322eb23b059c58 (https://git.lede-project.org/c18edcec4500008a1dabf0b017322eb23b059c58) is really the cause of your issue.

$ git checkout master
$ patch -p1 < fs227_confirmation.patch

To confirm that the patch is applied successfully:

$ git diff

Build and test the image:

$ make dirclean
$ make menuconfig
$ make

aparcar commented 8 years ago

acarlo:

It works :) I applied the patch to this build: LEDE Reboot (HEAD, r1845) and finally got the PPPoE connection back :)

Thanks for your help, hopefully we will get a permanent fix in the trunk (soon) :)

aparcar commented 8 years ago

mkresin:

Please attach the output of the following commands from a working image and from a non-working image:

$ dmesg
$ swconfig dev switch0 show
$ cat /etc/board.json
$ cat /etc/config/network
$ for iface in $(ls /sys/class/net/); do echo "${iface}: $(cat /sys/class/net/${iface}/carrier)"; done

Please do not keep your settings during the test.

aparcar commented 8 years ago

Johnnysl:

FYI: That message on the OpenWrt forum was mine. I could eventually trace it to the changes done to enable VLANs on the switch by default, while my config didn't really use those. After wiping my /etc/config/network, rebooting, and reconfiguring from scratch based on switch VLANs, everything started to work again. PPPoE is still quite slow though, often taking multiple attempts over a couple of minutes to log in.

aparcar commented 8 years ago

mkresin:

According to the code, the vlans were set up already before c18edcec4500008a1dabf0b017322eb23b059c58.

But since commit c18edcec4500008a1dabf0b017322eb23b059c58 vlans are enabled in failsafe/preinit as well. This might cause some unexpected side effects on mvebu boards, since they never had support for failsafe (which is really bad).

Due to your remark regarding a changed vlan config, I've updated the post where I'm asking for some output.

As a general note, please report bugs here and do not hide them in the forum. To my knowledge, no dev is monitoring the forum for bug reports.

aparcar commented 8 years ago

Johnnysl:

Usually i like to understand if it is me, or a bug. Don't want to clutter this page with all issues i run into. Due to nobody complaining, and me "fixing" it with a reconfigure of /etc/config/network i assumed it was not a real bug...

aparcar commented 8 years ago

acarlo:

attached there is the commands' output for the same build (working and not working version)

aparcar commented 8 years ago

mkresin:

> Usually i like to understand if it is me, or a bug. Don't want to clutter this page with all issues i run into. Due to nobody complaining, and me "fixing" it with a reconfigure of /etc/config/network i assumed it was not a real bug...

Thanks for that! Indeed, that is the way to go and not to spam the bugtracker with support requests.

> attached there is the commands' output for the same build (working and not working version)

Okay, now I can see the real issue.

It's a bug in "set up vlans in preinit/failsafe" which is revealed by a config that differs from the default network config.

During preinit, VLAN support is enabled ("enable_vlan: 1" in the swconfig output) since it is (now) the default for the board, but VLAN support is not disabled afterwards. Since your /etc/config/network lacks the VLAN part, it can neither disable VLAN support nor set up the desired VLAN config on its own.
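
Whether a given network config contains a VLAN part can be checked mechanically. The sketch below assumes that VLANs on swconfig-based devices are declared in `config switch_vlan` sections; the function name `has_switch_vlan` is made up for illustration and is not part of LEDE.

```shell
#!/bin/sh
# Succeeds (exit 0) if the UCI config read from stdin declares at least one
# "config switch_vlan" section, fails (exit 1) otherwise.
has_switch_vlan() {
    grep -Eq '^[[:space:]]*config[[:space:]]+switch_vlan([[:space:]]|$)'
}
```

On the router it could be used like this:

```shell
has_switch_vlan < /etc/config/network || echo "no switch_vlan section found"
```

For the config posted earlier in this ticket, the check would report that no such section exists.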

That your LAN interfaces are working is more luck than expected behavior.

For now, the best workaround is to disable VLAN support after boot. Everything should work after that using an unmodified LEDE image:

swconfig dev switch0 set enable_vlan 0
swconfig dev switch0 set apply
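
The workaround above can be wrapped so that it only touches the switch when VLAN support is actually on. This is a minimal sketch, assuming the output of `swconfig dev switch0 show` contains a line of the form "enable_vlan: 1" as quoted earlier in this comment. `vlan_state` and `disable_vlan_if_enabled` are hypothetical helper names, not LEDE functions.

```shell
#!/bin/sh
# Print the value of the first "enable_vlan: N" line found on stdin.
# Prints nothing if no such line exists.
vlan_state() {
    sed -n 's/^[[:space:]]*enable_vlan:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Disable VLAN support on the given switch (default: switch0, the device
# name used in this ticket) only if it is currently enabled.
disable_vlan_if_enabled() {
    dev="${1:-switch0}"
    if [ "$(swconfig dev "$dev" show | vlan_state)" = "1" ]; then
        swconfig dev "$dev" set enable_vlan 0
        swconfig dev "$dev" set apply
    fi
}
```

Since preinit enables VLAN support again on every boot, the function would presumably have to run after each boot, for example from /etc/rc.local.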

I will try to get in contact with the author of this change to discuss the issue. I'm not interested in committing a fix that possibly introduces a new bug.

aparcar commented 6 years ago

psyborg:

Your ticket breaks spacing on 1280x800 screens. Also, I don't see a point in using tags...