8devices / carambola2

Carambola 2 - an AR9331/SoC based PCB
GNU General Public License v2.0

br-lan: received packet on eth1 with own address as source #64

Closed 1am closed 3 years ago

1am commented 8 years ago

Hi

I've built a few devices using Carambola2 modules, and they have been functioning correctly for around 6 months. A few days ago one of the boards went offline and started reporting a strange error in dmesg and over the UART connection. All of this happens after booting:

...
[   25.259603] device eth0 entered promiscuous mode
[   25.282094] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
[   25.350713] device eth1 entered promiscuous mode
[   25.353982] br-lan: port 2(eth1) entered forwarding state
[   25.359392] br-lan: port 2(eth1) entered forwarding state
[   25.365573] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   26.214972] br-lan: port 2(eth1) entered disabled state
[   27.606297] eth1: link up (100Mbps/Full duplex)
[   27.609445] br-lan: port 2(eth1) entered forwarding state
[   27.614874] br-lan: port 2(eth1) entered forwarding state
[   27.620361] br-lan: received packet on eth1 with own address as source address
[   27.629748] br-lan: received packet on eth1 with own address as source address
[   27.640788] br-lan: received packet on eth1 with own address as source address
[   27.648195] br-lan: received packet on eth1 with own address as source address
[   27.715138] br-lan: received packet on eth1 with own address as source address
[   27.891001] br-lan: received packet on eth1 with own address as source address
[   27.898185] br-lan: received packet on eth1 with own address as source address
[   27.909384] br-lan: received packet on eth1 with own address as source address
[   28.055000] br-lan: received packet on eth1 with own address as source address
[   28.060822] br-lan: received packet on eth1 with own address as source address
[   29.614879] br-lan: port 2(eth1) entered forwarding state
[   33.064921] net_ratelimit: 22 callbacks suppressed
[   33.068264] br-lan: received packet on eth1 with own address as source address
[   33.075542] br-lan: received packet on eth1 with own address as source address
[   36.072500] br-lan: received packet on eth1 with own address as source address
[   36.078699] br-lan: received packet on eth1 with own address as source address
[   37.064920] br-lan: received packet on eth1 with own address as source address
[   37.074904] br-lan: received packet on eth1 with own address as source address
[   37.505110] br-lan: received packet on eth1 with own address as source address
[   38.064909] br-lan: received packet on eth1 with own address as source address
[   38.075145] br-lan: received packet on eth1 with own address as source address
[   41.076272] br-lan: received packet on eth1 with own address as source address
[   41.082294] br-lan: received packet on eth1 with own address as source address
[   42.074944] br-lan: received packet on eth1 with own address as source address
[   42.080765] br-lan: received packet on eth1 with own address as source address
[   43.074923] br-lan: received packet on eth1 with own address as source address
[   43.080738] br-lan: received packet on eth1 with own address as source address
[   58.245055] br-lan: received packet on eth1 with own address as source address
[   58.884904] br-lan: received packet on eth1 with own address as source address
[   59.835040] br-lan: received packet on eth1 with own address as source address
[   60.135030] br-lan: received packet on eth1 with own address as source address
[   60.195017] br-lan: received packet on eth1 with own address as source address
[   61.835034] br-lan: received packet on eth1 with own address as source address
[   65.085063] br-lan: received packet on eth1 with own address as source address
[   66.545075] br-lan: received packet on eth1 with own address as source address
[   79.106293] br-lan: received packet on eth1 with own address as source address
[   79.112321] br-lan: received packet on eth1 with own address as source address
[   79.794856] random: nonblocking pool is initialized
[   80.104944] br-lan: received packet on eth1 with own address as source address
[   80.110758] br-lan: received packet on eth1 with own address as source address
[   81.104923] br-lan: received packet on eth1 with own address as source address

I've inspected the board and found no hardware issues so far. Other devices with the same configuration (including the same IP address) are working. I'm sure no two devices are trying to use the same IP address, as I've connected only one of them at a time. Searching the internet I haven't found many leads, except that this can happen when there's a loop on the local network... but that would not really explain why, of two devices with identical configuration, one works and the other does not. My network configuration for all devices is the following:

root@dev:/tmp# cat /etc/config/network 

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fda4:17a0:6a81::/48'

config interface 'lan'
        option type 'bridge'
        option proto 'static'
        option netmask '255.255.255.0'
        option ifname 'eth0 eth1'
        option gateway '192.168.1.2'
        option ipaddr '192.168.1.213'
        list dns '192.168.1.1'

And just one of them stopped working while no changes were made to the network. Has anyone experienced such issues before?

mantas-p commented 8 years ago

Hi,

Well, that looks like a network loop or a duplicate MAC address. Please post the output of the 'ifconfig' and 'ps' commands. Also describe your network topology: where eth0 and eth1 are connected, whether you use WiFi, etc. A tcpdump capture of the br-lan interface could be helpful too.

1am commented 8 years ago

Hi,

The output is the following:

# ifconfig
br-lan    Link encap:Ethernet  HWaddr C4:93:00:03:B0:99
          inet addr:192.168.1.213  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::c693:ff:fe03:b099%2005042360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:252 errors:0 dropped:0 overruns:0 frame:0
          TX packets:264 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:14726 (14.3 KiB)  TX bytes:16875 (16.4 KiB)

eth0      Link encap:Ethernet  HWaddr C4:93:00:03:B0:99
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:5

eth1      Link encap:Ethernet  HWaddr C4:93:00:03:B0:9A
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:252 errors:0 dropped:0 overruns:0 frame:0
          TX packets:253 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18254 (17.8 KiB)  TX bytes:14442 (14.1 KiB)
          Interrupt:4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1%2005044952/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:572 errors:0 dropped:0 overruns:0 frame:0
          TX packets:572 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:50992 (49.7 KiB)  TX bytes:50992 (49.7 KiB)
# ps
  PID USER       VSZ STAT COMMAND
    1 root      1536 S    /sbin/procd
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    5 root         0 SW<  [kworker/0:0H]
    6 root         0 SW   [kworker/u2:0]
    7 root         0 SW<  [khelper]
    8 root         0 SW   [kworker/u2:1]
   29 root         0 SW<  [writeback]
   68 root         0 SW<  [crypto]
   69 root         0 SW<  [bioset]
   71 root         0 SW<  [kblockd]
  103 root         0 SW   [kswapd0]
  104 root         0 SW   [kworker/0:1]
  152 root         0 SW   [fsnotify_mark]
  187 root         0 SW   [spi0]
  290 root         0 SW<  [ipv6_addrconf]
  296 root         0 SW<  [deferwq]
  299 root         0 SW<  [kworker/0:1H]
  335 root         0 SW   [kworker/0:2]
  364 root         0 SWN  [jffs2_gcd_mtd5]
  453 root      1180 S    /sbin/ubusd
  456 root      1192 S    /bin/ash --login
  860 root         0 SW<  [cfg80211]
  965 root      1180 S    /sbin/logd -S 16
  974 root      1444 S    /sbin/rpcd
 1007 root      1628 S    /sbin/netifd
 1019 root      1284 S    /usr/sbin/odhcpd
 1060 root      1048 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p
 1074 root      1288 S    /usr/sbin/uhttpd -f -h /www -r g_v03 -x /cgi-b
 1083 nobody    1540 S    avahi-daemon: running [gv03.local]
 1091 root      1408 S    /usr/bin/rsync --daemon --no-detach
 1170 root      1192 S    /usr/sbin/ntpd -n -S /usr/sbin/ntpd-hotplug -p 0.ope
 1208 nobody    1044 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf -k -x /va
 1272 root      1192 R    ps

My network topology is quite simple:

Carambola2 device > switch > router with internet connection.

I don't use WiFi on Carambola2

mantas-p commented 8 years ago

So you have the eth1 port connected to the switch and eth0 left unconnected when the problem happens, right? Or is eth0 also connected to the switch?

1am commented 8 years ago

Only ETH0 is physically connected to anything, in my case a switch. ETH1 is not exposed outside of Carambola2 module.

mantas-p commented 8 years ago

The ifconfig log you sent shows that eth1 is receiving packets; what you call ETH0 is probably the eth1 interface in Linux. You can try this network config:

cat /etc/config/network 

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fda4:17a0:6a81::/48'

config interface 'lan'
        option type 'bridge'
        option proto 'static'
        option netmask '255.255.255.0'
        option ifname 'eth1'
        option gateway '192.168.1.2'
        option ipaddr '192.168.1.213'
        list dns '192.168.1.1'

If this doesn't help, try connecting the Carambola2 directly to a PC and see if the problem persists.

Another way to debug would be to capture packets between the switch and the Carambola2. Since the problem happens during boot, the capture should be done externally, by placing a capture device (a Carambola2 dev board or a PC) with 2 bridged ETH ports between the switch and the Carambola2.
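The port mix-up mentioned at the start of this comment can be double-checked by comparing RX counters per interface. The following is only a sketch in Python: the function name `rx_packets_by_interface` and the trimmed `SAMPLE` text are illustrative, and it assumes BusyBox-style ifconfig output like the one posted earlier in this thread.

```python
import re

def rx_packets_by_interface(ifconfig_text: str) -> dict:
    """Map each interface name to its 'RX packets' counter."""
    counts = {}
    current = None
    for line in ifconfig_text.splitlines():
        # A new interface block starts with a non-indented line.
        if line and not line[0].isspace():
            current = line.split()[0]
        match = re.search(r"RX packets:(\d+)", line)
        if match and current:
            counts[current] = int(match.group(1))
    return counts

# Trimmed copy of the ifconfig output posted in this thread.
SAMPLE = """\
br-lan    Link encap:Ethernet  HWaddr C4:93:00:03:B0:99
          RX packets:252 errors:0 dropped:0 overruns:0 frame:0
eth0      Link encap:Ethernet  HWaddr C4:93:00:03:B0:99
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
eth1      Link encap:Ethernet  HWaddr C4:93:00:03:B0:9A
          RX packets:252 errors:0 dropped:0 overruns:0 frame:0
"""

counts = rx_packets_by_interface(SAMPLE)
# Physical ports that have actually received traffic.
active = [name for name, n in counts.items() if name.startswith("eth") and n > 0]
print(active)  # ['eth1']
```

With the counters posted in this thread, only eth1 shows received packets, which is consistent with the externally wired port being eth1 in Linux.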

1am commented 8 years ago

Hi Mantas-p

The thing is that this issue happens only once in a while: of ~200 devices we've set up with the same hardware and flashed + configured with the image, so far only one has this issue.

This would be our second instance of the situation. It also doesn't happen repeatedly, because we test the devices in two passes: the first checks whether the Carambola2 is accessible over ETH, and the second, done after assembly, repeats this test along with some more. For this board the first test passed and some time later the second one failed. The same thing happened with the device I wrote about in the original post: it was working fine until it stopped, and it never started working again. Flashing a fresh system doesn't help.

I've also tried with the network configuration you've pasted and the results are the same.

1am commented 8 years ago

I'm also attaching a pcap file in which you can see some brief activity from the Carambola device and the router, and pings with no response.

carambola2.pcapng.zip

mantas-p commented 8 years ago

Hi,

From the packet capture: the Carambola2 is able to send multicast packets but doesn't respond to ping requests. It would be interesting to also capture on the Carambola2's eth1 port. For that you would need to have tcpdump installed in your image.

Have you tried flashing the reference firmware (http://pkg.8devices.com/carambola2/v2.4/openwrt-ar71xx-generic-carambola2-squashfs-sysupgrade.bin) to the defective device? Does it still print errors about receiving packets from its own address? Does Ethernet work in the bootloader on the defective device?

1am commented 8 years ago

We've tried flashing the reference firmware, and it succeeded once from the bootloader over TFTP, but the same error persists on the new image. Sometimes we are able to connect to the Carambola device, but after a short while - around 30 seconds - it returns to the same error. We got lucky on that one flashing attempt but not on later ones. A colleague sent me a link to this OpenWRT change; could it be related? https://dev.openwrt.org/changeset/46821

mantas-p commented 8 years ago

Hi,

Please send a complete boot log from the serial console (including bootloader messages). I need logs from both a working and a defective device to compare what might be different. Please flash the same firmware on both devices before taking the logs.

The OpenWRT change seems to be related to the WiFi radio, so it is not relevant in this case.

1am commented 8 years ago

Hi,

I'm attaching the boot logs from two devices in the same network setup. Comparing the results I see no difference except that the broken one starts throwing br-lan: received packet on eth1 with own address as source address after boot and doesn't get an IP address.

carambola2_normal.txt carambola2_broken.txt
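For anyone repeating this comparison, kernel timestamps make a plain diff noisy. Below is a rough Python sketch that strips the `[ 12.345678]` prefixes before diffing. The helper names are made up for illustration, and the two short lists are stand-ins built from lines quoted in this thread, not the contents of the attached files.

```python
import difflib
import re

# Matches a leading kernel timestamp such as "[   27.606297] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

def strip_timestamps(lines):
    """Remove the leading dmesg timestamp from every line."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]

def diff_boot_logs(normal_lines, broken_lines):
    """Return only the added/removed lines between two boot logs."""
    a = strip_timestamps(normal_lines)
    b = strip_timestamps(broken_lines)
    diff = difflib.unified_diff(a, b, fromfile="normal", tofile="broken",
                                lineterm="", n=0)
    # Keep '+'/'-' lines, skip the '+++'/'---' file headers and '@@' hunks.
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

# Stand-in excerpts (hypothetical timestamps on the "normal" side).
normal = [
    "[   27.100000] eth1: link up (100Mbps/Full duplex)",
    "[   27.110000] br-lan: port 2(eth1) entered forwarding state",
]
broken = [
    "[   27.606297] eth1: link up (100Mbps/Full duplex)",
    "[   27.609445] br-lan: port 2(eth1) entered forwarding state",
    "[   27.620361] br-lan: received packet on eth1 with own address as source address",
]

for line in diff_boot_logs(normal, broken):
    print(line)
```

Stripping timestamps hides timing differences between the two boots, so this only helps to spot messages that appear on one device and not the other.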

valinskas commented 8 years ago

Hi,

can you try doing this, while connected via UART/serial console:

# ifconfig eth0 down
# ifconfig eth0 hw ether 70:12:00:00:00:00
# ifconfig eth0 up

# ifconfig eth1 down
# ifconfig eth1 hw ether 70:12:00:00:00:01
# ifconfig eth1 up

Result:

# ip link
...
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 70:12:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 70:12:00:00:00:01 brd ff:ff:ff:ff:ff:ff
  ...

This will change the MAC addresses on the ethX interfaces. Does the problem go away?

1am commented 8 years ago

Hi

Thank you. We've tried and ended up with the following results:

root@carambola_broken:/# /etc/init.d/network restart
[  290.324718] device eth0 left promiscuous mode
[  290.327690] br-lan: port 1(eth0) entered disabled state
[  290.335903] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  290.341388] device eth1 left promiscuous mode
[  290.344836] br-lan: port 2(eth1) entered disabled state
[  290.367155] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
root@carambola_broken:/# [  294.259364] device eth0 entered promiscuous mode
[  294.281542] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
[  294.332809] device eth1 entered promiscuous mode
[  295.775981] eth1: link up (100Mbps/Full duplex)
[  295.779129] br-lan: port 2(eth1) entered forwarding state
[  295.784557] br-lan: port 2(eth1) entered forwarding state
[  295.790030] br-lan: received packet on eth1 with own address as source address
[  295.799459] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[  295.824922] br-lan: received packet on eth1 with own address as source address
[  296.030087] br-lan: received packet on eth1 with own address as source address
[  296.374711] br-lan: received packet on eth1 with own address as source address
[  296.380656] IPv6: br-lan: IPv6 duplicate address fe80::7212:ff:fe00:0 detected!
[  296.404745] br-lan: received packet on eth1 with own address as source address
[  297.324735] br-lan: received packet on eth1 with own address as source address
[  297.784508] br-lan: port 2(eth1) entered forwarding state
[  299.040387] br-lan: received packet on eth1 with own address as source address
[  302.051075] br-lan: received packet on eth1 with own address as source address
root@carambola_broken:/# ifconfig
br-lan    Link encap:Ethernet  HWaddr 70:12:00:00:00:00  
          inet6 addr: fe80::7212:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1382 (1.3 KiB)  TX bytes:1434 (1.4 KiB)
eth0      Link encap:Ethernet  HWaddr 70:12:00:00:00:00  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:5 
eth1      Link encap:Ethernet  HWaddr 70:12:00:00:00:01  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:6640 (6.4 KiB)  TX bytes:6940 (6.7 KiB)
          Interrupt:4 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:176 errors:0 dropped:0 overruns:0 frame:0
          TX packets:176 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:12024 (11.7 KiB)  TX bytes:12024 (11.7 KiB)
root@carambola_broken:/# [  305.060730] br-lan: received packet on eth1 with own address as source address
root@carambola_broken:/# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master br-lan state DOWN mode DEFAULT group default qlen 1000
    link/ether 70:12:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-lan state UP mode DEFAULT group default qlen 1000
    link/ether 70:12:00:00:00:01 brd ff:ff:ff:ff:ff:ff
4: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c4:93:00:04:e5:26 brd ff:ff:ff:ff:ff:ff
8: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 70:12:00:00:00:00 brd ff:ff:ff:ff:ff:ff

What is surprising is the [ 296.380656] IPv6: br-lan: IPv6 duplicate address fe80::7212:ff:fe00:0 detected! message. We're using IPv4 and have no IP address conflicts (at the time of writing, addresses are assigned via DHCP), so I wouldn't expect any in IPv6.

pepe2k commented 8 years ago

@1am, I have seen similar problems on Qualcomm/Atheros based devices with damaged built-in Ethernet controller/s and/or transformers (e.g. after a storm).

Cheers, Piotr

valinskas commented 8 years ago

Message:

[ 296.380656] IPv6: br-lan: IPv6 duplicate address fe80::7212:ff:fe00:0 

fe80::7212:ff:fe00:0 is an automatically generated link-local address. Usually it is generated from the interface MAC address, with some randomness if IPv6 privacy extensions are enabled. When the interface goes up, the kernel performs DAD (duplicate address detection), hence this message ...
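To illustrate the MAC-based case, here is a small Python sketch of the modified EUI-64 derivation (flip the universal/local bit of the first octet and insert ff:fe in the middle). The function name is just for illustration, and it assumes no privacy or stable-privacy address generation is in use.

```python
import ipaddress

def link_local_from_mac(mac: str) -> str:
    """Derive the modified EUI-64 IPv6 link-local address for a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Interface identifier: first 3 octets, ff:fe, last 3 octets.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix followed by the 64-bit interface identifier.
    packed = b"\xfe\x80" + b"\x00" * 6 + iid
    return str(ipaddress.IPv6Address(packed))

print(link_local_from_mac("70:12:00:00:00:00"))  # fe80::7212:ff:fe00:0
print(link_local_from_mac("C4:93:00:03:B0:99"))  # fe80::c693:ff:fe03:b099
```

Both results match the addresses in the ifconfig outputs posted in this thread, so the "duplicate" address is simply the one derived from the bridge's own MAC.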

It seems this happens when the kernel sends a DAD request on the br-lan interface. br-lan consists of two ports: eth0 (state=down) and eth1 (state=up). So the kernel's bridge code skips eth0 (does not send the packet) and should send on eth1 only ... and apparently the same packet is looped back on the eth1 interface ...

No wonder such messages are seen:

[  295.824922] br-lan: received packet on eth1 with own address as source address
[  296.030087] br-lan: received packet on eth1 with own address as source address
[  296.374711] br-lan: received packet on eth1 with own address as source address
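As a very simplified model of why a looped-back frame produces exactly this warning: the bridge knows its own addresses, and a frame arriving on a port with one of them as the source MAC is reported. The Python below is only a toy illustration of that check, not kernel code; the function name and the "other host" MAC are hypothetical.

```python
def bridge_ingress_warning(local_macs, port, src_mac):
    """Return the warning text if src_mac is one of the bridge's own MACs."""
    own = {mac.lower() for mac in local_macs}
    if src_mac.lower() in own:
        return f"br-lan: received packet on {port} with own address as source address"
    return None

# MACs of the bridge ports after the manual change earlier in the thread.
LOCAL_MACS = ["70:12:00:00:00:00", "70:12:00:00:00:01"]

# A frame sent by the bridge itself that comes back in on eth1.
print(bridge_ingress_warning(LOCAL_MACS, "eth1", "70:12:00:00:00:00"))

# A frame from some other host (hypothetical MAC) triggers nothing.
print(bridge_ingress_warning(LOCAL_MACS, "eth1", "02:00:00:00:00:01"))
```

In this model the warning only says that the device heard its own frame again; it does not say where the frame was reflected (external loop, switch, or the PHY/transformer path).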

I don't understand how it can happen. It might be a faulty unit, like @pepe2k said. I have never seen anything like this. Time to replace the faulty unit?

bome commented 8 years ago

We've seen such problems on a few of our boards, too (though not sure it had always been exactly the same symptoms). In a few cases, it was sufficient to replace the Ethernet transformer, in one case the Carambola2 board seemed to not be 100% planar: resoldering it solved the problem.

1am commented 8 years ago

@pepe2k, @bome I've tried replacing the transformer for my ETH switch (luckily I don't have ETH + transformer in one) but the results are the same. The Carambola2 modules are also lying flat and mounted properly.

@valinskas The unit was not faulty; it worked normally and nothing bad (like storms) happened to it. In the first occurrence the device was working and after a while it suddenly stopped with this error. With the latest instance of the error, the device had been off for around 2-3 days after initial setup before this error occurred. A Carambola2 is neither easy nor cheap to replace, and I'd really need to understand the source of the problem in case it happens on other devices from the current batch. So far it has been observed on 2 of a few hundred Carambola2 based devices.

Hardware-wise I'd say everything looks OK, but for some reason the problem persists across system flashing. Is it possible that there's something funny happening in the bootloader?

mantas-p commented 8 years ago

@1am, please contact 8devices support at support@8devices.com to arrange replacement of the defective modules. When contacting support, please send a link to this ticket. Also please leave a note in the shipment saying "For Mantas P."

codehero commented 5 years ago

Ensure that your 3.3V power supply startup starts with no droop. I noticed the same error, and when I fixed my supply startup the PHY worked again.

valinskas commented 3 years ago

Closing issue, ≈2 years since last activity. Assuming the issue was resolved.