Closed dwalkes closed 5 years ago
I have a problem with LLDP packets on the TX2 with 28.3.
The problem occurs when I use systemd-networkd. With the NVIDIA SDK I can’t reproduce it, because the SDK uses NetworkManager instead of systemd-networkd.
The problem happens when I receive an LLDP packet. The packet is not shown in tcpdump, and after it is received the network driver generates no more interrupts. You can check this as follows.
Check whether you can receive any LLDP packets:
tcpdump -ni eth0 -e ether proto 0x88cc
[ 244.935449] Unsupported IOCTL call
[ 244.954384] Unsupported IOCTL call
[ 244.966121] device eth0 entered promiscuous mode
If you receive any LLDP packets, you don't have this problem.
At the same time, you can check the interrupts of the network driver:
egrep ether_qos /proc/interrupts
When the Tegra receives an LLDP packet, the interrupts stop and it doesn't receive anything further.
The network can be reset with:
ip link set eth0 down; ip link set eth0 up
I think systemd-networkd uses something that the ether_qos network driver does not support. To work around it, I disabled LLDP for systemd-networkd in /etc/systemd/network/eth.network:
[Network]
# fix LLDP on TX2
LLDP=0
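For anyone who wants to script the interrupt check described above, here is a rough sketch (not from the original posts). It sums the per-CPU counters of every ether_qos line in /proc/interrupts, waits, and compares. The function names `ether_qos_irq_total` and `ether_qos_irq_advanced` are made up for illustration, and the 10-second default interval is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: helper names and the default interval are assumptions.

# Read /proc/interrupts-formatted text on stdin and print the sum of the
# per-CPU counts of every line that mentions ether_qos.
ether_qos_irq_total() {
    awk '/ether_qos/ {
        # Field 1 is the IRQ label ("41:"); the per-CPU counts follow
        # until the first non-numeric field (the controller name).
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
    }
    END { print total + 0 }'
}

# Return success if the ether_qos total grew during the interval.
#   $1: seconds to wait (default 10)
#   $2: file to sample (default /proc/interrupts)
ether_qos_irq_advanced() {
    interval=${1:-10}
    src=${2:-/proc/interrupts}
    before=$(ether_qos_irq_total < "$src")
    sleep "$interval"
    after=$(ether_qos_irq_total < "$src")
    [ "$after" -gt "$before" ]
}

# Example use on the target while traffic is expected on eth0:
#   ether_qos_irq_advanced 10 || echo "ether_qos interrupts appear stalled"
```

A stalled counter on an idle link proves nothing, so this is only meaningful while something (for example a ping from another host) is generating traffic.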
Thanks for the suggestion, @tzopik. When I try
tcpdump -ni eth0 -e ether proto 0x88cc
I get
[ 197.254281] device eth0 entered promiscuous mode
[ 197.259029] audit: type=1700 audit(1563200454.016:4): dev=eth0 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[ 197.270552] audit: type=1300 audit(1563200454.016:4): arch=c00000b7 syscall=208 success=yes exit=0 a0=3 a1=107 a2=1 a3=7fc84ec540 items=0 ppid=4289 pid=5050 auid=4294967295 uid=0 gid=0 eu)
[ 197.297531] audit: type=1327 audit(1563200454.016:4): proctitle=74637064756D70002D6E690065746830002D650065746865720070726F746F00307838386363
After the event occurs the interrupts keep advancing on common_irq
root@jetson-tx2:~# egrep ether_qos /proc/interrupts
41: 130 0 0 0 GICv2 226 Level ether_qos.common_irq
43: 469 0 0 0 GICv2 222 Level 2490000.ether_qos.rx0
44: 223 0 0 0 GICv2 218 Level 2490000.ether_qos.tx0
root@jetson-tx2:~# egrep ether_qos /proc/interrupts
41: 130 0 0 0 GICv2 226 Level ether_qos.common_irq
43: 469 0 0 0 GICv2 222 Level 2490000.ether_qos.rx0
44: 223 0 0 0 GICv2 218 Level 2490000.ether_qos.tx0
So it looks like this particular issue might be different from the one on 28.3, but possibly related.
@tzopik, @dwalkes I tried LLDP=0 on meta-tegra/warrior and this seems to resolve this issue for me. Ethernet connection has been stable for 10 minutes+, and it recovers properly from link up/down. Previously, I could not get Ethernet to work at all when the device was connected to MikroTik or Ubiquiti routers.
Contents of the /etc/systemd/network/eth.network file I tested with:
[Match]
Name=eth*
[Network]
DHCP=v4
LLDP=0
[DHCPv4]
UseHostname=false
Thanks for the hint.
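To try the same workaround on a running target, something like the following sketch could write the file and restart networkd. The helper name `write_lldp_off_network` is hypothetical, the `[Match]` header is assumed (systemd expects `Name=` under it), and `LLDP=no` is used as the boolean spelling of the `LLDP=0` tested above.

```shell
#!/bin/sh
# Sketch only: the helper name and default path are illustrative assumptions.

# Write a networkd unit that keeps DHCPv4 but disables LLDP reception.
#   $1: destination file (default /etc/systemd/network/eth.network)
write_lldp_off_network() {
    dest=${1:-/etc/systemd/network/eth.network}
    mkdir -p "$(dirname "$dest")"
    cat > "$dest" <<'EOF'
[Match]
Name=eth*

[Network]
DHCP=v4
# Workaround for the TX2 ether_qos stall discussed in this thread
LLDP=no

[DHCPv4]
UseHostname=false
EOF
}

# On the target, as root:
#   write_lldp_off_network
#   systemctl restart systemd-networkd.service
```

Passing a different path as the first argument makes it easy to inspect the generated file before installing it.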
I haven't had a problem with the Ethernet interface coming up on warrior, but I was using ifupdown and the networking initscript instead of systemd-networkd. I did have to modify how ifupdown was invoking udhcpc to deal with the delay caused by spanning tree on my Ubiquiti switch.
I've just switched over to systemd-networkd, and am still not seeing an issue with warrior - Ethernet was up over half an hour. I'm running both IPv4 and IPv6 (just link-local) on my local network. Are you all running any IPv6?
I tried LLDP=0 on meta-tegra/warrior and this seems to resolve this issue for me.
Agreed, I see the same behavior. With LLDP=0, my IPv4 ping test has run successfully for over an hour. My previous record was just over 10 minutes.
Are you all running any IPv6?
Only for test purposes related to this issue.
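A ping test of the kind mentioned above could look like the sketch below. This is not the script used in the thread; the helper name `ping_until_fail`, the `PING_CMD` override, and the 5-second default interval are assumptions for illustration. It pings a target over IPv4 until the first failure and reports how long connectivity lasted.

```shell
#!/bin/sh
# Sketch only: not the thread's script; names and defaults are assumptions.

# Ping a target over IPv4 until the first failure, then report the elapsed time.
#   $1: target address
#   $2: seconds between pings (default 5)
# PING_CMD may be set to substitute another command for ping (useful in tests).
ping_until_fail() {
    target=$1
    interval=${2:-5}
    start=$(date +%s)
    # -4: force IPv4, -c 1: one probe, -W 2: wait up to 2 seconds for a reply
    while ${PING_CMD:-ping} -4 -c 1 -W 2 "$target" >/dev/null 2>&1; do
        sleep "$interval"
    done
    end=$(date +%s)
    echo "IPv4 ping to $target failed after $((end - start)) seconds"
}

# Example use on the target (the address is a placeholder for your gateway):
#   ping_until_fail 192.168.1.1
```

Comparing the reported time with and without LLDP=0 gives a rough measure of whether the workaround helps on a given network.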
I've pushed a workaround to my meta-mender-community tegra layer for now, since Mender is where the systemd-networkd dependency is coming from.
I haven't been able to track down the actual root cause yet, but for now I've added a replacement for the standard systemd wired network config file that OE-Core provides, which adds the LLDP=no setting.
Disabling IPv6 is another approach that seems to work, if you really want LLDP turned on.
The commit to the warrior branch doesn't do anything because the upstream systemd recipe in the openembedded-core warrior branch doesn't reference wired.network.
If I add this to the recipe
SRC_URI += "file://wired.network"
FILES_${PN} += "${systemd_unitdir}/network/80-wired.network"
do_install_append () {
    install -D -m0644 ${WORKDIR}/wired.network ${D}${systemd_unitdir}/network/80-wired.network
}
Then it correctly includes the wired.network file in the rootfs, and fixes the network issue for me as well (with LLDP enabled, the Ethernet link goes down less than 30 seconds after it comes up).
Thanks @compenguy for catching that. Will post a fixup shortly.
This should be working now in warrior and master with the workarounds applied. Feel free to reopen if the issue reappears.
Thank you a lot for this comment.
I've spent several days trying to catch the issue and couldn't even guess that there can be such a problem.
Hi everyone, I’m experiencing an issue with IPv4 networking on my meta-tegra build for the Jetson TX2 (warrior branch), and I was wondering if anyone can suggest the best troubleshooting steps.
I’ve found that when I build a combined Mender image using this project and then boot up, I can access IPv4 network addresses for somewhere between 1 and 10 minutes, then I lose all IPv4 connectivity. Any attempt to access the network returns “Network is unreachable” or a similar error message. Here’s a script I’ve been using to reproduce
This script fails after somewhere between 1 and 15 minutes of running. After the failure, the only reliable way to get an IPv4 connection again is to reboot. In some cases dropping and re-upping the link brings it back, and in some cases ethtool manipulation (I’ve used ethtool -s eth0 speed 100 duplex full autoneg off) brings it back, but not always in either case.
I don’t see any helpful or obviously suspicious messages in journalctl or /var/log/messages, but I did see this message in a few instances:
I’ve reproduced this behavior after removing the Mender layer and using just meta-tegra + poky, although in the default configuration I need to add this content to /etc/systemd/network/eth.network to get an IPv4 address:
And then restart networking with
systemctl restart systemd-networkd.service
Linux jetson-tx2 4.9.140-l4t-r32.1+g3c02a65d917
to Linux jetson-tx2 4.9.140-tegra.
Unless I’m confusing myself (which is entirely possible), I think this suggests there’s something in the NVIDIA L4T root filesystem that is required for IPv4 networking and isn’t in my core-image-base rootfs. I’m planning to compare the systemd configuration of core-image-base and the L4T root filesystem in more detail, but I’m wondering if anyone else has noticed this yet or has suggestions about what I should try.