NixOS / nixpkgs

Raspberry PI 3: occasional ethernet hangs #94763

Status: Open. Opened by benley 4 years ago.

benley commented 4 years ago

Describe the bug

About once per month, my Raspberry Pi 3 running kernel 5.4.50 falls off the network when its Ethernet driver seemingly hangs.

To Reproduce

Steps to reproduce the behavior:

  1. Run kernel 5.4.50 on a Raspberry Pi 3 for a month, maybe two, using wired Ethernet networking.
  2. Wait for it to fail.
  3. Look at the call trace:

kernel: ------------[ cut here ]------------
kernel: NETDEV WATCHDOG: eth0 (smsc95xx): transmit queue 0 timed out
kernel: WARNING: CPU: 0 PID: 5933 at net/sched/sch_generic.c:447 dev_watchdog+0x384/0x390
kernel: Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter overlay 8021q garp mrp bcm2835_v4l2(C) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_bcm2835(C) raspberrypi_cpufreq vc4 brcmfmac cec drm_kms_helper brcmutil hci_uart btbcm drm cfg80211 bluetooth smsc95xx cp210x usbserial usbnet uas raspberrypi_hwmon clk_raspberrypi ecdh_generic pwm_bcm2835 rfkill bcm2835_thermal vchiq(C) i2c_bcm2835 ecc bcm2835_rng bcm2835_dma rng_core crct10dif_ce uio_pdrv_genirq uio xt_comment ip6table_nat iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel tap macvlan bridge stp llc ip_tables x_tables ipv6 nf_defrag_ipv6 dm_mod
kernel: CPU: 0 PID: 5933 Comm: python3 Tainted: G        WC        5.4.50 #1-NixOS
kernel: Hardware name: Raspberry Pi 3 Model B (DT)
kernel: pstate: 40000005 (nZcv daif -PAN -UAO)
kernel: pc : dev_watchdog+0x384/0x390
kernel: lr : dev_watchdog+0x384/0x390
kernel: sp : ffff800010003d50
kernel: x29: ffff800010003d50 x28: 0000000000000140 
kernel: x27: 00000000ffffffff x26: ffff8000119e3018 
kernel: x25: 0000000000000000 x24: 0000000000000000 
kernel: x23: 0000000000000001 x22: ffff000033155000 
kernel: x21: ffff000033155478 x20: ffff800012017000 
kernel: x19: 0000000000000000 x18: 0000000000000000 
kernel: x17: 0000000000000000 x16: 0000000000000000 
kernel: x15: ffff0000323c6800 x14: ffffffffffffffff 
kernel: x13: 000000000001c7c0 x12: ffff8000122fd000 
kernel: x11: ffff800012042000 x10: 0000000000000000 
kernel: x9 : 0000000000000004 x8 : 00000000000004e0 
kernel: x7 : 0000000000000001 x6 : 0000000000000001 
kernel: x5 : ffff000038380288 x4 : 0000000000000001 
kernel: x3 : ffff000038380288 x2 : 0000000000000007 
kernel: x1 : 5d48ea4df7d3e800 x0 : 0000000000000000 
kernel: Call trace:
kernel:  dev_watchdog+0x384/0x390
kernel:  call_timer_fn+0x3c/0x178
kernel:  __run_timers.part.0+0x29c/0x348
kernel:  run_timer_softirq+0x40/0x78
kernel:  __do_softirq+0x138/0x334
kernel:  irq_exit+0xc0/0xe0
kernel:  __handle_domain_irq+0x70/0xc0
kernel:  bcm2836_arm_irqchip_handle_irq+0x74/0xd8
kernel:  el0_irq_naked+0x4c/0x54
kernel: ---[ end trace 2847cdab3c558078 ]---
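
Since the hang can take a month or more to show up, it may be easier to scan saved kernel logs for the watchdog warning than to wait for the outage to be noticed. The following is a minimal Python sketch, assuming the message format shown in the trace above. The function name `find_watchdog_timeouts` and the sample lines are illustrative and not part of any existing tool; the second sample line uses a dmesg-style timestamp prefix instead of the journal's `kernel:` prefix.

```python
import re

# Matches the warning text itself. The line prefix (journal "kernel: " or a
# dmesg "[timestamp]") is deliberately left out so both styles are accepted.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return a list of (iface, driver, queue) for each watchdog timeout found."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits


# Illustrative input only; real input would be kernel log lines.
sample = [
    "kernel: ------------[ cut here ]------------",
    "kernel: NETDEV WATCHDOG: eth0 (smsc95xx): transmit queue 0 timed out",
    "[42543.359139] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out",
]

for iface, driver, queue in find_watchdog_timeouts(sample):
    print(f"{iface} ({driver}): transmit queue {queue} timed out")
```

On a live system the lines could come from `journalctl -k` piped into the script and read from `sys.stdin`. Reporting the driver name alongside the interface is a deliberate choice, because it shows which Ethernet chip the affected board uses.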

Metadata

Please run `nix-shell -p nix-info --run "nix-info -m"` and paste the result.

 - system: `"aarch64-linux"`
 - host os: `Linux 5.4.54, NixOS, 20.03.2685.977000f149b (Markhor)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.6`
 - channels(root): `"nixos-20.03.2685.977000f149b"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

alexbakker commented 4 years ago

I've been encountering this as well for the last couple of weeks. I'll try upgrading to the 20.09 beta and see whether the issue still appears there. Here's another trace:

[42543.359082] ------------[ cut here ]------------
[42543.359139] NETDEV WATCHDOG: eth0 (lan78xx): transmit queue 0 timed out
[42543.359239] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x384/0x390
[42543.359245] Modules linked in: wireguard(E) ip6_udp_tunnel udp_tunnel bcm2835_v4l2(C) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc raspberrypi_cpufreq snd_bcm2835(C) vc4 btsdio cec drm_kms_helper brcmf>
[42543.359353] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C  E     5.4.69 #1-NixOS
[42543.359355] Hardware name: Raspberry Pi 3 Model B+ (DT)
[42543.359359] pstate: 20000005 (nzCv daif -PAN -UAO)
[42543.359363] pc : dev_watchdog+0x384/0x390
[42543.359366] lr : dev_watchdog+0x384/0x390
[42543.359368] sp : ffff800010003d50
[42543.359370] x29: ffff800010003d50 x28: 0000000000000140
[42543.359374] x27: 00000000ffffffff x26: ffff8000119dd018
[42543.359377] x25: 0000000000000000 x24: 0000000000000000
[42543.359381] x23: 0000000000000001 x22: ffff0000339e5000
[42543.359387] x21: ffff0000339e5478 x20: ffff800012017000
[42543.359390] x19: 0000000000000000 x18: 0000000000000000
[42543.359393] x17: 0000000000000000 x16: 3f3122b70a080101
[42543.359397] x15: ffff800012024480 x14: ffffffffffffffff
[42543.359400] x13: 0000000000000000 x12: ffff8000122fe000
[42543.359404] x11: ffff800012042000 x10: 0000000000000000
[42543.359407] x9 : 0000000000000004 x8 : 0000000000000150
[42543.359410] x7 : 0000000000000001 x6 : 0000000000000030
[42543.359413] x5 : ffff800010003ab0 x4 : 0000000000000001
[42543.359416] x3 : ffff800010ff57d0 x2 : 0000000000000180
[42543.359420] x1 : 991d43081b536000 x0 : 0000000000000000
[42543.359423] Call trace:
[42543.359429]  dev_watchdog+0x384/0x390
[42543.359445]  call_timer_fn+0x3c/0x178
[42543.359449]  __run_timers.part.0+0x29c/0x348
[42543.359452]  run_timer_softirq+0x40/0x78
[42543.359457]  __do_softirq+0x138/0x35c
[42543.359462]  irq_exit+0xc0/0xe0
[42543.359466]  __handle_domain_irq+0x70/0xc0
[42543.359469]  bcm2836_arm_irqchip_handle_irq+0x74/0xe8
[42543.359472]  el1_irq+0xb8/0x140
[42543.359477]  arch_cpu_idle+0x3c/0x1c8
[42543.359483]  default_idle_call+0x20/0x5c
[42543.359488]  do_idle+0x208/0x288
[42543.359492]  cpu_startup_entry+0x28/0xb0
[42543.359497]  rest_init+0xc4/0xd0
[42543.359503]  arch_call_rest_init+0x14/0x1c
[42543.359506]  start_kernel+0x470/0x4a4
[42543.359509] ---[ end trace a505bdf1a274f272 ]---

System info:

 - system: `"aarch64-linux"`
 - host os: `Linux 5.4.69, NixOS, 20.03.3101.0d0660fde3b (Markhor)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.6`
 - channels(root): `"nixos-20.03.3101.0d0660fde3b"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

gebner commented 4 years ago

I have the following in my configuration; I'm not sure whether this is the same issue:

  # Workaround: turn off TCP segmentation offload on eth0 once at boot.
  systemd.services.ethKernelPanicFix = {
    wantedBy = [ "networking.target" ];
    serviceConfig = {
      Type = "oneshot";
      # Keep the unit marked active after the script exits.
      RemainAfterExit = "yes";
    };
    # https://github.com/raspberrypi/linux/issues/2449
    script = ''
      ${pkgs.ethtool}/bin/ethtool -K eth0 tx-tcp-segmentation off tx-tcp6-segmentation off
    '';
  };

This might be a relevant upstream bug: https://github.com/raspberrypi/linux/issues/3401

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.