aparcar / openwrt

Staging tree of Paul Spooren

FS#494 - NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out #516

Open aparcar opened 7 years ago

aparcar commented 7 years ago

dziny:

I have a VDSL line with Plusnet (UK). The connection is PPPoE on ptm0.101. With the supplied modem/router the line and connection are stable with no disconnects. With LEDE the connection is established and works well until it disconnects (sometimes after as little as a few minutes, other times it stays connected for up to an hour). After the disconnect there is no reconnection until a reboot. Restarting the WAN interface (ifdown wan/ifup wan) or the DSL connection (/etc/init.d/dslcontrol stop/start) does not help.


Here is a trace of the crash (dmesg):

[ 1414.124413] ---[ beginning trace ff034b465cdad16b ]---
[ 1414.125631] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1a8/0x2f0()
[ 1414.126471] NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out
[ 1414.132456] Modules linked in: ltq_ptm_vr9 option iptable_nat ath9k usb_wwan rt2800usb rt2800lib pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 l2tp_ppp ipt_REJECT ipt_MASQUERADE ath9k_common xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_policy xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial rt2x00usb rt2x00lib pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9 iptable_raw iptable_mangle iptable_filter ipt_ah ipt_ECN ip_tables crc_itu_t crc_ccitt cdc_acm ath9k_hw ath10k_pci ath10k_core ath mac80211 cfg80211 compat drv_dsl_cpe_api drv_mei_cpe xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc l2tp_ip6 l2tp_ip l2tp_eth sit l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 tunnel4 ip_tunnel tun af_key xfrm_user xfrm_ipcomp xfrm_algo br2684 atm drv_ifxos echainiv sha256_generic sha1_generic jitterentropy_rng drbg md5 hmac des_generic cbc authenc usb_storage dwc2 uhci_hcd ehci_platform ehci_hcd sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache aead crypto_null
[ 1414.287462] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.7 #1
[ 1414.293130] Stack : 804b0000 00000001 00000000 00000000 805172b8 80516f43 80489a24 00000000
[ 1414.293130] 80673844 00010000 80510000 805159bc 80515abc 80055664 00000003 80510000
[ 1414.293130] 80491b4c 00000000 8048ff50 80511c44 80515abc 800535b0 00000006 00000001
[ 1414.293130] 00000000 80512000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1414.293130] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1414.293130] ...
[ 1414.328618] Call Trace:
[ 1414.331095] [<800178a8>] show_stack+0x50/0x84
[ 1414.335454] [<8002af48>] warn_slowpath_common+0xa0/0xd0
[ 1414.340670] [<8002afa4>] warn_slowpath_fmt+0x2c/0x38
[ 1414.345636] [<802e637c>] dev_watchdog+0x1a8/0x2f0
[ 1414.350348] [<8005f7b0>] call_timer_fn.isra.5+0x24/0x80
[ 1414.355557] [<8005fa2c>] run_timer_softirq+0x1a4/0x208
[ 1414.360694] [<8002de80>] __do_softirq+0x298/0x2b0
[ 1414.365388] [<80002430>] ret_from_irq+0x0/0x4
[ 1414.369760] [<80013a8c>] r4k_wait_irqoff+0x18/0x20
[ 1414.374528] [<8004ff6c>] cpu_startup_entry+0xa4/0xf8
[ 1414.379508] [<80539bf8>] start_kernel+0x474/0x494
[ 1414.384180]
[ 1414.385631] ---[ end trace ff034b465cdad16b ]---
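
For anyone comparing traces from several routers, the watchdog line can be picked out of saved dmesg output mechanically. This is a minimal Python sketch, assuming the message format seen in the trace above; the name `find_watchdog_timeouts` is made up for illustration and is not part of any OpenWrt tool.

```python
import re

# Matches e.g. "[ 1414.126471] NETDEV WATCHDOG: ptm0 (): transmit queue 0 timed out".
# The driver name between the parentheses is empty in the trace above,
# so the pattern allows it to be empty.
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<dev>\S+) \((?P<driver>[^)]*)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(log_text):
    """Return (seconds_since_boot, device, queue) for each watchdog timeout found."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(log_text)
    ]
```

Because the pattern is searched with `finditer`, it works both on one-entry-per-line logs and on logs that were pasted as a single run of text.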

aparcar commented 7 years ago

mkresin:

I can confirm the issue. I'm seeing the same using VDSL + PTM + VLAN, but I had never heard of anyone else having these problems. I was of the opinion that it was related to my local (development) changes.

Sometimes it works for weeks, sometimes only for days. I never found a way to trigger the warning/crash.

I can also confirm that PPPoE discovery only works again after a reboot. Simply unloading the ptm kernel module or similar does not help.

After the warning is shown and the pppoe discovery doesn't work any longer, things like querying the ptm carrier state fail as well:

root@LEDE:~# cat /sys/devices/virtual/net/ptm0/carrier
cat: read error: Invalid argument
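
The same check can be scripted so that the failing read is noticed without watching the console. A small Python sketch follows; the helper name `read_carrier` is hypothetical. As far as I know, the kernel's sysfs `carrier` attribute returns EINVAL when the interface is not in the running state, but treat that as an assumption rather than a diagnosis of this bug.

```python
def read_carrier(path="/sys/devices/virtual/net/ptm0/carrier"):
    """Return True if carrier is up, False if it is down, None if the read fails.

    A failed read (for example the "Invalid argument" error shown above)
    is reported as None instead of raising, so callers can tell
    "link down" apart from "state cannot be queried".
    """
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return None
```

A caller that polls this and sees `None` after the interface had previously reported `True` is in the same state described in this comment.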

I tried different xDSL firmware versions to make sure that it's not related to a crash of the xDSL firmware.

aparcar commented 7 years ago

hailfinger:

With o2 ADSL (Annex B) I'm seeing this usually within 4 minutes after boot. I don't even get an initial PPPoE connection established before everything falls over.

syslog is attached. Output from dsl_control follows.

root@LEDE:~# /etc/init.d/dsl_control status
ATU-C Vendor ID: Broadcom 147.158
ATU-C System Vendor ID: 00,00,30,30,30,30,00,00
Chipset: Lantiq-VRX200 Unknown
Firmware Version: 5.7.4.4.0.2
API Version: 4.17.18.6
XTSE Capabilities: 0x0, 0x0, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0
Annex: B
Line Mode: G.992.5 (ADSL2+)
Profile:
Line State: UP [0x801: showtime_tc_sync]
Forward Error Correction Seconds (FECS): Near: 0 / Far: 178461
Errored seconds (ES): Near: 0 / Far: 7808
Severely Errored Seconds (SES): Near: 0 / Far: 1913
Loss of Signal Seconds (LOSS): Near: 0 / Far: 8
Unavailable Seconds (UAS): Near: 48 / Far: 48
Header Error Code Errors (HEC): Near: 0 / Far: 608262
Non Pre-emtive CRC errors (CRC_P): Near: 0 / Far: 0
Pre-emtive CRC errors (CRCP_P): Near: 0 / Far: 0
Power Management Mode: L0 - Synchronized
Latency / Interleave Delay: Down: Interleave (8.0 ms) / Up: Interleave (8.0 ms)
Data Rate: Down: 10.988 Mb/s / Up: 1.150 Mb/s
Line Attenuation (LATN): Down: 20.8dB / Up: 7.8dB
Signal Attenuation (SATN): Down: 19.0dB / Up: 8.0dB
Noise Margin (SNR): Down: 9.2dB / Up: 9.4dB
Aggregate Transmit Power (ACTATP): Down: 18.6dB / Up: 12.6dB
Max. Attainable Data Rate (ATTNDR): Down: 11.104 Mb/s / Up: 1.234 Mb/s
Line Uptime Seconds: 625
Line Uptime: 10m 25s
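
To compare the near-end and far-end error counters between runs, the status text above can be parsed into a dictionary. This is only a sketch in Python, assuming the `(NAME): Near: N / Far: M` layout shown in this output; `parse_error_counters` is an illustrative name, not an existing tool.

```python
import re

# Captures lines such as
# "Header Error Code Errors (HEC): Near: 0 / Far: 608262".
# Lines with "Down:/Up:" values (attenuation, data rate) are deliberately ignored.
COUNTER_RE = re.compile(
    r"\((?P<name>[A-Z_]+)\):\s+Near: (?P<near>\d+) / Far: (?P<far>\d+)"
)


def parse_error_counters(status_text):
    """Map each counter abbreviation to a (near, far) tuple of integers."""
    return {
        m.group("name"): (int(m.group("near")), int(m.group("far")))
        for m in COUNTER_RE.finditer(status_text)
    }
```

Feeding it the output of two consecutive `dsl_control status` calls and subtracting the tuples would show which counters are still growing.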

aparcar commented 7 years ago

hailfinger:

OK, this is interesting. Apparently it only happens if no data is sent/received over the line for some time. I had incorrectly configured the DSL Encapsulation mode and didn't get any responses from the remote side due to that. With the correct DSL Encapsulation mode the remote side does respond, and the transmit queue timeout doesn't happen anymore.

aparcar commented 7 years ago

LipkeGu:

config atm-bridge 'atm'
	option vpi '1'
	option vci '32'
	option encaps 'llc'
	option payload 'bridged'

config dsl 'dsl'
	option annex 'bdmt'
	option xfer_mode 'atm'
	option line_mode 'adsl'

config interface 'wan'
	option proto 'pppoe'
	option ipv6 'auto'
	option username 'username'
	option password 'password'
	option ifname 'nas0'

These are the settings I'm using, and it works fine :)

aparcar commented 7 years ago

dziny:

Guido, unfortunately your comment is not relevant: from your config I see you are using ADSL over ATM, not VDSL over PTM.

aparcar commented 6 years ago

ali1234:

I am seeing this on Home Hub 5A too. It crashes about once every three days for me.

I also have an Arcadyan VG3503J - the BT Openreach Modem with the same Lantiq chipset. If I configure it as a full bridge modem and run PPPoE over the WAN ethernet on the Home Hub then the crash doesn't seem to happen, despite all the same drivers being involved.