aparcar / openwrt

Staging tree of Paul Spooren

FS#1136 - HFSC kernel warnings with QoS / SQM #1232

Closed aparcar closed 6 years ago

aparcar commented 7 years ago

codemarauder:

A warning appears in the kernel log and traffic bypasses the limit set in QoS/SQM: throughput peaks, then falls well below the available bandwidth. It then tries to stabilise until the next crash. With limits configured at 90% of 6 Mbps, throughput peaks above 8 Mbps and drops to 2 Mbps after the crash. Users complain of slow Internet access.

[79569.780046] ------------[ cut here ]------------
[79569.784788] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1426 0xffffffffa082f7ad()
[79569.792697] Modules linked in: ifb qcserial option ipw iptable_nat ip6table_nat cdc_mbim usb_wwan usb_serial_simple ti_usb_3410_5052 sr9700 smsc95xx sierra_net sierra rndis_host qmi_wwan pppoe ppp_async pl2303 oti6858 nft_chain_nat_ipv6 nft_chain_nat_ipv4 nf_tables_inet nf_nat_pptp nf_nat_ipv6 nf_nat_ipv4 nf_nat_amanda nf_conntrack_pptp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack_amanda mos7720 mct_u232 mcs7830 keyspan kalmia ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm garmin_gps ftdi_sio ebtable_nat ebtable_filter ebtable_broute dm9601 cypress_m8 cp210x ch341 cdc_subset cdc_ncm cdc_ether cdc_eem belkin_sa ax88179_178a asix ark3116 xt_u32 xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_socket xt_recent xt_quota2 xt_quota xt_psd xt_pkttype xt_physdev xt_owner xt_nat xt_multiport xt_mark
[79569.865507] xt_mac xt_lscan xt_limit xt_length2 xt_length xt_ipv4options xt_iprange xt_ipp2p xt_iface xt_hl xt_helper xt_hashlimit xt_geoip xt_fuzzy xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_condition xt_comment xt_cluster xt_addrtype xt_TRACE xt_TPROXY xt_TEE xt_TCPMSS xt_TARPIT xt_SYSRQ xt_REDIRECT xt_NFQUEUE xt_NFLOG xt_NETMAP xt_LUA xt_LOGMARK xt_LOG xt_LED xt_IPMARK xt_HL xt_DSCP xt_DNETMAP xt_DHCPMAC xt_DELUDE xt_CT xt_CLASSIFY xt_CHAOS xt_ACCOUNT w83793 visor vhci_hcd usbserial usbnet usbmon usbip_host usbip_core ts_kmp ts_fsm ts_bm tmp103 tmp102 sht21 rtl8150 r8169 r8152 pppox ppp_generic pegasus nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir_ipv6 nft_redir_ipv4 nft_redir nft_rbtree nft_nat nft_meta nft_masq_ipv6 nft_masq_ipv4 nft_masq
[79569.936919] nft_log nft_limit nft_hash nft_exthdr nft_ct nft_counter nft_chain_route_ipv6 nft_chain_route_ipv4 nfnetlink_queue nfnetlink_log nf_tables_ipv6 nf_tables_ipv4 nf_tables nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_nat_h323 nf_nat_ftp nf_log_ipv4 nf_dup_ipv6 nf_dup_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast macvlan ltc4151 lm95241 lm92 lm90 lm85 lm77 lm75 lm63 kaweth iptable_raw iptable_mangle iptable_filter ipt_ah ipt_ECN ipt_CLUSTERIP ipheth ip6table_raw ip_tables ina2xx ina209 hso ezusb ebtables
[79570.007956] ebt_vlan ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_nflog ebt_mark_m ebt_mark ebt_log ebt_limit ebt_ip6 ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 e1000e e100 crc_ccitt compat_xtables cdc_wdm cdc_acm br_netfilter arptable_filter arpt_mangle arp_tables adt7475 sch_cake em_cmp sch_teql em_nbyte sch_dsmark sch_pie act_ipt sch_gred cls_basic sch_prio em_text sch_codel sch_fq sch_sfq em_meta act_police sch_red act_connmark act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress sg tmp421 k10temp gpio_fan adt7410 adt7x10 i2c_tiny_usb i2c_piix4 i2c_gpio i2c_algo_pcf i2c_algo_pca i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux i2c_dev ledtrig_usbport trelay ledtrig_oneshot ledtrig_heartbeat ledtrig_gpio hwmon xt_set
[79570.079289] ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink sr_mod cdrom ip6t_NPT ip6t_MASQUERADE nf_nat_masquerade_ipv6 nf_nat nf_conntrack ip6t_rt ip6t_frag ip6t_hbh ip6t_eui64 ip6t_mh ip6t_ah ip6t_ipv6header ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables hwmon_vid msdos bonding ip6_gre ip_gre gre igb i2c_algo_bit e1000 sit sctp libcrc32c ip6_tunnel tunnel6 tunnel4 ip_tunnel veth tun slip slhc mpls_gso mpls_iptunnel mpls_router vfat fat cifs nls_utf8 nls_iso8859_1 nls_cp437 regmap_spi regmap_mmio
[79570.150751] regmap_i2c regmap_core vxlan udp_tunnel ip6_udp_tunnel sha256_ssse3 sha256_generic md5 md4 hmac ecb des_generic leds_pca963x i2c_core xhci_plat_hcd ledtrig_transient button_hotplug ptp pps_core mii libphy zram zsmalloc lzo_decompress lzo_compress lz4_decompress lz4_compress [last unloaded: ifb]
[79570.177398] CPU: 1 PID: 11 Comm: ksoftirqd/1 Tainted: G W 4.4.92 #0
[79570.184812] Hardware name: PC Engines APU, BIOS SageBios_PCEngines_APU-45 04/05/2014
[79570.192578] 0000000000000000 ffffffff812094a2 0000000000000000 ffffffffa0830a77
[79570.200096] ffffffff81071bc3 ffff88003cfa8948 00000121854b6980 ffff88003cfa8800
[79570.207609] ffff88003cfa8c90 0000000000000001 ffffffffa082f7ad ffff88003cfa8800
[79570.215125] Call Trace:
[79570.217600] [] ? dump_stack+0x5c/0x7a
[79570.222933] [] ? warn_slowpath_common+0x73/0xa0
[79570.229125] [] ? 0xffffffffa082f7ad
[79570.234285] [] ? __qdisc_run+0x60/0x190
[79570.239789] [] ? dev_queue_xmit+0x266/0x4c0
[79570.245808] [] ? 0xffffffffa0974606
[79570.250965] [] ? tcf_action_exec+0x40/0x70
[79570.256729] [] ? 0xffffffffa092b9af
[79570.261888] [] ? load_balance+0x150/0x7e0
[79570.267560] [] ? get_rps_cpu+0x12e/0x300
[79570.273149] [] ? tc_classify+0x5b/0x120
[79570.278654] [] ? netif_receive_skb_core+0x486/0x870
[79570.285372] [] ? process_backlog+0x96/0x130
[79570.291221] [] ? net_rx_action+0x19b/0x280
[79570.296985] [] ? __do_softirq+0xc6/0x1d0
[79570.302574] [] ? run_ksoftirqd+0x20/0x40
[79570.308165] [] ? smpboot_thread_fn+0xf4/0x150
[79570.314187] [] ? sort_range+0x20/0x20
[79570.319518] [] ? kthread+0xb8/0xd0
[79570.324587] [] ? kthread_worker_fn+0x170/0x170
[79570.330700] [] ? ret_from_fork+0x3f/0x70
[79570.336286] [] ? kthread_worker_fn+0x170/0x170
[79570.342406] ---[ end trace b5df2c2f1911307f ]---
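
For anyone trying to confirm whether they are hitting the same warning, here is a minimal Python sketch that scans saved kernel log text (for example, the output of `dmesg` copied to a file) for WARNING headers raised from `net/sched/sch_hfsc.c`. The function name `find_warnings` and the regular expression are illustrative assumptions based only on the header line format in the trace above; other kernel versions may print the header differently.

```python
import re

# Matches the header line of a kernel WARNING as it appears in the trace above, e.g.
# "[79569.784788] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1426 ..."
# The pattern is an assumption derived from that single example.
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+)"
)


def find_warnings(log_text, source_file="net/sched/sch_hfsc.c"):
    """Return one dict per WARNING header that was raised from source_file."""
    hits = []
    for match in WARNING_RE.finditer(log_text):
        if match.group("file") != source_file:
            continue  # a WARNING from some other part of the kernel
        hits.append({
            "uptime_s": float(match.group("ts")),   # seconds since boot
            "cpu": int(match.group("cpu")),
            "pid": int(match.group("pid")),
            "line": int(match.group("line")),       # line number in source_file
        })
    return hits
```

Calling `find_warnings(text)` on the log above should yield a single entry pointing at line 1426; an empty list means no HFSC warning was found in the supplied text.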

aparcar commented 7 years ago

moeller0:

Could you post the output of "cat /etc/config/sqm", "tc -d qdisc" and "tc -s qdisc" please? While sqm-scripts loads the hfsc module, it typically does not use it; so far it seemed that these kernel traces, while annoying, did not affect other qdiscs. It would be great to get to the bottom of this.
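
A minimal Python sketch for gathering those three outputs into one paste-ready report. It assumes Python 3.7+ is available on the machine where the commands run, which is often not the case on a stock router; running the three commands by hand works just as well. The names `COMMANDS`, `run` and `collect` are hypothetical helpers, not part of sqm-scripts.

```python
import subprocess

# The three commands requested above; nothing else is assumed about the system.
COMMANDS = [
    ["cat", "/etc/config/sqm"],
    ["tc", "-d", "qdisc"],
    ["tc", "-s", "qdisc"],
]


def run(argv):
    """Run one command and return stdout followed by stderr, or an error note."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary, permission problem, or a hung command.
        return f"(could not run: {exc})"
    return proc.stdout + proc.stderr


def collect(commands=COMMANDS, runner=run):
    """Build a plain-text report with one '$ command' section per command."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv) + "\n" + runner(argv).rstrip())
    return "\n\n".join(sections)
```

`print(collect())` prints the report. The `runner` parameter exists only so the formatting can be exercised without `tc` installed.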

aparcar commented 7 years ago

diizzyy:

From what Google tells me, HFSC is broken upstream (Linux kernel)...

aparcar commented 7 years ago

nbd:

Please keep reporting this bug upstream to netdev@vger.kernel.org

aparcar commented 6 years ago

guidosarducci:

For reference, the upstream issue was being tracked as: [[https://bugzilla.kernel.org/show_bug.cgi?id=109581|Kernel.org Bugzilla – Bug 109581]], and as of a few months ago was marked as resolved with the patch: [[https://www.spinics.net/lists/netdev/msg450655.html|[PATCH RFC] net_sched/codel: do not defer queue length update]].

Would be great to have this backported to LEDE/OpenWRT for testing.

aparcar commented 6 years ago

nbd:

Pushed a backport to my staging tree at https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=summary Please test.

aparcar commented 6 years ago

guidosarducci:

I actually couldn't wait to try, and managed to do an A/B comparison for stable LEDE 17.01.4 with/without the patch. I tested hfsc with fq_codel, codel, and cake, each under 1 hour of network load using [[https://github.com/richb-hanover/CeroWrtScripts/blob/master/netperfrunner.sh|netperfrunner.sh]] from [[https://www.bufferbloat.net/projects/bloat/wiki/Tests_for_Bufferbloat/|bufferbloat.net]].

====Results====

^ System & Kernel (ar71xx) ^ hfsc+fq_codel ^ hfsc+codel ^ hfsc+cake ^
| LEDE 17.01.4 (stock) | WARNING < 1 min | No warning | No warning |
| LEDE 17.01.4 (hfsc_fix) | No warning | No warning | No warning |

====Notes====

I won't be able to test this on the master branch, but 4.9 and 4.14 are closer to the patch's original target, so I expect similar results. I will continue to use my patched system as a "daily driver" and report if there are problems.

If we can get some more test runtime and architecture diversity, hopefully this can be back-ported to LEDE stable.

aparcar commented 6 years ago

nbd:

Fix merged to master and lede-17.01. Thanks for testing!

aparcar commented 6 years ago

guidosarducci:

Additional testing update: I've left the combination of hfsc + fq_codel running for the last week without a kernel WARNING. Happy days!