allenporter / k8s-gitops

Flux/Gitops managed k8s cluster

Proxmox guests dropping packets #311

Closed: allenporter closed this issue 3 years ago

allenporter commented 3 years ago

The default Ceph alerts identified that many of the Proxmox hosts are dropping packets:

# ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether ae:a8:a2:97:c4:d0 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    39849569479 142162505 0       890437  0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    5354437176 14931699 0       0       0       0      
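`ip -s link` only reports cumulative totals. Here is a rough Python sketch for turning the RX row into a drop ratio. It assumes the header-line-then-values layout shown above; newer iproute2 releases print slightly different columns, such as `missed`. `parse_ip_link_stats` and `rx_drop_ratio` are hypothetical helper names, not part of any tool.

```python
from __future__ import annotations


def parse_ip_link_stats(text: str) -> dict[str, dict[str, int]]:
    """Parse the RX/TX counter rows printed by `ip -s link show <dev>`.

    Assumes each direction is printed as a header line ("RX: bytes packets ...")
    followed by a line of integer values in the same column order.
    """
    lines = [line.strip() for line in text.splitlines()]
    stats: dict[str, dict[str, int]] = {}
    for i, line in enumerate(lines):
        for direction in ("RX", "TX"):
            prefix = direction + ":"
            if line.startswith(prefix) and i + 1 < len(lines):
                names = line[len(prefix):].split()
                values = [int(v) for v in lines[i + 1].split()]
                # Pair each column name with the value printed beneath it.
                stats[direction] = dict(zip(names, values))
    return stats


def rx_drop_ratio(stats: dict[str, dict[str, int]]) -> float:
    """Fraction of received packets that the kernel counted as dropped."""
    rx = stats["RX"]
    return rx["dropped"] / rx["packets"]
```

For the eth0 counters above, `rx_drop_ratio` comes out at a little over 0.6% of received packets.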
allenporter commented 3 years ago

Following the advice in https://jvns.ca/blog/2017/09/05/finding-out-where-packets-are-being-dropped/, I built dropwatch from https://github.com/nhorman/dropwatch to see where in the kernel the packets are being dropped:

$ sudo ./dropwatch -l kas
Initializing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
46 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
47 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
1 drops at __udp4_lib_rcv+aef (0xffffffff8c59c64f) [software]
1 drops at __netif_receive_skb_core+14f (0xffffffff8c4e6f3f) [software]
11 drops at netlink_broadcast_filtered+257 (0xffffffff8c551937) [software]
57 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
57 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
2 drops at __udp4_lib_rcv+aef (0xffffffff8c59c64f) [software]
2 drops at skb_release_data+b4 (0xffffffff8c4cea44) [software]
63 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
62 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
1 drops at __netif_receive_skb_core+14f (0xffffffff8c4e6f3f) [software]
1 drops at __udp4_lib_rcv+aef (0xffffffff8c59c64f) [software]
53 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
51 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
2 drops at sk_stream_kill_queues+55 (0xffffffff8c4d5635) [software]
1 drops at sk_stream_kill_queues+55 (0xffffffff8c4d5635) [software]
55 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
56 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
1 drops at __netif_receive_skb_core+14f (0xffffffff8c4e6f3f) [software]
53 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
49 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
1 drops at __udp4_lib_rcv+aef (0xffffffff8c59c64f) [software]
51 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
53 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
1 drops at __netif_receive_skb_core+14f (0xffffffff8c4e6f3f) [software]
55 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
53 drops at ip6_mc_input+1ed (0xffffffff8c5efa5d) [software]
1 drops at __udp4_lib_rcv+aef (0xffffffff8c59c64f) [software]
1 drops at __netif_receive_skb_core+14f (0xffffffff8c4e6f3f) [software]
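A log this long is easier to read with the drops totalled per kernel function. Here is a minimal Python sketch. It assumes dropwatch's `N drops at symbol+offset (address) [software]` line format as printed above, and `summarize_dropwatch` is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches dropwatch lines such as:
#   46 drops at ip_rcv_finish_core.isra.0+1b2 (0xffffffff8c5601e2) [software]
DROP_LINE = re.compile(r"^(\d+) drops at ([\w.]+)\+[0-9a-f]+ \(0x[0-9a-f]+\)")


def summarize_dropwatch(text: str) -> Counter:
    """Sum drop counts per kernel function, ignoring the offset within it."""
    totals: Counter = Counter()
    for line in text.splitlines():
        match = DROP_LINE.match(line.strip())
        if match:  # prompts and status lines ("dropwatch> start") are skipped
            totals[match.group(2)] += int(match.group(1))
    return totals


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < dropwatch.log
    for symbol, count in summarize_dropwatch(sys.stdin.read()).most_common():
        print(f"{count:6d} {symbol}")
```

Run over the log above, this puts `ip_rcv_finish_core.isra.0` and `ip6_mc_input` far ahead of the other call sites.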

Looking at ip_rcv_finish_core (https://github.com/torvalds/linux/blob/master/net/ipv4/ip_input.c#L315), there are 10 places in that function where drops can happen.

allenporter commented 3 years ago

The drops appear to happen once per second.

$ watch --difference --interval 0.5 "ifconfig eth0 | grep drop"
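The once-per-second pattern can also be measured without eyeballing `watch` output. One way is to poll the `rx_dropped` counter that Linux exposes under `/sys/class/net/<iface>/statistics/`. Here is a rough Python sketch. The helper names are hypothetical, and the timing resolution is limited to the polling period.

```python
from __future__ import annotations

import time
from pathlib import Path


def read_rx_dropped(iface: str) -> int:
    """Read the kernel's cumulative rx_dropped counter for an interface."""
    path = Path("/sys/class/net") / iface / "statistics" / "rx_dropped"
    return int(path.read_text())


def sample_counter(iface: str, duration: float, period: float) -> list[tuple[float, int]]:
    """Poll rx_dropped every `period` seconds for roughly `duration` seconds.

    Timestamps are wall-clock (time.time) so they can be compared with
    packet-capture timestamps later.
    """
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), read_rx_dropped(iface)))
        time.sleep(period)
    return samples


def drop_times(samples: list[tuple[float, int]]) -> list[float]:
    """Timestamps of the samples at which the counter had increased."""
    return [t for (_, prev), (t, cur) in zip(samples, samples[1:]) if cur > prev]


def intervals(times: list[float]) -> list[float]:
    """Gaps between consecutive drop events, in seconds."""
    return [b - a for a, b in zip(times, times[1:])]
```

For example, `intervals(drop_times(sample_counter("eth0", duration=10, period=0.1)))` should show gaps clustered around one second if the pattern above holds.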

When tcpdump is running, the drops stop! It appears, though, that a Spanning Tree Protocol (STP) packet arrives once per second, roughly coinciding with each drop:

14:17:22.493978 STP 802.1s, Rapid STP, CIST Flags [Proposal, Learn, Forward], length 102
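The drops stop while tcpdump is running, so the two signals are hard to capture at the same moment. A rough check is to compare the period of the STP frames with the period of the drops. Here is a Python sketch. It assumes tcpdump's default time-of-day timestamps as in the line above and a capture that does not cross midnight. All function names are hypothetical.

```python
from __future__ import annotations

import re
from statistics import median

# tcpdump's default line prefix is the time of day, e.g.
#   14:17:22.493978 STP 802.1s, Rapid STP, CIST Flags [...], length 102
STP_LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)\s+STP\b")


def stp_timestamps(tcpdump_text: str) -> list[float]:
    """Seconds since midnight for every STP frame in tcpdump's text output.

    Assumes the capture does not cross midnight.
    """
    times = []
    for line in tcpdump_text.splitlines():
        match = STP_LINE.match(line.strip())
        if match:
            hours, minutes = int(match.group(1)), int(match.group(2))
            seconds = float(match.group(3))
            times.append(hours * 3600 + minutes * 60 + seconds)
    return times


def typical_period(times: list[float]) -> float:
    """Median gap between consecutive events (robust to one missed frame)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


def periods_match(stp_period: float, drop_period: float, tolerance: float = 0.1) -> bool:
    """True if the two periods agree within `tolerance`, as a fraction."""
    return abs(stp_period - drop_period) <= tolerance * max(stp_period, drop_period)
```

A match only shows that the two share a period. It does not by itself prove that the STP frames are what is being dropped.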

The symptoms sound similar to those in this Proxmox forum thread: https://forum.proxmox.com/threads/vm-multicast-vrrp-packets-drop.57407/