FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

bgp_read_packet error: Connection reset by peer #13654

Closed liuxyon closed 1 year ago

liuxyon commented 1 year ago

I am using Ubuntu 23.04 (Linux 6.3.5-x64v2-xanmod1 x86_64) and FRR 8.5.1. When the upstream enables the full route table and pushes it to me, the BGP session keeps disconnecting and reconnecting and cannot work normally. Can you tell me how to find the cause of the problem and fix it?

2023/05/31 22:36:01 BGP: [N9HHH-F8H1M] %ADJCHANGE: neighbor 2602:fe69:206::1(Unknown) in vrf default Up
2023/05/31 22:36:10 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 2602:fe69:206::1 4/0 (Hold Timer Expired) 0 bytes
2023/05/31 22:36:10 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 2602:fe69:206::1(Unknown) in vrf default Down BGP Notification send
2023/05/31 22:36:13 BGP: [HZN6M-XRM1G] %NOTIFICATION: received from neighbor 2602:fe69:206::1 6/5 (Cease/Connection Rejected) 0 bytes
2023/05/31 22:36:13 BGP: [P3GYW-PBKQG][EC 33554466] 2602:fe69:206::1 [FSM] unexpected packet received in state OpenSent
2023/05/31 22:36:13 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 2602:fe69:206::1 5/1 (Neighbor Events Error/Receive Unexpected Message in OpenSent State) 0 bytes
2023/05/31 22:36:18 BGP: [N9HHH-F8H1M] %ADJCHANGE: neighbor 2602:fe69:206::1(Unknown) in vrf default Up
2023/05/31 22:36:27 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 2602:fe69:206::1 4/0 (Hold Timer Expired) 0 bytes
2023/05/31 22:36:27 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 2602:fe69:206::1(Unknown) in vrf default Down BGP Notification send
2023/05/31 22:36:28 BGP: [H4B4J-DCW2R][EC 33554455] 2602:fe69:206::1 [Error] bgp_read_packet error: Connection reset by peer
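To see at a glance how often the session flaps and which NOTIFICATION codes are involved, a log like the one above can be summarized with a short script. This is only a sketch: the regular expressions are written against the line format shown in this report, the function name `summarize` is made up for illustration, and other FRR versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns follow the log lines quoted in this issue (assumed format, not an FRR API).
NOTIFICATION_RE = re.compile(
    r"%NOTIFICATION: (sent to|received from) neighbor (\S+) (\d+)/(\d+) \(([^)]*)\)"
)
ADJCHANGE_RE = re.compile(r"%ADJCHANGE: neighbor (\S+?)\(.*?\) in vrf \S+ (Up|Down)")


def summarize(log_text):
    """Count NOTIFICATION (direction, code/subcode, text) and Up/Down transitions."""
    notifications = Counter()
    transitions = Counter()
    for direction, _peer, code, subcode, text in NOTIFICATION_RE.findall(log_text):
        notifications[(direction, f"{code}/{subcode}", text)] += 1
    for _peer, state in ADJCHANGE_RE.findall(log_text):
        transitions[state] += 1
    return notifications, transitions


SAMPLE = """\
2023/05/31 22:36:01 BGP: [N9HHH-F8H1M] %ADJCHANGE: neighbor 2602:fe69:206::1(Unknown) in vrf default Up
2023/05/31 22:36:10 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 2602:fe69:206::1 4/0 (Hold Timer Expired) 0 bytes
2023/05/31 22:36:10 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 2602:fe69:206::1(Unknown) in vrf default Down BGP Notification send
2023/05/31 22:36:13 BGP: [HZN6M-XRM1G] %NOTIFICATION: received from neighbor 2602:fe69:206::1 6/5 (Cease/Connection Rejected) 0 bytes
"""

if __name__ == "__main__":
    notes, flaps = summarize(SAMPLE)
    for key, count in notes.items():
        print(count, *key)
    print(dict(flaps))
```

Because the patterns are applied to the whole text rather than line by line, the script also works on logs that were pasted as a single run-on line.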

ton31337 commented 1 year ago

Could you add more details? Is this happening with 8.5.1 only? Please show the configuration.

liuxyon commented 1 year ago

All my servers have been upgraded to FRR 8.5.1. One of the server nodes hit this problem: the session was interrupted when the upstream pushed full routes. Since the upstream insists that this is a problem with my FRR router, I'm reporting the situation here.

!
! Zebra configuration saved from vty
!   2023/05/31 23:34:49
!
frr version 8.5.1
frr defaults datacenter
!
hostname let03draw07
log file /etc/frr/frr.log
!
!
!
router bgp 39755
 no bgp fast-external-failover
 no bgp suppress-duplicates
 no bgp hard-administrative-reset
 no bgp default ipv4-unicast
 no bgp graceful-restart notification
 bgp bestpath as-path confed
 bgp bestpath med confed
 no bgp network import-check
 neighbor 2602:fe69:206::1 remote-as 36369
 neighbor 2602:fe69:206::1 description as36369
 neighbor 2602:fe69:206::1 ebgp-multihop 2
 neighbor 2602:fe69:206::1 disable-connected-check
 neighbor 2602:fe69:206::1 update-source 2602:fe69:218::1
 !
 address-family ipv6 unicast
  network 2602:fed2:7026::/48
  neighbor 2602:fe69:206::1 activate
  neighbor 2602:fe69:206::1 remove-private-AS all
  neighbor 2602:fe69:206::1 soft-reconfiguration inbound
  neighbor 2602:fe69:206::1 prefix-list ipv6in in
  neighbor 2602:fe69:206::1 prefix-list myv6out out
 exit-address-family
 !
exit
!
access-list 1 remark utility ACL to deny everything
access-list 1 seq 5 deny any
ipv6 prefix-list ipv6in seq 105 deny ::1/128
ipv6 prefix-list ipv6in seq 110 deny ::/128
ipv6 prefix-list ipv6in seq 115 deny ::/0
ipv6 prefix-list ipv6in seq 120 deny 3ffe::/16 le 128
ipv6 prefix-list ipv6in seq 130 deny 2001:db8::/32 le 128
ipv6 prefix-list ipv6in seq 140 deny 2001::/32
ipv6 prefix-list ipv6in seq 150 deny 2001::/32 le 128
ipv6 prefix-list ipv6in seq 160 permit 2002::/16
ipv6 prefix-list ipv6in seq 170 deny 2002::/16 le 128
ipv6 prefix-list ipv6in seq 180 deny ::/8 le 128
ipv6 prefix-list ipv6in seq 190 deny fe00::/9 le 128
ipv6 prefix-list ipv6in seq 200 deny ff00::/8 le 128
ipv6 prefix-list ipv6in seq 205 permit 2000::/3 le 48
ipv6 prefix-list ipv6in seq 900 deny ::/0 le 128
ipv6 prefix-list ipv6in seq 999 deny any
ipv6 prefix-list mycn6out1 seq 130 deny ::1/128
ipv6 prefix-list mycn6out1 seq 140 deny ::/128
ipv6 prefix-list mycn6out1 seq 145 deny 3ffe::/16 le 128
ipv6 prefix-list mycn6out1 seq 150 deny 2001:db8::/32 le 128
ipv6 prefix-list mycn6out1 seq 151 permit 2001::/32
ipv6 prefix-list mycn6out1 seq 152 deny 2001::/32 le 128
ipv6 prefix-list mycn6out1 seq 155 permit 2002::/16
ipv6 prefix-list mycn6out1 seq 160 deny 2002::/16 le 128
ipv6 prefix-list mycn6out1 seq 165 deny ::/8 le 128
ipv6 prefix-list mycn6out1 seq 170 deny fe00::/9 le 128
ipv6 prefix-list mycn6out1 seq 175 deny ff00::/8 le 128
ipv6 prefix-list mycn6out1 seq 180 permit 2000::/3 le 48
ipv6 prefix-list mycn6out1 seq 500 deny ::/0 le 128
ipv6 prefix-list mycn6out seq 5 deny ::1/128
ipv6 prefix-list mycn6out seq 10 deny ::/128
ipv6 prefix-list mycn6out seq 15 deny 3ffe::/16 le 128
ipv6 prefix-list mycn6out seq 20 deny 2001:db8::/32 le 128
ipv6 prefix-list mycn6out seq 25 deny 2001:10::/28 le 128
ipv6 prefix-list mycn6out seq 30 deny 2001:2::/48 le 128
ipv6 prefix-list mycn6out seq 35 deny 100::/64 le 128
ipv6 prefix-list mycn6out seq 40 deny ::/8 le 128
ipv6 prefix-list mycn6out seq 45 deny fc00::/7 le 128
ipv6 prefix-list mycn6out seq 50 deny ff00::/8 le 128
ipv6 prefix-list mycn6out seq 55 deny 2002::/16 le 128
ipv6 prefix-list mycn6out seq 60 deny ::/0 ge 49 le 128
ipv6 prefix-list mycn6out seq 100 deny 2001::/32
ipv6 prefix-list mycn6out seq 110 permit 2000::/3 le 48
ipv6 prefix-list mycn6out seq 160 permit 2002::/16
ipv6 prefix-list mycn6out seq 999 deny any
ipv6 prefix-list myv6out seq 66 permit 2602:fed2:7025::/48
ipv6 prefix-list myv6out seq 999 deny any
ipv6 prefix-list 49-only seq 5 permit ::/0 ge 49
ipv6 prefix-list cymru-out-v6 seq 5 deny ::/0 le 128
ipv6 prefix-list v6cymru-out seq 5 deny any
ip prefix-list ipv4no seq 999 deny any
ip prefix-list no seq 999 deny any
ipv6 prefix-list no seq 999 deny any
!
bgp as-path access-list 2 seq 5 deny ^([0-9]+)(\1)+$
bgp as-path access-list 2 seq 10 permit .*
bgp as-path access-list 99 seq 5 permit (4294967[0-1][0-9][0-9])|(42949672[0-8][0-9])|(429496729[0-4])
bgp as-path access-list 99 seq 10 permit (42949[0-5][0-9][0-9][0-9][0-9])|(429496[0-6][0-9][0-9][0-9])
bgp as-path access-list 99 seq 15 permit (429[0-3][0-9][0-9][0-9][0-9][0-9][0-9])|(4294[0-8][0-9][0-9][0-9][0-9][0-9])
bgp as-path access-list 99 seq 20 permit (6449[6-9])|(6450[0-9])|(6451[0-1])|(6553[6-9])|(6554[0-9])|(6555[0-1])_
bgp as-path access-list 99 seq 25 permit 0
bgp as-path access-list 99 seq 30 permit 1310[0-6][0-9]|13107[0-1]
bgp as-path access-list 99 seq 35 permit 23456
bgp as-path access-list 99 seq 40 permit 42[0-8][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
bgp as-path access-list 99 seq 45 permit 6(4(5(1[2-9]|[2-9][0-9])|[6-9][0-9][0-9])|5([0-4][0-9][0-9]|5([0-2][0-9]|3[0-5])))
bgp as-path access-list 99 seq 50 permit 6555[2-9]|655[6-9][0-9]|65[6-9][0-9][0-9]|6[6-9][0-9][0-9][0-9]
bgp as-path access-list 99 seq 55 permit [7-9][0-9][0-9][0-9][0-9]|1[0-2][0-9][0-9][0-9][0-9]|130[0-9][0-9][0-9]
!
bgp community-list 100 seq 5 permit 65332:888
bgp community-list standard RTBH seq 5 permit 39755:0
!
!
route-map 01 permit 50
 set local-preference 200
 set metric 0
exit
!
route-map A01 deny 20
 match rpki invalid
exit
!
route-map A01 deny 25
 match as-path 99
exit
!
route-map A01 permit 28
 match as-path 2
exit
!
route-map A01 permit 30
 match rpki notfound
 set local-preference 100
 set metric 100
exit
!
route-map A01 permit 50
 match rpki valid
 set local-preference 200
 set metric 0
exit
!
route-map 80 permit 50
 match ipv6 address prefix-list myv6out
 set local-preference 100
 set metric 0
 set community 174:970 additive
exit
!
route-map A02 deny 10
 match as-path 99
exit
!
route-map A02 deny 20
 match rpki invalid
exit
!
route-map A02 permit 30
 match rpki notfound
 set local-preference 100
 set metric 300
 set as-path prepend last-as 1
exit
!
route-map A02 permit 50
 match rpki valid
 set local-preference 100
 set metric 50
exit
!
!
!
!
rpki
 rpki polling_period 300
 rpki retry_interval 60
 rpki cache 134.195.120.55 3323 preference 1
 rpki cache 2602:feda:ca3::face 3323 preference 2
 rpki cache 210.173.170.254 323 preference 3
 rpki cache 2001:3a0:e002:1001::101 323 preference 4
exit
!

!
! Zebra configuration saved from vty
!   2023/05/31 23:34:49
!
frr version 8.5.1
frr defaults traditional
!
hostname let03draw07
log file /etc/frr/frr.log
log syslog informational
!
!
!
ipv6 route 2602:fe69::/32 2602:fe69:200::1
ipv6 route 2602:fe69:206::/48 2602:fe69:200::1
ipv6 route 2602:fe69:218::/48 2602:fe69:200::1
ipv6 route 100::/64 Null0
!
!

liuxyon commented 1 year ago

FRR no longer disconnects once the upstream stops pushing the full route table.

ton31337 commented 1 year ago

Hold Timer Expired means something might be broken in the underlay. Can you post the output of show bgp neighbor 2602:fe69:206::1 when this happens again?

ahmdzaki18 commented 1 year ago

Can you show the output of sysctl net.ipv6.route.max_size?

liuxyon commented 1 year ago

sysctl net.ipv6.route.max_size

sysctl net.ipv6.route.max_size
net.ipv6.route.max_size = 524288

liuxyon commented 1 year ago

show bgp neighbor 2602:fe69:206::1
BGP neighbor is 2602:fe69:206::1, remote AS 36369, local AS 39755, external link
  Local Role: undefined
  Remote Role: undefined
  Description: as36369
  BGP version 4, remote router ID 104.224.52.254, local router ID 104.224.52.125
  BGP state = Established, up for 00:00:07
  Last read 00:00:27, Last write 00:00:01
  Hold time is 9 seconds, keepalive interval is 3 seconds
  Configured hold time is 9 seconds, keepalive interval is 3 seconds
  Configured conditional advertisements interval is 60 seconds
  Neighbor capabilities:
    4 Byte AS: advertised and received
    Extended Message: advertised
    AddPath:
      IPv6 Unicast: RX advertised
    Long-lived Graceful Restart: advertised
    Route refresh: advertised and received(old & new)
    Enhanced Route Refresh: advertised
    Address Family IPv6 Unicast: advertised and received
    Hostname Capability: advertised (name: let03draw07,domain name: n/a) not received
    Graceful Restart Capability: advertised
  Graceful restart information:
    Local GR Mode: Helper*
    Remote GR Mode: Disable
    R bit: False
    N bit: False
    Timers:
      Configured Restart Time(sec): 120
      Received Restart Time(sec): 0
  Message statistics:
    Inq depth is 0
    Outq depth is 0
                         Sent       Rcvd
    Opens:                 29         11
    Notifications:         17         14
    Updates:               22          0
    Keepalives:            33         11
    Route Refresh:          0          0
    Capability:             0          0
    Total:                101         36
  Minimum time between advertisement runs is 0 seconds
  Update source is 2602:fe69:218::1

  For address family: IPv6 Unicast
    Update group 14, subgroup 16
    Packet Queue length 0
    Inbound soft reconfiguration allowed
    Private AS numbers (all) removed in updates to this neighbor
    Community attribute sent to this neighbor(all)
    Inbound path policy configured
    Outbound path policy configured
    Incoming update prefix filter list is ipv6in
    Outgoing update prefix filter list is myv6out
    0 accepted prefixes

  Connections established 11; dropped 10
  Last reset 00:00:20, No AFI/SAFI activated for peer
  External BGP neighbor may be up to 2 hops away.
  Local host: 2602:fe69:218::1, Local port: 179
  Foreign host: 2602:fe69:206::1, Foreign port: 44130
  Nexthop: 104.224.52.125
  Nexthop global: 2602:fe69:218::1
  Nexthop local: ::
  BGP connection: non shared network
  BGP Connect Retry Timer in Seconds: 10
  Estimated round trip time: 1 ms
  Read thread: on Write thread: on FD used: 31

ton31337 commented 1 year ago

I assume this is a multihop neighbor and a 9-second hold time is not enough for some reason (packet loss, etc.)?

liuxyon commented 1 year ago

I assume this is a multihop neighbor and 9 hold time is not enough for some reasons (packet loss, etc.)?

ping 2602:fe69:206::1
PING 2602:fe69:206::1(2602:fe69:206::1) 56 data bytes
64 bytes from 2602:fe69:206::1: icmp_seq=1 ttl=64 time=12.8 ms
64 bytes from 2602:fe69:206::1: icmp_seq=2 ttl=64 time=3.68 ms
64 bytes from 2602:fe69:206::1: icmp_seq=3 ttl=64 time=2.03 ms
64 bytes from 2602:fe69:206::1: icmp_seq=4 ttl=64 time=3.43 ms
64 bytes from 2602:fe69:206::1: icmp_seq=5 ttl=64 time=3.10 ms
64 bytes from 2602:fe69:206::1: icmp_seq=6 ttl=64 time=1.95 ms
64 bytes from 2602:fe69:206::1: icmp_seq=7 ttl=64 time=1.32 ms
64 bytes from 2602:fe69:206::1: icmp_seq=8 ttl=64 time=6.27 ms
64 bytes from 2602:fe69:206::1: icmp_seq=9 ttl=64 time=0.682 ms
64 bytes from 2602:fe69:206::1: icmp_seq=10 ttl=64 time=0.920 ms
64 bytes from 2602:fe69:206::1: icmp_seq=11 ttl=64 time=0.693 ms
64 bytes from 2602:fe69:206::1: icmp_seq=12 ttl=64 time=2.69 ms
64 bytes from 2602:fe69:206::1: icmp_seq=13 ttl=64 time=0.620 ms
64 bytes from 2602:fe69:206::1: icmp_seq=14 ttl=64 time=0.597 ms
^C
--- 2602:fe69:206::1 ping statistics ---
14 packets transmitted, 14 received, 0% packet loss, time 13050ms
rtt min/avg/max/mdev = 0.597/2.914/12.821/3.152 ms
let03draw07# 2023/06/01 19:45:27 [PHJDC-499N2][EC 100663314] STARVATION: task vtysh_rl_read (564f8f6c3e30) ran for 13686ms (cpu time 1ms)

liuxyon commented 1 year ago

I assume this is a multihop neighbor and 9 hold time is not enough for some reasons (packet loss, etc.)?

Following your guess, I added the commands below, and I can now receive the routes, at least for the time being.

timers bgp 100 300
neighbor 2602:fe69:206::1 timers connect 120
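For context on what these timers do: per RFC 4271, the hold time actually used on a session is the smaller of the values the two speakers put in their OPEN messages, and keepalives are normally sent at about one third of that. The sketch below models just that rule. The function name `effective_timers` is made up, the remote value of 180 seconds in the example is an assumption rather than something stated in this thread, and FRR's exact keepalive selection may differ in detail.

```python
def effective_timers(local_hold, remote_hold, local_keepalive=None):
    """Return (hold_time, keepalive) in seconds following the RFC 4271 rule of thumb.

    The hold time is the minimum of both sides' proposals. The keepalive defaults
    to one third of it, and a smaller locally configured keepalive is kept.
    """
    hold = min(local_hold, remote_hold)
    if hold == 0:
        # A hold time of zero disables keepalives entirely.
        return 0, 0
    keepalive = hold // 3
    if local_keepalive is not None and local_keepalive < keepalive:
        keepalive = local_keepalive
    return hold, keepalive


if __name__ == "__main__":
    # Local "timers bgp 3 9" against an assumed remote hold time of 180 s.
    print(effective_timers(9, 180, 3))
    # Local "timers bgp 100 300" against the same assumed remote value.
    print(effective_timers(300, 180, 100))
```

With a 9-second hold time, a peer that is busy for more than a few seconds while sending or processing a full table can miss the deadline, which would fit the Hold Timer Expired notifications reported earlier.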

liuxyon commented 1 year ago

Another server of mine is also experiencing constant outages.

2023/06/02 19:30:35 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 2a0e:8f02:2009:100::1 3/8 (UPDATE Message Error/Invalid NEXT_HOP Attribute) 7 bytes 40 03
2023/06/02 19:30:35 BGP: [GW152-RVASS][EC 33554455] bgp_process_packet: BGP UPDATE receipt failed for peer: 2a0e:8f02:2009:100::1
2023/06/02 19:30:35 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 2a0e:8f02:2009:100::1(dus1-de) in vrf default Down BGP Notification send
2023/06/02 19:31:02 BGP: [N9HHH-F8H1M] %ADJCHANGE: neighbor 2a0e:8f02:2009:100::1(dus1-de) in vrf default Up
2023/06/02 19:31:44 BGP: [GTTPK-RX2GP][EC 33554436] Malformed AS path from 2a0e:8f02:2009:100::1, length is 28
2023/06/02 19:31:44 BGP: [RWQFK-BA2JR][EC 33554488] 2a0e:8f02:2009:100::1: Attribute AS_PATH, parse error - treating as withdrawal
2023/06/02 19:31:44 BGP: [QWG8G-NT6EJ][EC 33554455] 2a0e:8f02:2009:100::1(dus1-de) rcvd UPDATE with errors in attr(s)!! Withdrawing route.
2023/06/02 19:31:44 BGP: [SEH94-D8675][EC 33554438] Martian nexthop 0.0.0.0
2023/06/02 19:31:44 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 2a0e:8f02:2009:100::1 3/8 (UPDATE Message Error/Invalid NEXT_HOP Attribute) 7 bytes 40 03
2023/06/02 19:31:44 BGP: [GW152-RVASS][EC 33554455] bgp_process_packet: BGP UPDATE receipt failed for peer: 2a0e:8f02:2009:100::1
2023/06/02 19:31:44 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 2a0e:8f02:2009:100::1(dus1-de) in vrf default Down BGP Notification send
2023/06/02 19:32:10 BGP: [N9HHH-F8H1M] %ADJCHANGE: neighbor 2a0e:8f02:2009:100::1(dus1-de) in vrf default Up

show bgp neighbor 2a0e:8f02:2009:100::1
BGP neighbor is 2a0e:8f02:2009:100::1, remote AS 213045, local AS 39755, external link
  Local Role: undefined
  Remote Role: undefined
  Description: "AS213045"
  Hostname: dus1-de
  BGP version 4, remote router ID 10.0.0.8, local router ID 91.228.55.165
  BGP state = Established, up for 00:00:27
  Last read 00:00:00, Last write 00:00:25
  Hold time is 180 seconds, keepalive interval is 60 seconds
  Configured hold time is 300 seconds, keepalive interval is 100 seconds
  Configured conditional advertisements interval is 60 seconds
  Neighbor capabilities:
    4 Byte AS: advertised and received
    Extended Message: advertised and received
    AddPath:
      IPv6 Unicast: RX advertised and received
    Dynamic: advertised
    Long-lived Graceful Restart: advertised and received
      Address families by peer:
    Route refresh: advertised and received(old & new)
    Enhanced Route Refresh: advertised and received
    Address Family IPv6 Unicast: advertised and received
    Hostname Capability: advertised (name: vps3630.first-root.com,domain name: n/a) received (name: dus1-de,domain name: n/a)
    Graceful Restart Capability: advertised and received
      Remote Restart timer is 120 seconds
      Address families by peer:
        none
  Graceful restart information:
    End-of-RIB send: IPv6 Unicast
    End-of-RIB received:
    Local GR Mode: Helper*
    Remote GR Mode: Helper
    R bit: False
    N bit: True
    Timers:
      Configured Restart Time(sec): 120
      Received Restart Time(sec): 120
    IPv6 Unicast:
      F bit: False
      End-of-RIB sent: Yes
      End-of-RIB sent after update: Yes
      End-of-RIB received: No
      Timers:
        Configured Stale Path Time(sec): 360
  Message statistics:
    Inq depth is 10000
    Outq depth is 0
                         Sent       Rcvd
    Opens:                 56         56
    Notifications:         55          0
    Updates:              227    2473120
    Keepalives:            58         56
    Route Refresh:          1          0
    Capability:             0          0
    Total:                397    2473232
  Minimum time between advertisement runs is 0 seconds
  Update source is AS213045

  For address family: IPv6 Unicast
    Update group 64, subgroup 63
    Packet Queue length 0
    Private AS numbers (all) removed in updates to this neighbor
    NEXT_HOP is always this router
    Community attribute sent to this neighbor(all)
    Inbound path policy configured
    Outbound path policy configured
    Incoming update prefix filter list is ipv6in
    Outgoing update prefix filter list is myv6out
    Route map for incoming advertisements is A01
    Route map for outgoing advertisements is 213045
    105513 accepted prefixes

  Connections established 56; dropped 55
  Last reset 00:00:51, Notification sent (UPDATE Message Error/Invalid NEXT_HOP Attribute)
  Message received that caused BGP to send a NOTIFICATION:
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 00A50200 00008E90 0E002A00 0201202A
    0E8F0220 09010000 00000000 000001FE 80000000 00000000 00000005 69061500
    1D2A0ABC C0400101 00500200 1C020600 03403500 00886F00 00E3BB00 000A2B00
    00067500 00A02901 00C00708 0000A029 C10B5807 C0080888 6F006E88 6F0070C0
    20180003 40350000 00000000 00650003 40350000 00050000 0065C010 0800020A
    2B19E2E4 62
  External BGP neighbor may be up to 1 hops away.
  Local host: 2a0e:8f02:2009:100::2, Local port: 40613
  Foreign host: 2a0e:8f02:2009:100::1, Foreign port: 179
  Nexthop: 91.228.55.165
  Nexthop global: 2a0e:8f02:2009:100::2
  Nexthop local: fe80::5be4:35a5
  BGP connection: shared network
  BGP Connect Retry Timer in Seconds: 120
  Estimated round trip time: 0 ms
  Read thread: on Write thread: on FD used: 30
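
The hex dump in the "Message received that caused BGP to send a NOTIFICATION" field can be decoded by hand to see what the peer actually sent. The sketch below walks the UPDATE header and path attributes as laid out in RFC 4271 and splits the AS_PATH into segments. It assumes 4-byte AS numbers (both sides advertised the 4 Byte AS capability above); the helper names `parse_update` and `parse_as_path` are made up for this example, and nothing here is FRR code.

```python
import struct

# Hex dump copied from the neighbor output above.
DUMP = """
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF 00A50200 00008E90 0E002A00 0201202A
0E8F0220 09010000 00000000 000001FE 80000000 00000000 00000005 69061500
1D2A0ABC C0400101 00500200 1C020600 03403500 00886F00 00E3BB00 000A2B00
00067500 00A02901 00C00708 0000A029 C10B5807 C0080888 6F006E88 6F0070C0
20180003 40350000 00000000 00650003 40350000 00050000 0065C010 0800020A
2B19E2E4 62
"""


def parse_update(hex_dump):
    """Return (length, msg_type, [(attr_code, flags, value_bytes), ...])."""
    data = bytes.fromhex("".join(hex_dump.split()))
    # The 16-byte marker is followed by a 2-byte length and a 1-byte type.
    length, msg_type = struct.unpack("!HB", data[16:19])
    pos = 19
    (withdrawn_len,) = struct.unpack("!H", data[pos:pos + 2])
    pos += 2 + withdrawn_len
    (attrs_len,) = struct.unpack("!H", data[pos:pos + 2])
    pos += 2
    end = pos + attrs_len
    attrs = []
    while pos < end:
        flags, code = data[pos], data[pos + 1]
        if flags & 0x10:  # Extended Length bit set: 2-byte length field
            (alen,) = struct.unpack("!H", data[pos + 2:pos + 4])
            pos += 4
        else:
            alen = data[pos + 2]
            pos += 3
        attrs.append((code, flags, data[pos:pos + alen]))
        pos += alen
    return length, msg_type, attrs


def parse_as_path(value, asn_size=4):
    """Split an AS_PATH value into [(segment_type, [asn, ...]), ...]."""
    segments = []
    pos = 0
    while pos < len(value):
        seg_type, count = value[pos], value[pos + 1]
        pos += 2
        asns = [
            int.from_bytes(value[pos + i * asn_size:pos + (i + 1) * asn_size], "big")
            for i in range(count)
        ]
        pos += count * asn_size
        segments.append((seg_type, asns))
    return segments


if __name__ == "__main__":
    length, msg_type, attrs = parse_update(DUMP)
    print("length", length, "type", msg_type)
    for code, flags, value in attrs:
        print("attr", code, "flags", hex(flags), "len", len(value))
        if code == 2:  # AS_PATH
            print("  segments", parse_as_path(value))
```

Run against the dump, this shows a 165-byte UPDATE (type 2) carrying attributes 14 (MP_REACH_NLRI), 1, 2, 7, 8, 32 and 16. The 28-byte AS_PATH holds an AS_SEQUENCE of six ASNs starting with 213045, followed by a segment of type 1 (AS_SET) with zero members. That empty segment may be what the "Malformed AS path ... length is 28" log line refers to, although this reading has not been confirmed in the thread.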