FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BGP Open Receipt Failed #16377

Closed: ahmdzaki18 closed this issue 1 month ago

ahmdzaki18 commented 1 month ago

Description

Hi, I'm using an MPLS L2VPN in BD mode between a Huawei CloudEngine 6800-EI and a Huawei ATN 910D-A. Everything works without the L2VPN, but I have an issue when I try to establish a BGP session between two nodes over the VSI link. Full-size ICMP pings (MTU 1500) all get replies, and transferring files via rsync also works. However, the BGP session on one node is stuck in Connect and on the other node it is stuck in OpenSent.

From the packet capture and logs, if I'm not mistaken, node-2 never receives any BGP message from node-1. node-1: 10.123.123.1, node-2: 10.123.123.254
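As background for the two stuck states, here is a simplified sketch of the happy-path transitions of the BGP finite state machine from RFC 4271, section 8. It assumes DelayOpen is disabled and deliberately leaves out timers, connection collision handling, and every failure transition, so it is illustrative only and not how FRR's bgpd implements the FSM. It shows that a speaker in OpenSent has completed TCP and sent its OPEN but has not yet received the peer's OPEN, while a speaker still in Connect is waiting for its own TCP connection attempt to complete. The event names follow RFC 4271; the `run` helper is hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()


# Happy-path subset of RFC 4271 section 8, assuming DelayOpen is disabled.
# Timers, collision detection and all error transitions are deliberately omitted.
TRANSITIONS = {
    (State.IDLE, "ManualStart"): State.CONNECT,  # initiate outgoing TCP connection
    (State.IDLE, "ManualStart_with_PassiveTcpEstablishment"): State.ACTIVE,  # wait for peer
    (State.CONNECT, "TcpConnectionConfirmed"): State.OPEN_SENT,  # TCP up, OPEN sent
    (State.ACTIVE, "TcpConnectionConfirmed"): State.OPEN_SENT,  # TCP up, OPEN sent
    (State.OPEN_SENT, "BGPOpen"): State.OPEN_CONFIRM,  # peer OPEN accepted, KEEPALIVE sent
    (State.OPEN_CONFIRM, "KeepAliveMsg"): State.ESTABLISHED,
}


def run(events, state=State.IDLE):
    """Apply events in order; events with no entry leave the state unchanged (simplification)."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


if __name__ == "__main__":
    # A speaker whose TCP connection came up but whose peer's OPEN never arrives.
    print(run(["ManualStart", "TcpConnectionConfirmed"]))
```

Feeding in a sequence where TCP comes up but `BGPOpen` never arrives leaves the sketch in `OPEN_SENT`, which matches the symptom of OPEN messages not reaching the far end.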

Another odd thing: the session establishes successfully to MikroTik and Huawei peers. The issue only happens when FRR peers with FRR, Cisco, or Juniper.

I know the L2VPN configuration is causing this issue, but I need to understand what is happening on the FRR side to troubleshoot further.

Version

9.0.1 and 10.0.1

How to reproduce

Establish a BGP session from FRR to an FRR, Cisco, or Juniper peer over an L2VPN (BD mode) between a CloudEngine 6860 and an ATN 910D-A.

Expected behavior

The peer reaches the Established state.

Actual behavior

Stuck in OpenSent and/or Connect.

Attachments: node-2-log.txt, node-1-log.txt, pcap.zip

Additional context

No response


ahmdzaki18 commented 1 month ago

VSI configuration, in case it helps:

==CloudEngine==

vsi 950 bd-mode
 pwsignal ldp
  vsi-id 950
  peer 172.16.121.0

bridge-domain 950
 l2 binding vsi 950
 statistics enable

interface 25GE1/0/22.950 mode l2
 encapsulation dot1q vid 950
 rewrite no-action
 bridge-domain 950
 statistics enable

==ATN910D-A==

vsi 950 bd-mode
 pwsignal ldp
  vsi-id 950
  peer 172.16.124.0

bridge-domain 950
 statistic enable
 l2 binding vsi 950

interface Eth-Trunk100.950 mode l2
 encapsulation dot1q vid 950
 bridge-domain 950

ton31337 commented 1 month ago

Could you provide the configuration of the FRR instance? Do you have neighbor X extended-optional-parameters enabled?

10.123.123.1 is sending a BGP OPEN message that indicates support for extended optional parameters, and FRR handles that OPEN message as one using the extended optional parameters format. Please provide the configurations of all sides.
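To make the comment above concrete: under RFC 9072, a speaker using the extended format sets Non-Ext OP Len to 255, follows it with a Non-Ext OP Type octet of 255 and a 2-octet Extended Opt. Parm. Length, and then encodes each optional parameter with a 2-octet length instead of 1. The sketch below is a hypothetical parser (not FRR code) showing how a receiver can tell the two formats apart from the OPEN body, that is, the bytes after the 19-octet BGP header. Because the decision hinges on a single octet, an OPEN whose bytes were damaged in transit could plausibly be misread as extended.

```python
import struct


def parse_open_optional_params(body: bytes) -> tuple[bool, list[tuple[int, bytes]]]:
    """Parse the optional parameters of a BGP OPEN body (header already stripped).

    Returns (extended, [(param_type, value), ...]). Raises ValueError on truncation.
    """
    # Fixed part: Version(1) My AS(2) Hold Time(2) BGP Identifier(4) Opt Parm Len(1).
    if len(body) < 10:
        raise ValueError("OPEN body shorter than its 10 fixed octets")
    opt_len = body[9]  # Non-Ext OP Len
    rest = body[10:]
    # RFC 9072: non-zero length followed by a Non-Ext OP Type of 255 means extended format.
    extended = opt_len != 0 and len(rest) >= 1 and rest[0] == 255
    if extended:
        if len(rest) < 3:
            raise ValueError("extended length field truncated")
        (total,) = struct.unpack("!H", rest[1:3])  # Extended Opt. Parm. Length
        data = rest[3:3 + total]
        hdr_len = 3  # parameter type (1) + parameter length (2)
    else:
        total = opt_len
        data = rest[:total]
        hdr_len = 2  # parameter type (1) + parameter length (1)
    if len(data) != total:
        raise ValueError("optional parameters truncated")

    params = []
    i = 0
    while i < total:
        if i + hdr_len > total:
            raise ValueError("parameter header truncated")
        ptype = data[i]
        if hdr_len == 2:
            plen = data[i + 1]
        else:
            (plen,) = struct.unpack("!H", data[i + 1:i + 3])
        start = i + hdr_len
        value = data[start:start + plen]
        if len(value) != plen:
            raise ValueError("parameter value truncated")
        params.append((ptype, value))
        i = start + plen
    return extended, params
```

In the standard encoding the first optional parameter is normally type 2 (Capabilities), so a first octet of 255 after a non-zero length is what switches the parser into the extended path.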

ahmdzaki18 commented 1 month ago

No, no special parameters, only basic commands.

Sure, both nodes use the same configuration:

Node-1:
router bgp 150552
 neighbor VSI-TEST peer-group
 neighbor VSI-TEST remote-as 141626
 neighbor 10.123.123.2 peer-group VSI-TEST
 neighbor 10.123.123.2 description VSI-TEST
 address-family ipv4 unicast
  neighbor VSI-TEST soft-reconfiguration inbound
  neighbor VSI-TEST route-map GLOBAL-DENY in
  neighbor VSI-TEST route-map GLOBAL-DENY out

Node-2:
router bgp 141626
 neighbor VSI-TEST peer-group
 neighbor VSI-TEST remote-as 150552
 neighbor 10.123.123.1 peer-group VSI-TEST
 neighbor 10.123.123.1 description VSI-TEST
 address-family ipv4 unicast
  neighbor VSI-TEST soft-reconfiguration inbound
  neighbor VSI-TEST route-map GLOBAL-DENY in
  neighbor VSI-TEST route-map GLOBAL-DENY out

ton31337 commented 1 month ago

What is 10.123.123.1 in your case? What device/software?

ahmdzaki18 commented 1 month ago


Both sides run the latest Debian 12 with FRR 10.

cbr-six-rtr# sh ver
FRRouting 10.0 (cbr-six-rtr) on Linux(6.1.0-18-amd64).

ton31337 commented 1 month ago

Something is strange. Could you, just in case, try disabling capabilities with neighbor X dont-capability-negotiate? Also try (case 2) with neighbor X extended-optional-parameters.
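For reference, the two knobs suggested above change what goes into the OPEN message. As we understand the FRR documentation, dont-capability-negotiate suppresses the Capabilities optional parameter, and extended-optional-parameters forces the RFC 9072 extended length format. The hypothetical encoder below sketches both encodings of the OPEN body (without the 19-octet message header). It is not FRR's implementation, and it substitutes AS_TRANS (23456, RFC 6793) for 4-octet ASNs such as 150552 without building the 4-octet AS capability that would normally carry the real ASN.

```python
import struct

AS_TRANS = 23456  # RFC 6793 placeholder for ASNs that do not fit in 2 octets


def build_open_body(my_as, hold_time, bgp_id, params, extended=False):
    """Encode a BGP OPEN body; params is a list of (param_type, value_bytes)."""
    if my_as > 0xFFFF:
        my_as = AS_TRANS  # real ASN would go in capability 65 (not built in this sketch)
    # Version 4, My AS, Hold Time, BGP Identifier.
    fixed = struct.pack("!BHHI", 4, my_as, hold_time, bgp_id)
    if extended:
        # RFC 9072: 255/255 marker octets, 2-octet total length, 2-octet parameter lengths.
        encoded = b"".join(struct.pack("!BH", t, len(v)) + v for t, v in params)
        return fixed + bytes([255, 255]) + struct.pack("!H", len(encoded)) + encoded
    # RFC 4271: 1-octet total length, 1-octet parameter lengths.
    encoded = b"".join(struct.pack("!BB", t, len(v)) + v for t, v in params)
    if len(encoded) > 255:
        raise ValueError("optional parameters exceed 255 octets; extended format required")
    return fixed + bytes([len(encoded)]) + encoded
```

Passing `params=[]` corresponds roughly to what a peer with dont-capability-negotiate sends (Opt Parm Len of 0), and `extended=True` to the forced extended format; comparing either against the bytes in the capture would show whether the OPEN on the wire matches what the sender intended.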

ahmdzaki18 commented 1 month ago

Tried that, still no luck. But here are some updates.

The issue is that the CloudEngine somehow does not forward the BGP messages, even though the TCP three-way handshake and other TCP and UDP traffic pass fine. OSPF Hello packets have the same problem. With the topology Node-1 - CloudEngine - ATN - Node-2, Node-1 always receives BGP messages, but Node-2 does not.

I replaced the two MPLS nodes with identical CloudEngine switches (Node-1 - CloudEngine - CloudEngine - Node-2), using the basic MPLS L2VPN configuration from the documentation. Now neither Node-1 nor Node-2 receives any BGP message at all, and both are stuck in the Connect state.

It seems to be a bug in the latest firmware for the CE6860-48S8CQ-EI, and the device is no longer supported. I will close this ticket soon.

ton31337 commented 1 month ago

The main idea is that the packet is somehow corrupted (not a valid one).
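The kind of validation that would reject such a packet is the message header check in RFC 4271, section 6.1. Below is a minimal sketch of those checks as a hypothetical helper, not FRR's parser. It assumes `stream` holds the reassembled TCP payload from one direction of the session (for example, exported from the capture), a 4096-octet maximum message size (RFC 8654 Extended Messages raises this to 65535 when negotiated), and message types 1 to 5, where 5 is ROUTE-REFRESH (RFC 2918).

```python
import struct

HEADER_LEN = 19
MARKER = b"\xff" * 16
# Minimum total length per message type (RFC 4271 sections 4.2-4.5; RFC 2918 for type 5).
MIN_LEN = {1: 29, 2: 23, 3: 21, 4: 19, 5: 23}


class HeaderError(ValueError):
    """Message Header Error (NOTIFICATION error code 1) carrying its subcode."""

    def __init__(self, subcode: int, reason: str):
        super().__init__(f"Message Header Error, subcode {subcode}: {reason}")
        self.subcode = subcode


def split_messages(stream: bytes, max_len: int = 4096):
    """Split a reassembled TCP byte stream into (type, body) BGP messages.

    Returns (messages, leftover), where leftover is an incomplete trailing message.
    """
    messages = []
    off = 0
    while len(stream) - off >= HEADER_LEN:
        marker = stream[off:off + 16]
        length, mtype = struct.unpack("!HB", stream[off + 16:off + HEADER_LEN])
        if marker != MARKER:
            raise HeaderError(1, "marker is not all ones (Connection Not Synchronized)")
        if not HEADER_LEN <= length <= max_len:
            raise HeaderError(2, f"length {length} out of range (Bad Message Length)")
        if mtype not in MIN_LEN:
            raise HeaderError(3, f"unknown type {mtype} (Bad Message Type)")
        if length < MIN_LEN[mtype] or (mtype == 4 and length != HEADER_LEN):
            raise HeaderError(2, f"length {length} invalid for type {mtype}")
        if len(stream) - off < length:
            break  # header is complete, but the rest of the message has not arrived yet
        messages.append((mtype, stream[off + HEADER_LEN:off + length]))
        off += length
    return messages, stream[off:]
```

When one of these checks fails, RFC 4271 has the receiving speaker send a NOTIFICATION with error code 1 (Message Header Error) and the matching subcode, then close the connection, so a corrupted header would typically show up as a NOTIFICATION or a torn-down session rather than a silent hang.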