Exa-Networks / exabgp

The BGP swiss army knife of networking

Is there an option to reduce how long exabgp waits before retrying connection establishment after the peering goes down (default seems to be 60s)? #1176

Closed ijukic2003 closed 4 weeks ago

ijukic2003 commented 10 months ago

Describe the bug

Is there an option to reduce how long exabgp waits before retrying connection establishment after the peering goes down (the default seems to be 60s)? Exabgp establishes the BGP peering, and if that connection goes down (for example, because the peer is no longer reachable), exabgp retries the connection only once every 60s. This is not frequent enough when using exabgp to simulate a huge number of peers connecting at the same time.

thomas-mangin commented 10 months ago

Does this patch make things work better for you?

diff --git a/src/exabgp/reactor/peer.py b/src/exabgp/reactor/peer.py
index 2dc5f5c8..7123f879 100644
--- a/src/exabgp/reactor/peer.py
+++ b/src/exabgp/reactor/peer.py
@@ -419,6 +419,7 @@ class Peer(object):
         self.neighbor.rib.outgoing.replace_restart(previous, current)
         self.neighbor.previous = None

+        self._delay.reset()
         while not self._teardown:
             # we are here following a configuration change
             if self._neighbor:

I wrote it without any testing, at a conference, so it may not do what it should. That said, it looks like we were missing a reset of the exponential backoff delay timer when we successfully established a connection, so it should work.
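To illustrate what resetting an exponential backoff delay means here, a minimal sketch follows. It is not ExaBGP's actual code: the class name `BackoffDelay`, the 1s starting value, and the doubling factor are assumptions for illustration. Only the 60s ceiling and the reset after a successful session come from this thread.

```python
class BackoffDelay:
    """Illustrative capped exponential backoff (hypothetical, not ExaBGP's class)."""

    def __init__(self, initial=1, maximum=60):
        # initial=1 and doubling are assumed values; the 60s cap is from the thread.
        self.initial = initial
        self.maximum = maximum
        self._next = initial

    def reset(self):
        # Called once a session is established, so the next failure
        # starts again from the shortest delay.
        self._next = self.initial

    def increase(self):
        # Return the delay to wait now, then double it (capped) for the next failure.
        current = self._next
        self._next = min(self._next * 2, self.maximum)
        return current


delay = BackoffDelay()
failures = [delay.increase() for _ in range(8)]  # delays after 8 consecutive failures
delay.reset()                                    # session established
after_reset = delay.increase()                   # first delay after a later drop
```

Without the `reset()` call, the first retry after a later drop would already wait the full 60s, which matches the behaviour originally reported.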

ijukic2003 commented 9 months ago

Hi Thomas,

Sorry for the delay with the testing. The behaviour is still the same: after the connection goes down, the SYN is sent exactly every 60s.

thomas-mangin commented 9 months ago

Thank you for the feedback, I will look into it again.

thomas-mangin commented 9 months ago

@ijukic2003 can you please tell me how you are performing your test, and whether you checked the 4.2 or main branch? (The change was only applied to main.)

Testing by causing a connection drop in the code seems to work as expected: the connection delay timer no longer keeps increasing once the session can be set up.

I was seeing an increase with every attempt to reconnect, but nothing reaching 60s immediately; instead, the delay increased after each failure (up to 60s).

ijukic2003 commented 9 months ago

Hi Thomas,

Yes, sorry, I see now that after the connection drop it tries to reconnect quickly, and then the interval between retries increases exponentially with every new connection attempt. The problem I have is that, in the scale tests I am doing, it takes around 4-5 minutes after the connection-down trigger for the network to stabilize and for the peer to be ready to accept the BGP connection again. By that time the exabgp retry timer has already climbed back to 60s. Is there any way to make this a configurable option in the code, so I can set a more aggressive fixed timer?

thomas-mangin commented 9 months ago

As far as I know, this behaviour is now fixed on master. If it needs backporting, let me know.

thomas-mangin commented 4 weeks ago

No news is good news, closing the issue.

ijukic2003 commented 4 weeks ago

Sorry, I forgot to update the thread. Yes, the behavior has been fixed. Thanks for the help. By the way, I think it would be good to have a configurable option for the connect timer; it would be useful when testing failover times for different vendors.

thomas-mangin commented 4 weeks ago

You may be able to use the exabgp.tcp.once option, as is done in the test suite, and run two commands back to back.
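One way to read this suggestion is a small wrapper that launches exabgp with a single connection attempt per run and relaunches it after a fixed pause. The sketch below is only an assumption about how that could be scripted: the function `run_back_to_back`, the 5s interval, and passing `exabgp.tcp.once` through the process environment are illustrative choices, not something confirmed in this thread.

```python
import os
import subprocess
import time


def run_back_to_back(command, attempts=2, interval=5.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `command` several times with a fixed pause between runs (hypothetical helper)."""
    env = dict(os.environ)
    # Option named in the thread; setting it via the environment is an assumption.
    env["exabgp.tcp.once"] = "true"
    return_codes = []
    for attempt in range(attempts):
        result = runner(command, env=env)
        return_codes.append(result.returncode)
        # Fixed delay between runs, instead of the built-in growing backoff.
        if attempt < attempts - 1:
            sleep(interval)
    return return_codes
```

For example, `run_back_to_back(["exabgp", "exabgp.conf"], attempts=2, interval=5.0)`, where the command and the configuration file name are placeholders for your own setup.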