Exa-Networks / exabgp


Questions regarding Graceful Restart #328

Closed: charlescng closed this issue 9 years ago

charlescng commented 9 years ago

Our organization has been using ExaBGP in our internal network, peering with our Arista equipment. Generally speaking, things have been working smoothly, but we do have issues with Graceful Restart: it is not working as we expect, and we would like to get more information.

There have been cases where a virtual IP is blackholed, or moved to a route with a lower local-preference, for a few seconds during an ExaBGP restart. In a packet capture we saw the following sequence:

  1. Graceful restart was successfully negotiated. No NOTIFICATION message was sent by ExaBGP before the BGP connection was terminated.
  2. The Arista switch peering with ExaBGP correctly retains the stale routes when ExaBGP terminates the BGP connection.
  3. ExaBGP re-establishes the BGP connection. Since it has no RIB entries, it immediately sends an End-of-RIB (EOR) marker, as required by RFC 4724 (Section 4.1).
  4. The switch deletes all stale routes on receiving the EOR, as per RFC 4724 (Section 4.2).
  5. The ExaBGP child process sends a command to ExaBGP to announce the routes.
  6. The original routes are re-established.
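To make the timing of steps 3 to 6 concrete, here is a toy Python model of the receiving speaker (the switch) as described in RFC 4724 Section 4.2: routes are kept but marked stale when the session drops, a re-announcement refreshes them, and an EOR purges whatever is still stale. It is a sketch only: it ignores the restart and stale-path timers, and the prefix 10.0.0.1/32 is a hypothetical stand-in for the virtual IP.

```python
from dataclasses import dataclass, field


@dataclass
class ReceivingSpeaker:
    """Toy GR receiving speaker (e.g. the Arista switch); timers are not modelled."""
    routes: dict = field(default_factory=dict)  # prefix -> {"stale": bool}

    def peer_session_lost(self):
        # Graceful restart was negotiated: keep the routes, but mark them stale.
        for entry in self.routes.values():
            entry["stale"] = True

    def receive_update(self, prefix):
        # A re-announced prefix replaces the stale copy.
        self.routes[prefix] = {"stale": False}

    def receive_eor(self):
        # End-of-RIB from the restarting peer: delete anything still stale.
        self.routes = {p: e for p, e in self.routes.items() if not e["stale"]}

    def reachable(self, prefix):
        return prefix in self.routes


def replay(eor_before_routes):
    """Return reachability of the VIP after the first and second post-restart messages."""
    vip = "10.0.0.1/32"  # hypothetical virtual IP
    switch = ReceivingSpeaker()
    switch.receive_update(vip)   # initial announcement
    switch.peer_session_lost()   # ExaBGP restarts (steps 1-2)
    timeline = []
    if eor_before_routes:
        switch.receive_eor()         # steps 3-4: empty RIB, EOR sent at once
        timeline.append(switch.reachable(vip))
        switch.receive_update(vip)   # steps 5-6: helper re-announces
    else:
        switch.receive_update(vip)   # routes first...
        timeline.append(switch.reachable(vip))
        switch.receive_eor()         # ...then the EOR
    timeline.append(switch.reachable(vip))
    return timeline


print(replay(True))   # [False, True]: the VIP briefly disappears
print(replay(False))  # [True, True]: no gap
```

The only difference between the two runs is whether the EOR reaches the switch before or after the routes are re-announced, which is exactly the gap seen in the capture.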

Step 4 makes sense, as ExaBGP does not persist its RIB to disk (https://github.com/Exa-Networks/exabgp/issues/45), so there is no way for it to resend the stale routes before the mandatory EOR message. At that point the stale routes are deleted, causing a momentary loss of reachability to the host.

How does Graceful restart work in this context? Have we configured ExaBGP incorrectly?

Our ExaBGP config looks like this:

group neighbours {
  group-updates;

  process MyProcess {
    receive {
      neighbor-changes;
    }
    encoder json;
    run "/usr/local/bin/MyProcess";
  }

  neighbor 192.168.1.2 {
    router-id router1;
    local-address 192.168.1.4;
    local-as 65534;
    peer-as 65001;
    graceful-restart;
  }

  neighbor 192.168.1.3 {
    router-id router2;
    local-address 192.168.1.4;
    local-as 65534;
    peer-as 65002;
    graceful-restart;
  }
}


thomas-mangin commented 9 years ago

Hi Charles - thank you for this very good report.

Regarding the absence of the NOTIFICATION message, I suggest that we use a new issue (or private mail) to track this particular question, as I will need to understand how the session was torn down to see whether this behaviour was expected or not.

For the graceful-restart issue, ExaBGP is behaving 'as designed' (though, it seems, not as you expected), as graceful-restart was not implemented to cover the use case you are presenting. Covering it may be possible but would require significant development, so it may be simpler to achieve it using BIRD or Quagga (as "middle" peering routers).

Graceful-restart was created to help with service announcement when not using multiple servers / HA. When a server announces a /32 (a loopback IP) through ExaBGP and then reboots, the /32 disappears from the routing table and the destination can no longer be pinged (network unreachable). Graceful-restart was created to keep the /32 routable during the reboot, until the Adj-RIB-Out is re-sent.

As you correctly found out, ExaBGP does not keep any state between restarts: the Adj-RIB-Out is lost on restart.

Adding this persistence to ExaBGP can be done in two ways:

The first is the "full clean solution":

  1. a configuration option to turn it on/off per peer
  2. an SQLite database, and some code in rib/store.py to sync data to it
  3. adding the right code to rib/store.py (mostly) to read the data on load and write it on each update
  4. handling failure cases (e.g. a session failing to set up on reload, which should cause a cleanup of the DB)
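Steps 2 to 4 can be sketched with Python's standard `sqlite3` module. This is a hypothetical store written for illustration, not ExaBGP's rib/store.py: the class name, table layout and the idea of keeping route attributes as a plain string are all assumptions. The per-peer switch from step 1 would simply decide whether these methods are called at all.

```python
import sqlite3


class AdjRibOutStore:
    """Hypothetical on-disk Adj-RIB-Out: one row per (peer, prefix)."""

    def __init__(self, path=":memory:"):
        # Use a real file path in practice so the data survives a restart.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS adj_rib_out ("
            " peer TEXT NOT NULL, prefix TEXT NOT NULL, attributes TEXT NOT NULL,"
            " PRIMARY KEY (peer, prefix))"
        )
        self.db.commit()

    def announce(self, peer, prefix, attributes):
        # Step 3, "update on update": a re-announcement replaces the old attributes.
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO adj_rib_out VALUES (?, ?, ?)",
                (peer, prefix, attributes),
            )

    def withdraw(self, peer, prefix):
        with self.db:
            self.db.execute(
                "DELETE FROM adj_rib_out WHERE peer = ? AND prefix = ?", (peer, prefix)
            )

    def load(self, peer):
        # Step 3, "read on load": the routes to resend before the EOR.
        cursor = self.db.execute(
            "SELECT prefix, attributes FROM adj_rib_out WHERE peer = ? ORDER BY prefix",
            (peer,),
        )
        return cursor.fetchall()

    def forget(self, peer):
        # Step 4: the session did not come back after reload, so clean up.
        with self.db:
            self.db.execute("DELETE FROM adj_rib_out WHERE peer = ?", (peer,))


store = AdjRibOutStore()
store.announce("192.168.1.2", "10.0.0.1/32", "next-hop self")
print(store.load("192.168.1.2"))  # [('10.0.0.1/32', 'next-hop self')]
```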

As ExaBGP (master) can now pass the updates it sends to a peer on to the helper process, developers using ExaBGP can make sure the Adj-RIB-Out is correctly synchronised.

The quick and dirty way would be to save the updates in "binary" form and assume that the session capabilities (such as ASN4 support) would be the same on reload.
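The "binary" variant could look like the Python sketch below: raw UPDATE messages are saved with a length prefix and replayed as-is. The function names and file layout are hypothetical. The capability fingerprint is an addition for illustration: instead of silently assuming the capabilities (such as ASN4) are unchanged on reload, it makes the sketch replay nothing when they differ.

```python
import hashlib
import json
import struct


def capability_fingerprint(capabilities):
    # Order-independent hash of the negotiated capabilities, e.g. {"asn4": True}.
    blob = json.dumps(capabilities, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def save_updates(updates, capabilities):
    """Serialise raw UPDATE messages, each with a 4-byte big-endian length prefix."""
    header = capability_fingerprint(capabilities).encode() + b"\n"
    body = b"".join(struct.pack("!I", len(u)) + u for u in updates)
    return header + body  # would be written to disk before shutdown


def load_updates(blob, capabilities):
    """Return the saved UPDATEs, or [] if the session capabilities have changed."""
    header, _, body = blob.partition(b"\n")  # the hex header contains no newline
    if header.decode() != capability_fingerprint(capabilities):
        # e.g. ASN4 no longer negotiated: the saved encodings cannot be trusted.
        return []
    updates, offset = [], 0
    while offset < len(body):
        (length,) = struct.unpack_from("!I", body, offset)
        offset += 4
        updates.append(body[offset:offset + length])
        offset += length
    return updates
```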

The second is "the dirty quick hack" (my way of life):

It would simply add a per-peer option to not send the EOR, and have the helper process always send it instead (which makes sense, as in your scenario the configuration is unlikely to contain any routes at all). The logic would then be moved into the helper process.
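With that option in place, the helper process would own the EOR. The Python sketch below assumes several things: that ExaBGP's own EOR is suppressed by the (unnamed, hypothetical) per-peer option; that the JSON `neighbor-changes` messages enabled in the config above arrive on the helper's stdin with `"type": "state"` and a neighbor `"state"` of `"up"` (field layout varies between ExaBGP versions); and that the API accepts `neighbor <ip> announce route ...` and `neighbor <ip> announce eor ipv4 unicast`. The `ROUTES` table is hypothetical. Check all of these against the API documentation for your ExaBGP version.

```python
import json
import sys

# Hypothetical: prefixes this helper announces to each peer from the config above.
ROUTES = {
    "192.168.1.2": ["10.0.0.1/32"],
    "192.168.1.3": ["10.0.0.1/32"],
}


def peer_that_came_up(line):
    """Return the peer address if a JSON line from ExaBGP reports a session going up."""
    try:
        msg = json.loads(line)
    except ValueError:  # not JSON (json.JSONDecodeError subclasses ValueError)
        return None
    if not isinstance(msg, dict) or msg.get("type") != "state":
        return None
    neighbor = msg.get("neighbor")
    if not isinstance(neighbor, dict) or neighbor.get("state") != "up":
        return None
    # Assumed layouts: {"address": {"peer": ...}} or {"ip": ...}; verify for your version.
    address = neighbor.get("address")
    if isinstance(address, dict):
        return address.get("peer")
    return neighbor.get("ip")


def commands_for(peer):
    """Re-announce this peer's routes first, then send the EOR ourselves."""
    cmds = [f"neighbor {peer} announce route {prefix} next-hop self"
            for prefix in ROUTES.get(peer, [])]
    cmds.append(f"neighbor {peer} announce eor ipv4 unicast")
    return cmds


def main():
    # ExaBGP writes its JSON messages to our stdin and reads commands from our stdout.
    for line in sys.stdin:
        peer = peer_that_came_up(line)
        if peer is None:
            continue
        for cmd in commands_for(peer):
            sys.stdout.write(cmd + "\n")
        sys.stdout.flush()
```

The script named in `run` would end with a call to `main()`. Because the routes go out before the EOR, the switch refreshes its stale copies instead of purging them.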

So I would ask you:

  1. Would the second option be acceptable?
  2. If so, as I am very busy at the moment, could you (or someone else) implement it? I would gladly accept a PR :wink:

Let me know what you think.

charlescng commented 9 years ago

Thanks for the quick answers, Thomas.

I think for our use case, having the option for ExaBGP to not send an EOR, and letting the child process handle any route updates, is sufficient.

I'll take a stab at implementing something.

Thanks!

thomas-mangin commented 9 years ago

Thank you Charles. If you have any questions, feel free to drop me a line on this thread or join our channel on Gitter for interactive chat.

thomas-mangin commented 9 years ago

Thank you @charlescng for implementing this feature.