FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.27k stars 1.24k forks source link

Substitute poll() with epoll() in libfrr to boost performance in BGP anomalies simulation #12852

Open ponedo opened 1 year ago

ponedo commented 1 year ago

Problem overview

In all currently available FRR version, thread.c and thread.h in libfrr used poll() function to poll file descriptors. We advise to substitute it with epoll() to enhance scalability.

Motivation

We are trying run BGP abnormal events simulation with FRR routers. In our scenario, we aim to run plenty of (hundreds/thousands of) FRR routers, which are mutually TCP connected to simulate BGP peer topology in reality, on a COTS server.

Experiment

Currently we have not simulate BGP peer topology in reality yet. Rather, we configure a fullmesh BGP connection between all FRR routers for test purpose. Also we have not launch route announcement and withdrawal yet. Only BGP keepalive packets are sent between FRR BGP peers. Environment: AMD EPYC 7742 64-Core Processor(Only 16 cores used); 660GB RAM; CentOS7; Linux kernel version 5.17.3-1.el7.elrepo.x86_64; FRR version 8.0. FRR BGP Configuration: fullmesh connection; keepalive interval: 1s; holdtime: 3s.

Observation

We found poor scalability when deploying increasing number of FRR routers. As seen from figure below, CPU util increases quadratically at the early stage of scaling. After 250 peers deployed, the CPU util increasing slows down. However at this point FRR peers become stressful. Finally, disconnection between peers happens after 384 peers being deployed.

image

Analysis

We drew flamegraph of BGP threads of FRR, and found poll() function was the bottleneck. As shown below

image

poll() function scans all file descriptors every time it is called. In our case, these file descriptors are mostly TCP sockets established with BGP peers. As the number of peers deployed (denoted by n) increases, every poll() call consumes O(n) time, leading to O(n) CPU util for each peer. Multiplied by n(number of BGP threads doing poll()), the system-wide CPU util is O(n^2), matching the observation above. To comfirm tha poll() introduces great overhead, we measured average running time of per poll() call. As shown below, the call time did increase linearly with peer number.

image

However, scanning all TCP sockets is a waste in this scenario, since keepalive interval (1s) is long. At most time, each TCP socket is not ready for reading. Therefore, substituting poll() with epoll APIs will help reduce CPU overhead significantly.

Solution (prototype)

We re-wrote thread.c and thread.h in libfrr with epoll APIs, and ran experiement again. As shown below, CPU overhead is greatly reduced (nearly 67%).

image

We provide our code here. However our code was just a prototype, since Vtysh command such as show_thread_poll_cmd was not implemented, and bugs may exist. If the community feels this issue important, we can complete the code. code.zip

ton31337 commented 1 year ago

Great analysis! Could you create a pull request following our guide?

ponedo commented 1 year ago

Sure. But I haven't implement show_thread_epoll vtysh command and there may exist bugs. Is that ok?

ton31337 commented 1 year ago

But I haven't implement show_thread_epoll vtysh command and there may exist bugs

It would be better if we could see some debugging information.

eqvinox commented 1 year ago

Thanks a lot for the cool analysis & data! Getting input like this is very useful.

We re-wrote thread.c and thread.h in libfrr with epoll APIs

Could you share your changes? You can open a pull request in "draft" state, it absolutely does not need to be "perfect". It would be great to allow others to make some tests / comparisons too.

ponedo commented 1 year ago

I have opened PR at https://github.com/FRRouting/frr/pull/12855. It seemed failing some checks. I will update according to guidelines soon.

ponedo commented 1 year ago

@ton31337 @eqvinox The PR link above is wrong, and 12855 PR was discarded mistakenly somehow. I have opened a new PR (#12862). I encoutered a compatibility problem and I will explain in the PR page.

ponedo commented 10 months ago

Hello, dear participants @ton31337 @eqvinox @qlyoung, Recently we are using FRRouting for routing system emulation as mentioned above, and still observing high cpu usage of ppoll system call when the scale of emulated network is large:

17b531aef66e75a357582b0825c8779

The PR has been opened for 9 months and seems to be a little bit slow in terms of progressing. But I think it will help a lot for FRR to be used in more scenarios, such as network emulation, if we could continue with this PR to optimize the polling method. I'd really appreciate it if you could review the code soon.