futurewei-cloud / alcor

Alcor: Cloud native SDN platform powered by Kubernetes and Istio
MIT License

[Perf] Alcor Control Agent Performance Profiling #441

Open xieus opened 4 years ago

xieus commented 4 years ago

Request

xieus commented 4 years ago

Linked to an umbrella issue #440.

er1cthe0ne commented 3 years ago

Per the issue description, I will break the ACA performance profiling task down into two major areas.

ACA handling of large payload

  1. Framework to use: aca_tests, to create large payloads and send them to ACA

  2. Example payloads: 1 port create plus 10, 100, 1,000, 10,000, or 100,000 neighbors

  3. Collect latency and throughput metrics

  4. Identify bottlenecks and problem areas (possibly OVS)

  5. Optimize the ACA multithreading model: do we want to cap the maximum number of parallel threads at number of CPUs * 2?

  6. Can we bundle a batch (e.g. 10) of similar neighbors and process them in a single call? That may help with the locking of ACA's internal structures.
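A rough sketch of items 2, 3, 5 and 6 together, in Python for brevity (this is not ACA code). `process_batch` is a hypothetical stand-in for a single ACA call that handles a batch of neighbors, and the neighbor list is just integers. Only the batch size of 10 and the CPU * 2 thread cap come from the list above.

```python
import os
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 10                          # batch size suggested in item 6
MAX_WORKERS = (os.cpu_count() or 1) * 2  # thread cap proposed in item 5


def make_batches(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch):
    """Hypothetical stand-in for one ACA call that handles a whole batch."""
    start = time.perf_counter()
    handled = len(batch)  # placeholder for the real neighbor programming work
    return handled, time.perf_counter() - start


def run_profile(neighbor_count):
    """Process `neighbor_count` fake neighbors in batches on a capped pool."""
    neighbors = list(range(neighbor_count))
    batches = make_batches(neighbors, BATCH_SIZE)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(process_batch, batches))
    elapsed = time.perf_counter() - start
    handled = sum(n for n, _ in results)
    return {
        "neighbors": handled,
        "batches": len(batches),
        "mean_batch_latency_s": statistics.mean(lat for _, lat in results),
        "throughput_per_s": handled / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    for count in (10, 100, 1_000, 10_000, 100_000):
        print(run_profile(count))
```

Running the same loop with `BATCH_SIZE = 1` would give a baseline for judging whether batching helps at all.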

ACA handling of packet-in messages from OVS

  7. Framework to use: cbench (https://github.com/mininet/oflops/tree/master/cbench), run against ACA acting as an OpenFlow controller

  8. Using the payload generated by cbench, test latency mode first, then throughput mode

  9. Collect latency and throughput metrics

  10. Identify bottlenecks and problem areas

  11. Once on-demand L3 routing rules are implemented, a VM could quickly create many new connections to a new neighbor, which would generate a large number of packet-in messages for ACA to process. We need to confirm that ACA can handle this

  12. If an ACA slowdown is observed, consider spinning up more threads to handle multiple packet-in messages in parallel
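To make the two measurement modes in item 8 concrete, here is a minimal Python sketch of the difference (it is not cbench itself and does not speak OpenFlow). `handle_packet_in` is a hypothetical stand-in for ACA's handling of one packet-in message, and the `workers` parameter is an assumption used to illustrate the extra threads from item 12.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_packet_in(msg_id):
    """Hypothetical stand-in for ACA processing one packet-in message."""
    return msg_id


def latency_mode(n):
    """Send one message, wait for its reply, then send the next.

    Returns the per-message round-trip latencies in seconds.
    """
    latencies = []
    for i in range(n):
        start = time.perf_counter()
        handle_packet_in(i)
        latencies.append(time.perf_counter() - start)
    return latencies


def throughput_mode(n, workers):
    """Submit all messages without waiting between them.

    Returns the replies and the observed responses per second.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        replies = list(pool.map(handle_packet_in, range(n)))
    elapsed = time.perf_counter() - start
    rate = len(replies) / elapsed if elapsed > 0 else float("inf")
    return replies, rate


if __name__ == "__main__":
    lat = latency_mode(1_000)
    print("mean latency (s):", sum(lat) / len(lat))
    _, rate = throughput_mode(1_000, workers=4)
    print("responses per second:", rate)
```

Comparing `throughput_mode` at different `workers` values is one way to check whether adding threads actually helps before changing ACA.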

Other Notes

  1. Can we use a framework like SeaStar to improve the ACA threading model? https://github.com/futurewei-cloud/chogori-seastar-rd