A Hyperscale Cloud Native SDN Platform
Cloud computing means scale and on-demand resource provisioning. As more enterprise customers migrate their on-premises workloads to the cloud, the user base of a cloud provider could grow by 10x in just a few years. This growth requires a cloud virtual networking system with a more scalable and extensible design. As a part of the community effort, Alcor is an open-source cloud native platform that provides a highly available, high-performance, and large-scale virtual networking control plane and management plane at a high resource provisioning rate.
Alcor leverages the latest SDN and container technologies as well as an advanced distributed system design to support deployment, configuration, and scale-out of millions of VMs and containers. It is built on a distributed micro-services architecture with a uniform way to secure, connect, and monitor control plane micro-services, and fine-grained control of service-to-service communication including load balancing, retries, failovers, and rate limits. Alcor also offers a way to unify VM and container networking management, and its application-aware fast path ensures ultra-low latency and high throughput when provisioning containers and serverless applications.
The following diagram illustrates the high-level architecture of the Alcor control plane.
Detailed design docs:
Alcor leverages Kubernetes and Istio to build its distributed micro-services architecture. Depending on the control plane load, Alcor Controller scales out to multiple instances, each of which is a Kubernetes application. Going one step further, each application contains various infrastructure microservices to manage different types of network resources.
Alcor focuses on top-down throughput optimization at every system layer, including the API, Controller, messaging mechanism, and host agent. For example, a batch API is provided to support deploying a group of ports with a single POST call, and a per-host message batching mechanism is proposed that can deliver groups (potentially thousands) of resources to the same host in one shot.
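The per-host batching idea can be illustrated with a minimal sketch. This is not Alcor's actual implementation: the `PortConfig` record, its field names, and the `batch_by_host` helper are hypothetical and only show how a group of port configurations might be collected into one message per destination host.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical port record; the field names are illustrative only."""
    port_id: str
    host_ip: str
    vpc_id: str


def batch_by_host(ports):
    """Group port configurations so that each host receives one batch."""
    batches = defaultdict(list)
    for port in ports:
        batches[port.host_ip].append(port)
    return dict(batches)


# Three ports submitted together, two of which land on the same host.
ports = [
    PortConfig("port-1", "10.0.0.1", "vpc-a"),
    PortConfig("port-2", "10.0.0.2", "vpc-a"),
    PortConfig("port-3", "10.0.0.1", "vpc-a"),
]
batches = batch_by_host(ports)
for host_ip, group in batches.items():
    # One message per host instead of one message per port.
    print(host_ip, [p.port_id for p in group])
```

In this sketch three port requests result in two host-bound messages. With thousands of ports per host, the number of messages would still equal the number of hosts.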
To support time-critical applications, Alcor enables a direct communication channel from the Controller to the Host Agent. This channel bypasses message queueing systems such as Kafka and uses gRPC to offer a 10x latency improvement compared to Kafka.
A list of planned features is included in our current roadmap. Some highlighted items:
The Alcor project is divided across a few GitHub repositories.
alcor/alcor: This is the main repository of Alcor Regional Controller that you are currently looking at. It hosts the controllers' source code, build and deployment instructions, and various documents that detail the design of Alcor.
alcor/alcor_control_agent: This repository contains source code for a host-level stateless agent that connects regional controllers to the host data-plane component. It is responsible for programming the on-host data plane with various network configurations for CRUD of VPCs, subnets, ports, security groups, etc., and for monitoring the network health of containers and VMs on the host.
alcor/integration: The integration repository contains code and scripts for end-to-end integration of the Alcor control plane with popular orchestration platforms and data plane implementations. We currently support integration with Kubernetes (via CNI plugin) and Mizar Data Plane. We will continue to integrate with other orchestration systems and data plane implementations.
alcor/meeting: The meeting repository stores all meeting notes and recorded video clips for the Alcor Open Source project.
This main repository of Alcor Regional Controller is organized as follows: