giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

Validate that private clusters do not have overlapping network ranges #1985

Open nprokopic opened 1 year ago

nprokopic commented 1 year ago

Is your feature request related to a problem? Please describe.

When I am creating a private workload cluster (CAPA or CAPZ), I have to specify a network range (AWS VPC / Azure VNet) that does not overlap with the management cluster's network range, nor with the network range of any other workload cluster within the same management cluster.

In order to specify a non-overlapping network range for the new workload cluster, I have to check the network ranges of all the aforementioned clusters and calculate the next available network range that is big enough for a Kubernetes cluster.

All of this work is currently manual and therefore very error-prone. When an invalid network range is specified, the error is not visible immediately: all cluster resources are applied and cluster creation is initiated. The error is only noticed later, when cluster creation fails because peering between the management cluster network and the newly created workload cluster network cannot be established.
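For illustration, the overlap check this boils down to is small: two CIDR ranges overlap exactly when either one contains the other's first address. A minimal Go sketch (the CIDR values are made-up examples, not taken from any real cluster):

```go
package main

import (
	"fmt"
	"net/netip"
)

// cidrsOverlap reports whether two CIDR ranges share any addresses.
// Two prefixes overlap exactly when either one contains the other's
// first address.
func cidrsOverlap(a, b netip.Prefix) bool {
	return a.Contains(b.Addr()) || b.Contains(a.Addr())
}

func main() {
	used := []netip.Prefix{
		netip.MustParsePrefix("10.0.0.0/16"), // management cluster VPC/VNet
		netip.MustParsePrefix("10.1.0.0/16"), // existing workload cluster
	}
	candidate := netip.MustParsePrefix("10.1.128.0/17") // proposed new range

	for _, u := range used {
		if cidrsOverlap(candidate, u) {
			fmt.Printf("%s overlaps with %s\n", candidate, u)
		}
	}
}
```

Running this prints that 10.1.128.0/17 overlaps with 10.1.0.0/16, which is exactly the kind of mistake that is easy to make by hand.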

Describe the solution you'd like

When I try to create a cluster, an admission controller validates all applied resources (e.g. AWSCluster, AzureCluster). That validating admission controller checks whether the specified network range is valid, which for private clusters means that all workload clusters and the management cluster itself use non-overlapping network ranges. In case of a validation failure, cluster creation is not started and an appropriate validation error message is immediately visible on the front-end side (e.g. kubectl output).
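A minimal sketch of what such a webhook handler could look like, using controller-runtime's admission package. `extractRequestedRange` and `listUsedRanges` are hypothetical helpers, not existing APIs: one would read the CIDR from the admitted AWSCluster/AzureCluster object, the other would list the CIDRs already used by the management cluster and all existing workload clusters.

```go
package validation

import (
	"context"
	"fmt"
	"net/http"
	"net/netip"

	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// cidrsOverlap is the same check as in the sketch above.
func cidrsOverlap(a, b netip.Prefix) bool {
	return a.Contains(b.Addr()) || b.Contains(a.Addr())
}

// networkRangeValidator is a hypothetical webhook handler; the two
// function fields stand in for provider-specific lookup logic.
type networkRangeValidator struct {
	extractRequestedRange func(req admission.Request) (netip.Prefix, error)
	listUsedRanges        func(ctx context.Context) ([]netip.Prefix, error)
}

func (v *networkRangeValidator) Handle(ctx context.Context, req admission.Request) admission.Response {
	requested, err := v.extractRequestedRange(req)
	if err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}
	used, err := v.listUsedRanges(ctx)
	if err != nil {
		return admission.Errored(http.StatusInternalServerError, err)
	}
	for _, u := range used {
		if cidrsOverlap(requested, u) {
			// Denying admission surfaces the error immediately,
			// e.g. in kubectl output, before any resources are created.
			return admission.Denied(fmt.Sprintf(
				"network range %s overlaps with existing range %s", requested, u))
		}
	}
	return admission.Allowed("network range does not overlap with existing clusters")
}
```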

Describe alternatives you've considered

In addition to a validating admission controller written in Go, another possible solution would be validation with Kyverno.

Front-end validations, such as schema validation, are not powerful enough here, because this validation is "dynamic": which network ranges are valid differs every time.

Additional context

Nothing at the moment.

Outcome

bavarianbidi commented 1 year ago

the ClusterNetwork struct specifies the different networking parameters for a cluster. The "critical" CIDRs we care about are provider-specific, as they are mostly used for VPN peering.

But as just discussed, let's try to keep the new validation controller compatible with both CAPZ and CAPA:

```go
// ClusterNetwork specifies the different networking
// parameters for a cluster.
type ClusterNetwork struct {
    // APIServerPort specifies the port the API Server should bind to.
    // Defaults to 6443.
    // +optional
    APIServerPort *int32 `json:"apiServerPort,omitempty"`

    // The network ranges from which service VIPs are allocated.
    // +optional
    Services *NetworkRanges `json:"services,omitempty"`

    // The network ranges from which Pod networks are allocated.
    // +optional
    Pods *NetworkRanges `json:"pods,omitempty"`

    // Domain name for services.
    // +optional
    ServiceDomain string `json:"serviceDomain,omitempty"`
}
```
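For a provider-agnostic overlap check, the provider-specific shapes could be normalized first. A hedged sketch, assuming the field paths in the upstream APIs (AWSCluster carrying a single VPC CIDR under spec.network.vpc.cidrBlock, AzureCluster a list of VNet CIDRs under spec.networkSpec.vnet.cidrBlocks):

```go
package validation

import "net/netip"

// parsePrefixes normalizes provider-specific CIDR fields into one slice
// for the overlap check, so the same validator can serve both CAPA and
// CAPZ. The caller would pass the single VPC CIDR from an AWSCluster or
// the VNet CIDR list from an AzureCluster; both field paths above are
// assumptions based on the upstream APIs.
func parsePrefixes(cidrs ...string) ([]netip.Prefix, error) {
	prefixes := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		if c == "" {
			continue // field not set, nothing to validate
		}
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, err
		}
		prefixes = append(prefixes, p)
	}
	return prefixes, nil
}
```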
bavarianbidi commented 1 year ago

Maybe it's also worth checking whether the experimental IPAM controller could take over some of these requirements: KEP: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220125-ipam-integration.md

Rotfuks commented 1 year ago

Since we moved to private links/endpoints for the MC/WC connection, we no longer need this story in the scope of the private networking epic. This might be an interesting feature on its own for customers still going for network peering, but it has a low priority now.