danilo404 opened 1 week ago
Can you share what the spec for the subnet looks like, as managed by CAPZ?
I think the issue we've got here is that there are 14k entries in the `ipConfigurations` field (which Azure allows), but at some point you cross the Kubernetes boundary for maximum resource size.
There is also a maximum resource size for Azure, I believe, but I think it's 4 MB rather than the ~1.5 MB that AFAIK is the default on Kubernetes.
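To put rough numbers on that mismatch, here is a small Go sketch. The ARM ID shape and the limit constant are illustrative approximations (etcd's default `--max-request-bytes` is 1.5 MiB), not values taken from ASO's code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// etcd's default --max-request-bytes is 1.5 MiB; exact limits vary by
// configuration. Azure allows resources up to roughly 4 MB.
const etcdMaxRequestBytes = 1536 * 1024

type subnetStatus struct {
	IPConfigurations []string `json:"ipConfigurations"`
}

// serializedSize returns the JSON-serialized size of a status holding n
// ARM IDs of a plausible (hypothetical) shape.
func serializedSize(n int) int {
	ids := make([]string, n)
	for i := range ids {
		ids[i] = fmt.Sprintf("/subscriptions/0000/resourceGroups/rg/providers/Microsoft.Network/networkInterfaces/nic-%05d/ipConfigurations/ipconfig1", i)
	}
	raw, err := json.Marshal(subnetStatus{IPConfigurations: ids})
	if err != nil {
		panic(err)
	}
	return len(raw)
}

func main() {
	size := serializedSize(14000)
	fmt.Printf("serialized status: %d bytes, fits in etcd: %v\n", size, size <= etcdMaxRequestBytes)
}
```

With 14k IDs of ~120 bytes each, the serialized status alone lands well past the etcd default, even before the rest of the resource is counted.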
> Can you share what the spec for the subnet looks like, as managed by CAPZ?
AMCP resource:
```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
spec:
  virtualNetwork:
    cidrBlock: 10.0.0.0/16
    name: example-cluster-vnet
    resourceGroup: example-cluster-rg
    subnet:
      cidrBlock: 10.0.0.0/16
      name: example-cluster-subnet
      serviceEndpoints:
      - locations:
        - '*'
        service: Microsoft.Sql
      - locations:
        - '*'
        service: Microsoft.KeyVault
      - locations:
        - '*'
        service: Microsoft.Storage
      - locations:
        - '*'
        service: Microsoft.AzureCosmosDB
      - locations:
        - '*'
        service: Microsoft.ServiceBus
      - locations:
        - '*'
        service: Microsoft.EventHub
```
And the Subnet it creates:
```yaml
apiVersion: network.azure.com/v1api20201101
kind: VirtualNetworksSubnet
spec:
  addressPrefix: 10.0.0.0/16
  addressPrefixes:
  - 10.0.0.0/16
  azureName: example-cluster-subnet
  owner:
    name: example-cluster-vnet
  serviceEndpoints:
  - locations:
    - '*'
    service: Microsoft.Sql
  - locations:
    - '*'
    service: Microsoft.KeyVault
  - locations:
    - '*'
    service: Microsoft.Storage
  - locations:
    - '*'
    service: Microsoft.AzureCosmosDB
  - locations:
    - '*'
    service: Microsoft.ServiceBus
  - locations:
    - '*'
    service: Microsoft.EventHub
```
I looked at this some more and I think this comes down to a mismatch between the maximum allowed size of an Azure resource (somewhere in the 4 MB range, I believe) and the maximum allowed size of a Kubernetes resource, which is ~1.5 MB.
Since we fundamentally cannot fit this much data into etcd, there's not really much we can do here other than elide `.status.ipConfigurations` after some maximum length. The only thing that makes me feel any better about that is that it's probably not practically possible to use a list of 14,000 ipConfiguration ARM IDs for anything anyway.
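A minimal Go sketch of the kind of truncation being proposed (the cap value and function name are hypothetical, not ASO's actual implementation):

```go
package main

import "fmt"

// maxIPConfigurations is an illustrative cap, not a value ASO actually uses.
const maxIPConfigurations = 1000

// elideIPConfigurations drops entries beyond the cap so the CR's status
// stays within what the Kubernetes API server and etcd will accept.
func elideIPConfigurations(ids []string) []string {
	if len(ids) <= maxIPConfigurations {
		return ids
	}
	return ids[:maxIPConfigurations]
}

func main() {
	ids := make([]string, 14000) // e.g. 14k ARM IDs returned by Azure
	fmt.Println(len(elideIPConfigurations(ids))) // prints 1000
}
```

Small lists pass through unchanged; only pathologically large ones lose their tail.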
@nojnhuh - is CAPZ using `.status.ipConfigurations` for anything right now?

> @nojnhuh - is CAPZ using `.status.ipConfigurations` for anything right now?
It is not, so however you handle that should work for CAPZ.
Hey @matthchr, thanks so much for looking into this. Regarding the etcd limit, the problem seems to manifest in different ways depending on the size of the object in Azure. Note that in the original ticket I opened in CAPZ, the error was different and it came from etcd:
```
E0315 17:13:54.206966 1 controller.go:329] "msg"="Reconciler error" "error"="updating mynamespace/examplecluster-vnet-examplecluster-subnet resource status: etcdserver: request is too large" "logger"="controllers" "name"="examplecluster-vnet-examplecluster-subnet" "namespace"="examplenamespace" "reconcileID"="..."
```
Also note that in that case the Subnet was not as large: when the error was observed, the subnet size was around 2.9 MB.
Now the subnet object in Azure has reached around 5.6 MB and the error seems to come from the Kubernetes API server itself; this limit is hardcoded in more than one place, e.g. here.
I think in this case the object did not reach etcd.
Thanks @danilo404 - I suppose a more precise phrasing of the problem is not so much etcd but: Azure allows larger resources than Kubernetes does. I think once the etcd limit is crossed it won't work in Kubernetes anyway, though I didn't know about the hardcoded apiserver limit that ends up producing a different error if the request gets large enough.
In terms of a plan to fix this, it didn't make 2.11.0 (which has already shipped). I think we can try to get a fix merged before most of us go on holiday, which would enable consumption of the fix via the experimental release, but the official release will probably need to wait until next year. There's also the added wrinkle of CAPZ using a slightly older version of ASO, which may delay uptake in vanilla CAPZ as well.
Unfortunately I don't really see a workaround for this problem other than "keep the cluster small" in the meantime, though possibly this issue isn't actually breaking things severely if CAPZ isn't trying to update the subnet?
Can you share what the impact is for you @danilo404, and whether you have any workaround for it currently?
Thanks for the update @matthchr. We don't have workarounds for this case, but the impact for now is not blocking. What happens now is that the `AzureManagedControlPlane` reconcile loop tries to sync the Subnet's status (even without changes to the spec), and the CAPI/CAPZ Cluster stays in a `Failed` state in Kubernetes, while the cluster itself in Azure is healthy. In any case, the experimental release would be really useful, because the AMCP in `Failed` state causes other headaches, like Flux orchestration being unable to progress, silencing of related alerts, etc.
Describe the bug
The bug manifests on our cluster created with the following networking parameters:
And it has 20 Agent Pools, with the following sizes:
CAPZ created a `VirtualNetworksSubnet` ASO CR for that cluster with the following configuration:
When the Agent Pools reach somewhere close to the "counts" above, the `VirtualNetworksSubnet` object in Azure grows in size to around 5.6 MB; it fills up with thousands of entries in the `ipConfigurations` field:
ASO then tries to persist the `ipConfigurations` into the `VirtualNetworksSubnet` CR's status, and this causes the API server to return:
Azure Service Operator Version: v2.8.0
Expected behavior
The `VirtualNetworksSubnet` to continue reconciling successfully for any scalable size of my Agent Pools.
To Reproduce
Create a `VirtualNetworksSubnet` CR for an Azure cloud Subnet with a large number of `ipConfigurations` and wait for the controller to attempt to sync it.
Additional context
This issue relates to another issue in the CAPZ project: https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/4649