Azure / azure-service-operator

Azure Service Operator allows you to create Azure resources using kubectl
https://azure.github.io/azure-service-operator/
MIT License

Feature: dependencies of resources #4054

Open RSE132 opened 5 months ago

RSE132 commented 5 months ago

I am new to ASO and not sure whether this feature already exists, so it would be nice if you could guide me. I am trying to create a PublicIPPrefix with 3 egress IPs attached to it. When I apply all the YAML files, the IP prefix gets created, but the three egress IPs get stuck in a loop (it appears to be a race condition), and as a result none of the IPs are created.

Alternatively, if I apply ipprefix.yaml and wait a few seconds before applying egress.yaml, everything works fine.

Here are my YAML files:

ipprefix.yaml

apiVersion: network.azure.com/v1api20220701
kind: PublicIPPrefix
metadata:
  name: clusterapi-prov-second-cluster-ip-prefix-2
  namespace: second-cluster
  annotations:
    serviceoperator.azure.com/credential-from: clusterapi-prov-second-cluster-aso-credential
    serviceoperator.azure.com/operator-namespace: azureserviceoperator-system
    serviceoperator.azure.com/reconcile-policy: manage
spec:
  location: westeurope
  owner:
    armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx
  prefixLength: 29
  publicIPAddressVersion: IPv4
  sku:
    name: Standard
    tier: Regional

egress.yaml


apiVersion: network.azure.com/v1api20201101
kind: PublicIPAddress
metadata:
  name: clusterapi-prov-second-cluster-egress-ip-12
  namespace: second-cluster
  annotations:
    serviceoperator.azure.com/credential-from: clusterapi-prov-second-cluster-aso-credential
    serviceoperator.azure.com/operator-namespace: azureserviceoperator-system
    serviceoperator.azure.com/reconcile-policy: manage
spec:
  location: westeurope
  owner:
    armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx
  sku:
    name: Standard
  publicIPAllocationMethod: Static
  publicIPAddressVersion: IPv4
  publicIPPrefix:
    reference:
      armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2
---
apiVersion: network.azure.com/v1api20201101
kind: PublicIPAddress
metadata:
  name: clusterapi-prov-second-cluster-egress-ip-22
  namespace: second-cluster
  annotations:
    serviceoperator.azure.com/credential-from: clusterapi-prov-second-cluster-aso-credential
    serviceoperator.azure.com/operator-namespace: azureserviceoperator-system
    serviceoperator.azure.com/reconcile-policy: manage
spec:
  location: westeurope
  owner:
    armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx
  sku:
    name: Standard
  publicIPAllocationMethod: Static
  publicIPAddressVersion: IPv4
  publicIPPrefix:
    reference:
      armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2
---
apiVersion: network.azure.com/v1api20201101
kind: PublicIPAddress
metadata:
  name: clusterapi-prov-second-cluster-egress-ip-32
  namespace: second-cluster
  annotations:
    serviceoperator.azure.com/credential-from: clusterapi-prov-second-cluster-aso-credential
    serviceoperator.azure.com/operator-namespace: azureserviceoperator-system
    serviceoperator.azure.com/reconcile-policy: manage
spec:
  location: westeurope
  owner:
    armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx
  sku:
    name: Standard
  publicIPAllocationMethod: Static
  publicIPAddressVersion: IPv4
  publicIPPrefix:
    reference:
      armId: /subscriptions/fxxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2

Error

publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-12   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2 used by resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-12 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.Csm.PutPublicIpPrefixOperation.: PUT https://management.azure.com/subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-12...
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-2    True               Succeeded
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-22   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2 used by resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-22 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.Csm.PutPublicIpPrefixOperation.: PUT https://management.azure.com/subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-22...
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-3    True               Succeeded
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-32   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2 used by resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-32 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.Csm.PutPublicIpPrefixOperation.: PUT https://management.azure.com/subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-32...
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-ingress-ip-1   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because res

Expectation: the IPs depend on the IPPrefix, so I expect them to be created once the IPPrefix has been created.

theunrepentantgeek commented 5 months ago

We think there may be a bug where we're not retrying after a failed attempt to reconcile a PublicIPAddress - we'll look into that.

That said, you can make this work by making a minor tweak to your YAML files.

Instead of referencing the PublicIPPrefix by its ARM ID, reference the ASO resource directly, within the cluster, by using the Group/Kind/Name properties.

E.g.

apiVersion: network.azure.com/v1api20201101
kind: PublicIPAddress
metadata:
  name: clusterapi-prov-second-cluster-egress-ip-12
  namespace: second-cluster
  ...
spec:
  location: westeurope
  owner:
    armId: /subscriptions/xxxxxx-xxxx-xxxxx-xxxxxx-xxxxxxxxxx/resourceGroups/xxxxxxxxx
  sku:
    name: Standard
  publicIPAllocationMethod: Static
  publicIPAddressVersion: IPv4
  publicIPPrefix:
    group: network.azure.com
    kind: PublicIPPrefix
    name: clusterapi-prov-second-cluster-ip-prefix-2

Where possible, prefer to reference resources known to ASO by their Group/Kind/Name instead of by their ARM ID. This gives ASO more information about dependencies and allows it to schedule reconciliation attempts more intelligently. This applies not just to publicIPPrefix but also to owner. ResourceReference also lets you specify an ARM ID (armId) as a way to reference resources of types not supported by ASO, or resources that you simply choose not to have ASO manage.
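
As a rough illustration of what in-cluster references could look like for both owner and publicIPPrefix, here is a minimal sketch. It assumes the resource group is itself managed by ASO as a ResourceGroup resource in the same namespace; the names example-rg, example-ip-prefix and example-egress-ip are hypothetical, as is the ResourceGroup API version. The `reference:` nesting under publicIPPrefix follows the form used in the original egress.yaml, and the annotations from that file are omitted for brevity.

```yaml
# Hypothetical resource group managed by ASO (name and API version are assumptions).
apiVersion: resources.azure.com/v1api20200601
kind: ResourceGroup
metadata:
  name: example-rg
  namespace: second-cluster
spec:
  location: westeurope
---
apiVersion: network.azure.com/v1api20220701
kind: PublicIPPrefix
metadata:
  name: example-ip-prefix
  namespace: second-cluster
spec:
  location: westeurope
  owner:
    name: example-rg            # in-cluster owner reference instead of armId
  prefixLength: 29
  publicIPAddressVersion: IPv4
  sku:
    name: Standard
    tier: Regional
---
apiVersion: network.azure.com/v1api20201101
kind: PublicIPAddress
metadata:
  name: example-egress-ip
  namespace: second-cluster
spec:
  location: westeurope
  owner:
    name: example-rg            # in-cluster owner reference instead of armId
  sku:
    name: Standard
  publicIPAllocationMethod: Static
  publicIPAddressVersion: IPv4
  publicIPPrefix:
    reference:                  # Group/Kind/Name reference instead of armId
      group: network.azure.com
      kind: PublicIPPrefix
      name: example-ip-prefix
```

If the resource group is not managed by ASO, owner would keep its armId form and only the publicIPPrefix reference would change.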

RSE132 commented 5 months ago

I think you meant to use:

  publicIPPrefix:
    reference:
      group: network.azure.com
      kind: PublicIPPrefix
      name: clusterapi-prov-second-cluster-ip-prefix-2

I tried this, but it looks like the retry logic is missing here as well. I still get a similar error, although the PublicIPPrefix was created successfully.

PublicIPPrefix Status

NAME                                                                        READY   SEVERITY   REASON      MESSAGE
publicipprefix.network.azure.com/clusterapi-prov-second-cluster-ip-prefix   True               Succeeded

Error

publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-12   False   Warning    ReferenceNotFound   failed resolving ARM IDs for references: second-cluster/clusterapi-prov-second-cluster-ip-prefix-2 does not exist (PublicIPPrefix.network.azure.com "clusterapi-prov-second-cluster-ip-prefix-2" not found)
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-22   False   Warning    ReferenceNotFound   failed resolving ARM IDs for references: second-cluster/clusterapi-prov-second-cluster-ip-prefix-2 does not exist (PublicIPPrefix.network.azure.com "clusterapi-prov-second-cluster-ip-prefix-2" not found)
publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-32   False   Warning    ReferenceNotFound   failed resolving ARM IDs for references: second-cluster/clusterapi-prov-second-cluster-ip-prefix-2 does not exist (PublicIPPrefix.network.azure.com "clusterapi-prov-second-cluster-ip-prefix-2" not found)

Again, after waiting for some time, I see the same errors that I provided initially:

publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-12   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2 used by resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-12 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.Csm.PutPublicIpPrefixOperation.: PUT https://management.azure.com/subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-12...

publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-22   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2 used by resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-22 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.Csm.PutPublicIpPrefixOperation.: PUT https://management.azure.com/subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-22...

publicipaddress.network.azure.com/clusterapi-prov-second-cluster-egress-ip-32   False   Error      ReferencedResourceNotProvisioned   Cannot proceed with operation because resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPPrefixes/clusterapi-prov-second-cluster-ip-prefix-2 used by resource /subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-32 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is Microsoft.WindowsAzure.Networking.Nrp.Frontend.Operations.Csm.PutPublicIpPrefixOperation.: PUT https://management.azure.com/subscriptions/f3968128-6156-4ded-952a-088a59b97c1c/resourceGroups/rgpazewdaks01sandbox/providers/Microsoft.Network/publicIPAddresses/clusterapi-prov-second-cluster-egress-ip-32...

theunrepentantgeek commented 4 months ago

The warnings you are seeing appear to be normal to me - but the errors do seem to indicate a problem.

I'll try to reproduce the problem and see if I can work out what's going on.

RSE132 commented 3 days ago

@theunrepentantgeek is this bug already fixed in v2.9.0?