crossplane-contrib / provider-upjet-azure

Azure Provider for Crossplane.
https://marketplace.upbound.io/providers/upbound/provider-family-azure/
Apache License 2.0

Add feature azurerm_kubernetes_cluster_trusted_access_role_binding #871

Closed by drew0ps 1 week ago

drew0ps commented 2 weeks ago

Description of your changes

Adds azurerm_kubernetes_cluster_trusted_access_role_binding to the authorization provider.
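
For illustration, a minimal manifest sketch of the new managed resource. The field names and reference selectors are assumptions based on the usual upjet conventions and on the azurerm_kubernetes_cluster_trusted_access_role_binding schema; the example actually shipped with this PR lives at examples/authorization/v1beta1/trustedaccessrolebinding.yaml.

apiVersion: authorization.azure.upbound.io/v1beta1
kind: TrustedAccessRoleBinding
metadata:
  name: example-tarb
spec:
  forProvider:
    # AKS cluster the binding is created on; resolved here by selector,
    # assuming the usual upjet-generated reference fields exist.
    kubernetesClusterIdSelector:
      matchLabels:
        testing.upbound.io/example-name: example-aks
    # Resource being granted trusted access, e.g. the Machine Learning
    # Workspace used in the uptest example.
    sourceResourceIdSelector:
      matchLabels:
        testing.upbound.io/example-name: example-tarb-wspace
    # Role name is illustrative only.
    roles:
      - Microsoft.MachineLearningServices/workspaces/mlworkload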

Fixes #

I have:

Notes

2 notable things about this PR:

How has this code been tested

mergenci commented 2 weeks ago

As far as I understand, the TrustedAccessRoleBinding and Workspace resources failed. Uptest output easily gets crowded, especially when there are multiple dependent resources and the tests are failing. I'm planning to work on a solution. Here are the relevant parts that I could find:

- apiVersion: authorization.azure.upbound.io/v1beta1
  kind: TrustedAccessRoleBinding
  status:
    conditions:
    - lastTransitionTime: "2024-11-10T22:36:23Z"
      message: 'cannot resolve references: mg.Spec.ForProvider.SourceResourceID: referenced
        field was empty (referenced resource may not yet be ready)'
      reason: ReconcileError
      status: "False"
      type: Synced
- apiVersion: machinelearningservices.azure.upbound.io/v1beta2
  kind: Workspace
  status:
    conditions:
    - lastTransitionTime: "2024-11-10T22:35:42Z"
      message: |-
        create failed: async create failed: failed to create the resource: ["***"0 creating Workspace (Subscription: "2895a7df-ae9f-41b8-9e78-3ce4926df838"
        Resource Group Name: "example-tarb-rg"
        Workspace Name: "example-tarb-wspace"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:

        Status: "BadRequest"
        Code: ""
        Message: "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        Activity Id: ""

        ---

        API Response:

        ----[start]----
        "***"
          "status": "Failed",
          "error": "***"
            "code": "BadRequest",
            "message": "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
          "***"
        "***"
        -----[end]-----
          []"***"]
      reason: ReconcileError
      status: "False"
      type: Synced
    - lastTransitionTime: "2024-11-10T22:35:27Z"
      reason: Creating
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-11-10T22:35:42Z"
      message: |-
        async create failed: failed to create the resource: ["***"0 creating Workspace (Subscription: "2895a7df-ae9f-41b8-9e78-3ce4926df838"
        Resource Group Name: "example-tarb-rg"
        Workspace Name: "example-tarb-wspace"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:

        Status: "BadRequest"
        Code: ""
        Message: "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        Activity Id: ""

        ---

        API Response:

        ----[start]----
        "***"
          "status": "Failed",
          "error": "***"
            "code": "BadRequest",
            "message": "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
          "***"
        "***"
        -----[end]-----
          []"***"]
      reason: AsyncCreateFailure
      status: "False"
      type: LastAsyncOperation

drew0ps commented 2 weeks ago

@mergenci Thanks for the effort to help. That issue is clear: it's just because of the soft delete retention setting on the vault, which apparently can't be disabled and must be set between 7 and 90 days as per this doc. The only way around it is changing the name until the deletion completes after 7 days. The run I'm curious about is this one, where everything seems to have succeeded but the pipeline still doesn't proceed; it complains about the vault without an error message.
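
For reference, a sketch of the kind of setting described above, assuming it maps to the Vault field softDeleteRetentionDays (mirroring azurerm's soft_delete_retention_days, which only accepts 7-90). The API group is taken from the uptest output; the version and remaining fields are illustrative.

apiVersion: keyvault.azure.upbound.io/v1beta1
kind: Vault
metadata:
  name: example-ai-v
spec:
  forProvider:
    # Soft delete cannot be disabled; Azure only accepts 7-90 days, so a
    # deleted vault keeps its name reserved for at least a week.
    softDeleteRetentionDays: 7
    # remaining fields (location, tenantId, skuName, ...) omitted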

mergenci commented 2 weeks ago

That one seems to be missing the Test status condition in kind: Vault, which usually means that there was an update loop. If so, you can examine pod logs to see if there are any “Diff detected” messages. As far as I know, the only indication of an update loop in Uptest logs is the timeout message:

case.go:363: command "$***KUBECTL*** wait vault.keyvault.azure.upbound.io/example-ai-v --for=condition=Test --timeout 10s" exceeded 7 sec timeout, context deadline exceeded

drew0ps commented 2 weeks ago

Hi @turkenf,

Thanks a lot for the response and your suggestions - I've added both of them to the example file.

The "resource exists with the same name" issue is present for all resources in the example file until the deletion topic is solved. I'm not sure if it would be possible to force delete the resource group in azure? In my opinion that would be a good approach since we could end up in this situation in different cases as well. I am not sure if the resources created by my pipeline are running indefinitely in Azure.

Unfortunately I can't test these specific examples manually, as at my workplace I could only test creating the AKS Trusted Access binding for the BackupInstanceKubernetesCluster - my user is only able to create some specific resources that I have to manage.

turkenf commented 2 weeks ago

The "resource exists with the same name" issue is present for all resources in the example file until the deletion topic is solved.

Hi @drew0ps, in fact all resources except the resource group in the example YAML file are deleted. Just randomizing the name of the workspace resource, as you mentioned here, is enough. As I mentioned above, the main problem is the creation of two extra resources, Application Insights Smart Detection and Failure Anomalies. These are not in our YAML file, and I think they are created by the API because something went wrong.
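
If useful, a sketch of what that randomization could look like, assuming uptest's ${Rand.RFC1123Subdomain} placeholder is available for the example manifests in this repository:

apiVersion: machinelearningservices.azure.upbound.io/v1beta2
kind: Workspace
metadata:
  # The placeholder is replaced with a random suffix on each run, so the new
  # workspace never collides with a soft-deleted one from an earlier run.
  name: example-tarb-wspace-${Rand.RFC1123Subdomain}
spec:
  forProvider:
    # remaining fields unchanged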

I'm not sure if it would be possible to force-delete the resource group in Azure?

I don't know if this is possible, but even if it were, I wouldn't choose it.

I am not sure if the resources created by my pipeline are running indefinitely in Azure.

I spent some time today cleaning these out of our test account 🙂

Unfortunately I can't test these specific examples manually, as at my workplace I could only test creating the AKS Trusted Access binding for the BackupInstanceKubernetesCluster - my user is only able to create some specific resources that I have to manage.

I understand and I really appreciate your interest, but I prefer not to proceed without understanding why the extra resources are being created and seeing if we can resolve this situation.

drew0ps commented 2 weeks ago

Hi @turkenf,

Thanks a lot for the additional insights - randomizing the workspace name makes sense with your explanation.

I spent some time today cleaning these out of our test account 🙂

Sorry about this one - I won't run the pipeline again until I figure this out manually. I thought that after 6 hours, when the pipeline ends (or times out?), the resources were somehow force-deleted.

... I prefer not to proceed without understanding why the extra resources are being created and seeing if we can resolve this situation.

Understood - I'll spend a bit more time testing manually to figure out why this happens and how to prevent it. The screenshot you added already helps a lot with this.

mjnovice commented 1 week ago

👍

drew0ps commented 1 week ago

/test-examples="examples/alertsmanagement/v1beta1/monitorsmartdetectoralertrule.yaml"

drew0ps commented 1 week ago

/test-examples="examples/authorization/v1beta1/trustedaccessrolebinding.yaml"

drew0ps commented 1 week ago

Hi @mergenci and @turkenf - thanks a lot for the hints. The problem was some missing Application Insights resources, which Azure creates by default when they are not defined in Crossplane; and of course, when they are not defined, Crossplane can't request their deletion. It's fixed now, and the uptest pipeline is green.
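
For context, a sketch of the kind of resource that now covers the auto-created Failure Anomalies rule so Crossplane owns its lifecycle. Field names follow the azurerm_monitor_smart_detector_alert_rule schema and the reference selectors are assumptions; the real example added in this PR is examples/alertsmanagement/v1beta1/monitorsmartdetectoralertrule.yaml.

apiVersion: alertsmanagement.azure.upbound.io/v1beta1
kind: MonitorSmartDetectorAlertRule
metadata:
  name: failure-anomalies-example
spec:
  forProvider:
    # "Failure Anomalies" is the rule Azure otherwise creates on its own
    # next to a new Application Insights component.
    detectorType: FailureAnomaliesDetector
    severity: Sev3
    frequency: PT1M
    resourceGroupNameSelector:
      matchLabels:
        testing.upbound.io/example-name: example-tarb-rg
    # Scope the rule to the Application Insights component from the example
    # (reference field name assumed from upjet conventions).
    scopeResourceIdsRefs:
      - name: example-appinsights
    # actionGroup (required by the Azure API) omitted here for brevity.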

jeanduplessis commented 1 week ago

Thank you for your contribution and for persevering on this one, @drew0ps.