crossplane-contrib / provider-upjet-azure

Azure Provider for Crossplane.
https://marketplace.upbound.io/providers/upbound/provider-family-azure/
Apache License 2.0

Add feature azurerm_kubernetes_cluster_trusted_access_role_binding #871

Closed by drew0ps 1 week ago

drew0ps commented 2 weeks ago

Description of your changes

Adds azurerm_kubernetes_cluster_trusted_access_role_binding to the authorization provider.
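
For illustration, a minimal manifest sketch of the new managed resource. The field names and reference selectors are assumptions based on the usual upjet conventions and on the azurerm_kubernetes_cluster_trusted_access_role_binding schema; the example actually shipped with this PR lives at examples/authorization/v1beta1/trustedaccessrolebinding.yaml.

apiVersion: authorization.azure.upbound.io/v1beta1
kind: TrustedAccessRoleBinding
metadata:
  name: example-tarb
spec:
  forProvider:
    # AKS cluster the binding is created on; resolved here by selector,
    # assuming the usual upjet-generated reference fields exist.
    kubernetesClusterIdSelector:
      matchLabels:
        testing.upbound.io/example-name: example-aks
    # Resource being granted trusted access, e.g. the Machine Learning
    # Workspace used in the uptest example.
    sourceResourceIdSelector:
      matchLabels:
        testing.upbound.io/example-name: example-tarb-wspace
    # Role name is illustrative only.
    roles:
      - Microsoft.MachineLearningServices/workspaces/mlworkload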

Fixes #

I have:

Notes

2 notable things about this PR:

How has this code been tested

mergenci commented 2 weeks ago

As far as I understand, the TrustedAccessRoleBinding and Workspace resources failed. Uptest output easily gets crowded, especially when there are multiple dependent resources and the tests are failing. I'm planning to work on a solution. Here are the relevant parts that I could find:

- apiVersion: authorization.azure.upbound.io/v1beta1
  kind: TrustedAccessRoleBinding
  status:
    conditions:
    - lastTransitionTime: "2024-11-10T22:36:23Z"
      message: 'cannot resolve references: mg.Spec.ForProvider.SourceResourceID: referenced
        field was empty (referenced resource may not yet be ready)'
      reason: ReconcileError
      status: "False"
      type: Synced
- apiVersion: machinelearningservices.azure.upbound.io/v1beta2
  kind: Workspace
  status:
    conditions:
    - lastTransitionTime: "2024-11-10T22:35:42Z"
      message: |-
        create failed: async create failed: failed to create the resource: ["***"0 creating Workspace (Subscription: "2895a7df-ae9f-41b8-9e78-3ce4926df838"
        Resource Group Name: "example-tarb-rg"
        Workspace Name: "example-tarb-wspace"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:

        Status: "BadRequest"
        Code: ""
        Message: "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        Activity Id: ""

        ---

        API Response:

        ----[start]----
        "***"
          "status": "Failed",
          "error": "***"
            "code": "BadRequest",
            "message": "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
          "***"
        "***"
        -----[end]-----
          []"***"]
      reason: ReconcileError
      status: "False"
      type: Synced
    - lastTransitionTime: "2024-11-10T22:35:27Z"
      reason: Creating
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-11-10T22:35:42Z"
      message: |-
        async create failed: failed to create the resource: ["***"0 creating Workspace (Subscription: "2895a7df-ae9f-41b8-9e78-3ce4926df838"
        Resource Group Name: "example-tarb-rg"
        Workspace Name: "example-tarb-wspace"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:

        Status: "BadRequest"
        Code: ""
        Message: "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        Activity Id: ""

        ---

        API Response:

        ----[start]----
        "***"
          "status": "Failed",
          "error": "***"
            "code": "BadRequest",
            "message": "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
          "***"
        "***"
        -----[end]-----
          []"***"]
      reason: AsyncCreateFailure
      status: "False"
      type: LastAsyncOperation

drew0ps commented 2 weeks ago

@mergenci Thanks for the effort to help. That issue is clear: it's just because of the soft delete retention setting on the vault, which apparently can't be disabled and must be set between 7 and 90 days as per this doc. The only way around it is changing the name until the deletion completes after 7 days. The run I'm curious about is this one, where everything seems to have succeeded but the pipeline still doesn't proceed; it complains about the vault without an error message.
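
For reference, a sketch of the kind of setting described above, assuming it maps to the Vault field softDeleteRetentionDays (mirroring azurerm's soft_delete_retention_days, which only accepts 7-90). The API group is taken from the uptest output; the version and remaining fields are illustrative.

apiVersion: keyvault.azure.upbound.io/v1beta1
kind: Vault
metadata:
  name: example-ai-v
spec:
  forProvider:
    # Soft delete cannot be disabled; Azure only accepts 7-90 days, so a
    # deleted vault keeps its name reserved for at least a week.
    softDeleteRetentionDays: 7
    # remaining fields (location, tenantId, skuName, ...) omitted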

mergenci commented 2 weeks ago

That one seems to be missing the Test status condition in kind: Vault, which usually means that there was an update loop. If so, you can examine pod logs to see if there are any “Diff detected” messages. As far as I know, the only indication of an update loop in Uptest logs is the timeout message:

case.go:363: command "$***KUBECTL*** wait vault.keyvault.azure.upbound.io/example-ai-v --for=condition=Test --timeout 10s" exceeded 7 sec timeout, context deadline exceeded

drew0ps commented 2 weeks ago

Hi @turkenf,

Thanks a lot for the response and your suggestions - I've added both of them to the example file.

The "resource exists with the same name" issue is present for all resources in the example file until the deletion topic is solved. I'm not sure if it would be possible to force delete the resource group in azure? In my opinion that would be a good approach since we could end up in this situation in different cases as well. I am not sure if the resources created by my pipeline are running indefinitely in Azure.

Unfortunately I can't test these specific examples manually, as at my workplace I could only test creating the AKS Trusted Access binding for the BackupInstanceKubernetesCluster - my user is only able to create some specific resources that I have to manage.

turkenf commented 2 weeks ago

The "resource exists with the same name" issue is present for all resources in the example file until the deletion topic is solved.

Hi @drew0ps, in fact all resources except the resource group in the example YAML file are deleted. Just randomizing the name of the workspace resource, as you mentioned here, is enough. As I mentioned above, the main problem is the creation of two extra resources, Application Insights Smart Detection and Failure Anomalies. These are not in our YAML file, and I think they are created by the API because something went wrong.
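
If useful, a sketch of what that randomization could look like, assuming uptest's ${Rand.RFC1123Subdomain} placeholder is available for the example manifests in this repository:

apiVersion: machinelearningservices.azure.upbound.io/v1beta2
kind: Workspace
metadata:
  # The placeholder is replaced with a random suffix on each run, so the new
  # workspace never collides with a soft-deleted one from an earlier run.
  name: example-tarb-wspace-${Rand.RFC1123Subdomain}
spec:
  forProvider:
    # remaining fields unchanged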

I'm not sure if it would be possible to force-delete the resource group in Azure?

I don't know if this is possible, but even if it were, I wouldn't choose it.

I am not sure if the resources created by my pipeline are running indefinitely in Azure.

I spent some time today cleaning these out of our test account 🙂

Unfortunately I can't test these specific examples manually, as at my workplace I could only test creating the AKS Trusted Access binding for the BackupInstanceKubernetesCluster - my user is only able to create some specific resources that I have to manage.

I understand and I really appreciate your interest, but I prefer not to proceed without understanding why the extra resources are being created and seeing if we can resolve this situation.

drew0ps commented 2 weeks ago

Hi @turkenf,

Thanks a lot for the additional insights - randomizing the workspace name makes sense with your explanation.

I spent some time today cleaning these out of our test account 🙂

Sorry about this one - I won't run the pipeline again until I figure this out manually. I thought that after 6 hours, when the pipeline ends (or times out?), the resources were somehow force-deleted.

... I prefer not to proceed without understanding why the extra resources are being created and seeing if we can resolve this situation.

Understood - I'll spend a bit more time testing manually to figure out why this happens and how to prevent it. The screenshot you added already helps a lot with this.

mjnovice commented 1 week ago

👍

drew0ps commented 1 week ago

/test-examples="examples/alertsmanagement/v1beta1/monitorsmartdetectoralertrule.yaml"

drew0ps commented 1 week ago

/test-examples="examples/authorization/v1beta1/trustedaccessrolebinding.yaml"

drew0ps commented 1 week ago

Hi @mergenci and @turkenf - thanks a lot for the hints. The problem was some missing Application Insights resources, which Azure creates by default when they are not defined in Crossplane; and of course, when they are not defined, Crossplane can't request their deletion. It's fixed now, and the uptest pipeline is green.
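
For context, a sketch of the kind of resource that now covers the auto-created Failure Anomalies rule so Crossplane owns its lifecycle. Field names follow the azurerm_monitor_smart_detector_alert_rule schema and the reference selectors are assumptions; the real example added in this PR is examples/alertsmanagement/v1beta1/monitorsmartdetectoralertrule.yaml.

apiVersion: alertsmanagement.azure.upbound.io/v1beta1
kind: MonitorSmartDetectorAlertRule
metadata:
  name: failure-anomalies-example
spec:
  forProvider:
    # "Failure Anomalies" is the rule Azure otherwise creates on its own
    # next to a new Application Insights component.
    detectorType: FailureAnomaliesDetector
    severity: Sev3
    frequency: PT1M
    resourceGroupNameSelector:
      matchLabels:
        testing.upbound.io/example-name: example-tarb-rg
    # Scope the rule to the Application Insights component from the example
    # (reference field name assumed from upjet conventions).
    scopeResourceIdsRefs:
      - name: example-appinsights
    # actionGroup (required by the Azure API) omitted here for brevity.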

jeanduplessis commented 1 week ago

Thank you for your contribution and for persevering on this one, @drew0ps.