Closed drew0ps closed 1 week ago
As far as I understand, the TrustedAccessRoleBinding and Workspace resources failed. Uptest output easily gets crowded, especially when there are multiple dependent resources and the tests are failing. I'm planning to work on a solution. Here are the relevant parts that I could find:
- apiVersion: authorization.azure.upbound.io/v1beta1
  kind: TrustedAccessRoleBinding
  status:
    conditions:
    - lastTransitionTime: "2024-11-10T22:36:23Z"
      message: 'cannot resolve references: mg.Spec.ForProvider.SourceResourceID: referenced
        field was empty (referenced resource may not yet be ready)'
      reason: ReconcileError
      status: "False"
      type: Synced
- apiVersion: machinelearningservices.azure.upbound.io/v1beta2
  kind: Workspace
  status:
    conditions:
    - lastTransitionTime: "2024-11-10T22:35:42Z"
      message: |-
        create failed: async create failed: failed to create the resource: ["***"0 creating Workspace (Subscription: "2895a7df-ae9f-41b8-9e78-3ce4926df838"
        Resource Group Name: "example-tarb-rg"
        Workspace Name: "example-tarb-wspace"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:
        Status: "BadRequest"
        Code: ""
        Message: "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        Activity Id: ""
        ---
        API Response:
        ----[start]----
        "***"
        "status": "Failed",
        "error": "***"
        "code": "BadRequest",
        "message": "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        "***"
        "***"
        -----[end]-----
        []"***"]
      reason: ReconcileError
      status: "False"
      type: Synced
    - lastTransitionTime: "2024-11-10T22:35:27Z"
      reason: Creating
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-11-10T22:35:42Z"
      message: |-
        async create failed: failed to create the resource: ["***"0 creating Workspace (Subscription: "2895a7df-ae9f-41b8-9e78-3ce4926df838"
        Resource Group Name: "example-tarb-rg"
        Workspace Name: "example-tarb-wspace"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:
        Status: "BadRequest"
        Code: ""
        Message: "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        Activity Id: ""
        ---
        API Response:
        ----[start]----
        "***"
        "status": "Failed",
        "error": "***"
        "code": "BadRequest",
        "message": "Soft-deleted workspace exists. Please purge or recover it. https://aka.ms/wsoftdelete"
        "***"
        "***"
        -----[end]-----
        []"***"]
      reason: AsyncCreateFailure
      status: "False"
      type: LastAsyncOperation
@mergenci Thanks for your efforts to help. That issue is clear: it happens because there is a soft-delete retention setting on the vault, which apparently can't be disabled and must be set between 7 and 90 days, as per this doc. The only way around this is to change the name until the deletion completes after 7 days. The run I am curious about is this one, where everything seems to have succeeded but the pipeline still doesn't proceed, as it complains about the vault without an error message.
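The rename workaround described above can be sketched in shell. This is only a minimal illustration, not what the example file or Uptest actually does: the helper names `randomized_name` and `rename_in_manifest` are hypothetical, the suffix scheme is an assumption, and only the base name `example-tarb-wspace` is taken from the error output.

```shell
#!/usr/bin/env bash
# Sketch (assumption): give the Workspace a fresh name on each run so that a
# soft-deleted workspace left behind by an earlier run cannot collide with it.

# Print BASE followed by a dash and 6 random lowercase hex characters.
randomized_name() {
  local base="$1"
  local suffix
  # Read 3 random bytes and render them as hex without spaces or newlines.
  suffix="$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')"
  printf '%s-%s\n' "$base" "$suffix"
}

# Rewrite every occurrence of OLD to NEW in the manifest read from stdin.
rename_in_manifest() {
  local old="$1" new="$2"
  sed "s/${old}/${new}/g"
}
```

For instance, `rename_in_manifest example-tarb-wspace "$(randomized_name example-tarb-wspace)" < examples/authorization/v1beta1/trustedaccessrolebinding.yaml` would print the manifest with the new name. Because the replacement is a plain substring match, references to the workspace name elsewhere in the file are rewritten as well.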
That one seems to be missing the Test status condition in kind: Vault, which usually means that there was an update loop. If so, you can examine pod logs to see if there are any “Diff detected” messages. As far as I know, the only indication of an update loop in Uptest logs is the timeout message:
case.go:363: command "$***KUBECTL*** wait vault.keyvault.azure.upbound.io/example-ai-v --for=condition=Test --timeout 10s" exceeded 7 sec timeout, context deadline exceeded
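The log check suggested above could look roughly like the following. It is a hedged sketch: the function names are hypothetical, and it assumes the provider is logging at a verbosity where “Diff detected” messages actually appear.

```shell
#!/usr/bin/env bash
# Sketch (assumption): look for "Diff detected" messages in provider pod logs
# supplied on stdin. A count that keeps growing between checks hints at an
# update loop.

# Print the number of log lines that contain "Diff detected".
count_diff_detected() {
  # grep -c exits non-zero when the count is 0, so keep the status clean.
  grep -c 'Diff detected' || true
}

# Print only the "Diff detected" lines that mention the given resource name.
diff_lines_for() {
  local resource="$1"
  grep 'Diff detected' | grep -F -- "$resource" || true
}

# Example usage; the namespace and pod name are placeholders, not values
# taken from this thread:
#   kubectl logs -n <provider-namespace> <provider-pod> | count_diff_detected
#   kubectl logs -n <provider-namespace> <provider-pod> | diff_lines_for example-ai-v
```

Running the count twice a minute or so apart, and comparing the two numbers, is one simple way to tell a one-off diff from a loop.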
Hi @turkenf,
Thanks a lot for the response and your suggestions - I've added both of them to the example file.
The "resource exists with the same name" issue is present for all resources in the example file until the deletion topic is solved. I'm not sure if it would be possible to force delete the resource group in Azure? In my opinion that would be a good approach since we could end up in this situation in different cases as well. I am not sure if the resources created by my pipeline are running indefinitely in Azure.
Unfortunately I can't test these specific examples manually as I could only test the AKS Trusted Access relation creation for the BackupInstanceKubernetesCluster at my workplace - where my user is only able to create some specific resources which I have to manage.
> The "resource exists with the same name" issue is present for all resources in the example file until the deletion topic is solved.
Hi @drew0ps, in fact, all resources except the resource group in the example YAML file are deleted. Just randomizing the name of the workspace resource as you mentioned here is enough.
As I mentioned above, the main problem here is the creation of two extra resources, Application Insights Smart Detection and Failure Anomalies. These are not in our YAML file and I think they are created by the API because something went wrong.
> I'm not sure if it would be possible to force delete the resource group in Azure?
I don't know if this is possible, but even if it was, I wouldn't choose it.
> I am not sure if the resources created by my pipeline are running indefinitely in Azure.
I spent some time today cleaning these out of our test account 🙂
> Unfortunately I can't test these specific examples manually as I could only test the AKS Trusted Access relation creation for the BackupInstanceKubernetesCluster at my workplace - where my user is only able to create some specific resources which I have to manage.
I understand and I really appreciate your interest, but I prefer not to proceed without understanding why the extra resources are being created and seeing if we can resolve this situation.
Hi @turkenf,
Thanks a lot for the additional insights - randomizing the workspace name makes sense with your explanation.
> I spent some time today cleaning these out of our test account 🙂
Sorry about this one. I won't run the pipeline until I figure this out manually. I thought that after 6 hours, when the pipeline ends (times out?), the resources would somehow be force deleted.
> ... I prefer not to proceed without understanding why the extra resources are being created and seeing if we can resolve this situation.
Understood - I'll spend a bit more time testing manually to find out why this happens and how to prevent it. The screenshot you added already helps a lot with this.
👍
/test-examples="examples/alertsmanagement/v1beta1/monitorsmartdetectoralertrule.yaml"
/test-examples="examples/authorization/v1beta1/trustedaccessrolebinding.yaml"
Hi @mergenci and @turkenf - Thanks a lot for the hints. The problem was some missing resources for Application Insights, which get created by default if they are not defined in Crossplane. Of course, when they are not defined, Crossplane can't request their deletion. It's fixed now, and the Uptest pipeline is green.
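One way to spot such implicitly created resources is to compare what exists in the resource group with what the example file declares. The sketch below is an assumption-laden illustration: `undeclared_resources` is a hypothetical helper, and `declared.txt` is a hand-written list of the resource names from the example manifest, one per line.

```shell
#!/usr/bin/env bash
# Sketch (assumption): print the resource names read from stdin that are not
# listed in the file of declared names. Anything printed was created outside
# the example manifest, so Crossplane will not delete it.
undeclared_resources() {
  local declared_file="$1"
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
  grep -F -x -v -f "$declared_file" || true
}

# Example usage against the resource group named in the error output above;
# declared.txt is a hypothetical file:
#   az resource list --resource-group example-tarb-rg --query '[].name' --output tsv \
#     | undeclared_resources declared.txt
```

Any name it prints is a candidate for being added to the example file so that the test run can clean it up.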
Thank you for your contribution and persevering on this one @drew0ps
Description of your changes
Adds azurerm_kubernetes_cluster_trusted_access_role_binding to the authorization provider.
Fixes #
I have:
- Run make reviewable to ensure this PR is ready for review.
- Added backport release-x.y labels to auto-backport this PR if necessary.
Notes
2 notable things about this PR:
How has this code been tested