clincha-org / clincha

Configuration and monitoring of clinch-home infrastructure
https://clinch-home.com

dial tcp: lookup login.microsoftonline.com: i/o timeout #100

Closed clincha closed 1 year ago

clincha commented 1 year ago

time="2023-06-22T20:24:48Z" level=error msg="Reconciler error" controller=deletebackuprequest controllerGroup=velero.io controllerKind=DeleteBackupRequest deleteBackupRequest="{\"name\":\"angus-h5bz2\",\"namespace\":\"velero\"}" error="error getting the backup store: rpc error: code = Unknown desc = azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/c6ff6270-64cf-40d6-ae87-e11cca58de61/resourceGroups/velero/providers/Microsoft.Storage/storageAccounts/clinchavelero/listKeys?%24expand=kerb&api-version=2019-06-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Post \"https://login.microsoftonline.com/5bdbf6b9-7155-49e5-a3ce-f265fd5ec77e/oauth2/token?api-version=1.0\": dial tcp: lookup login.microsoftonline.com: i/o timeout'" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_deletion_controller.go:246" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupDeletionReconciler).Reconcile" logSource="/go/pkg/mod/github.com/bombsimon/logrusr/v3@v3.0.0/logrusr.go:123" name=angus-h5bz2 namespace=velero reconcileID="\"79cb0bb5-7f78-416a-831e-5b4619301bef\""
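
The tail of that error, `dial tcp: lookup login.microsoftonline.com: i/o timeout`, says the DNS lookup itself timed out before any connection to Azure was attempted. A minimal sketch for reproducing that from inside the affected pod or node, assuming a Python interpreter is available there; `check_dns` and its `lookup` parameter are hypothetical names for illustration, not part of Velero or the Azure plugin:

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def check_dns(host, timeout=5.0, lookup=socket.getaddrinfo):
    """Resolve host with an upper bound on how long we wait.

    Returns (ok, detail, seconds). `lookup` is injectable so the
    function can be exercised without touching the network.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(lookup, host, 443)
    try:
        results = future.result(timeout=timeout)
        # Each getaddrinfo entry ends with a sockaddr tuple; its first item is the IP.
        addrs = sorted({entry[4][0] for entry in results})
        return True, ", ".join(addrs), time.monotonic() - start
    except FutureTimeout:
        return False, "lookup timed out", time.monotonic() - start
    except OSError as exc:
        # socket.gaierror (for example NXDOMAIN) is a subclass of OSError.
        return False, f"lookup failed: {exc}", time.monotonic() - start
    finally:
        # Do not block on a resolver call that is still hanging.
        pool.shutdown(wait=False)
```

Calling `check_dns("login.microsoftonline.com")` where Velero runs and again on a machine that is known to work should show whether the timeout is specific to the cluster's resolver.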

clincha commented 1 year ago

I upgraded the Azure container to v1.7.0, and that seemed to sort out the i/o timeouts. However, there is now a new error.

Important bits:

error getting the backup store: rpc error: code = Unknown desc = DefaultAzureCredential: failed to acquire a token.
Attempted credentials:
    ClientSecretCredential: unable to resolve an endpoint: server response error: context deadline exceeded

Full error:

time="2023-06-23T17:15:38Z" level=error msg="Error getting backup store for this location" backupLocation=velero/velero controller=backup-sync error="rpc error: code = Unknown desc = DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tClientSecretCredential: unable to resolve an endpoint: server response error:\n context deadline exceeded" logSource="pkg/controller/backup_sync_controller.go:100"
time="2023-06-23T17:15:38Z" level=error msg="Error getting a backup store" backup-storage-location=velero/velero controller=backup-storage-location error="rpc error: code = Unknown desc = DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tClientSecretCredential: unable to resolve an endpoint: server response error:\n context deadline exceeded" logSource="pkg/controller/backup_storage_location_controller.go:148"
time="2023-06-23T17:15:38Z" level=info msg="BackupStorageLocation is invalid, marking as unavailable" backup-storage-location=velero/velero controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:131"
time="2023-06-23T17:15:38Z" level=error msg="Reconciler error" controller=deletebackuprequest controllerGroup=velero.io controllerKind=DeleteBackupRequest deleteBackupRequest="{\"name\":\"angus-h5bz2\",\"namespace\":\"velero\"}" error="error getting the backup store: rpc error: code = Unknown desc = DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tClientSecretCredential: unable to resolve an endpoint: server response error:\n context deadline exceeded" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_deletion_controller.go:246" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*backupDeletionReconciler).Reconcile" logSource="/go/pkg/mod/github.com/bombsimon/logrusr/v3@v3.0.0/logrusr.go:123" name=angus-h5bz2 namespace=velero reconcileID="\"fd023810-25f3-4489-915c-0ae7bf7888e8\""
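
The "important bits" above were picked out of the `error=` field by hand. A rough sketch of doing the same programmatically, assuming the lines are logrus-style `key=value` pairs whose quoted values only use JSON-compatible escapes (`\n`, `\t`, `\"`); `parse_logfmt` is a hypothetical helper, not something Velero provides:

```python
import json
import re

# key=value, where value is either a double-quoted string with
# backslash escapes or a bare token without spaces.
_PAIR = re.compile(r'([\w.\-]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Split one log line into a dict of field name -> unescaped value."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if raw.startswith('"'):
            try:
                # Turns \n, \t and \" back into real characters.
                raw = json.loads(raw)
            except ValueError:
                # Escape not understood by JSON: keep the text, drop the quotes.
                raw = raw[1:-1]
        fields[key] = raw
    return fields
```

Printing `parse_logfmt(line)["error"]` for one of the lines above shows the credential chain on separate lines instead of one long escaped string.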

clincha commented 1 year ago

Tried to install using the Velero CLI instead of Helm and hit the same issue.

clincha commented 1 year ago

Looks like it might be a permissions thing.

clincha commented 1 year ago

Finally fixed this. As always, it was DNS! #102

With that fixed, I've hit a new error, which is at least an actual error message from Azure:

time="2023-07-01T16:14:52Z" level=error msg="Error getting a backup store" backup-storage-location=velero/velero controller=backup-storage-location error="rpc error: code = Unknown desc = DefaultAzureCredential authentication failed\nPOST https://login.microsoftonline.com/5bdbf6b9-7155-49e5-a3ce-f265fd5ec77e/oauth2/v2.0/token\n--------------------------------------------------------------------------------\nRESPONSE 400 Bad Request\n--------------------------------------------------------------------------------\n{\n  \"error\": \"unauthorized_client\",\n  \"error_description\": \"AADSTS700016: Application with identifier '8a84109a-b36b-4d5f-a8c4-f87cf82e5bc6' was not found in the directory 'Default Directory'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant.\\r\\nTrace ID: e08cb5b8-ba01-4bcc-9768-3f98e42a0600\\r\\nCorrelation ID: 2e82461a-a315-433e-b10a-5a590e9eb569\\r\\nTimestamp: 2023-07-01 16:14:52Z\",\n  \"error_codes\": [\n    700016\n  ],\n  \"timestamp\": \"2023-07-01 16:14:52Z\",\n  \"trace_id\": \"e08cb5b8-ba01-4bcc-9768-3f98e42a0600\",\n  \"correlation_id\": \"2e82461a-a315-433e-b10a-5a590e9eb569\",\n  \"error_uri\": \"https://login.microsoftonline.com/error?code=700016\"\n}\n--------------------------------------------------------------------------------\n" logSource="pkg/controller/backup_storage_location_controller.go:148"
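
AADSTS700016 means the token endpoint was reached but the client (application) ID sent with the request does not exist in the tenant named in the URL. A small sketch for pulling both IDs and the error code out of the message so they can be compared against the app registration in Azure; `ids_from_error` is a hypothetical helper, and the patterns are based only on the wording of the message above:

```python
import re

_UUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"


def ids_from_error(text):
    """Return the tenant ID, client ID and AADSTS code found in text (None if absent)."""
    tenant = re.search(r"login\.microsoftonline\.com/(" + _UUID + r")/", text)
    client = re.search(r"Application with identifier '(" + _UUID + r")'", text)
    code = re.search(r"AADSTS(\d+)", text)
    return {
        "tenant_id": tenant.group(1) if tenant else None,
        "client_id": client.group(1) if client else None,
        "aadsts_code": code.group(1) if code else None,
    }
```

If the extracted `client_id` does not match any app registration in the tenant given by `tenant_id`, the credentials handed to Velero most likely point at the wrong application or the wrong tenant.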