Closed markphillips100 closed 6 months ago
@markphillips100, could you please provide example MRs to reproduce the bug?
@turkenf I'll try and come up with something today.
Here's a sample that will create a resource group and a notification hub namespace successfully but fail to create the notificationhub. NOTE: replace workload-identity-provider-config
with the name of a provider with access to Azure before applying to cluster please.
The sample references an arbitrary, non-existent secret for the purpose of the test. If you comment the gcmCredential
property entirely then the hub is created so the error only occurs when specified and the secret does not exist yet.
---
apiVersion: azure.upbound.io/v1beta1
kind: ResourceGroup
metadata:
annotations:
crossplane.io/composition-resource-name: notification-my-sample-xyz-resourcegroup
crossplane.io/external-name: rg-xyz-notification-my-sample
name: notification-my-sample-xyz-resourcegroup
spec:
deletionPolicy: Delete
forProvider:
location: australiaeast
providerConfigRef:
name: workload-identity-provider-config
---
apiVersion: notificationhubs.azure.upbound.io/v1beta1
kind: NotificationHubNamespace
metadata:
annotations:
crossplane.io/composition-resource-name: notification-my-sample-xyz-notificationhubnamespace
crossplane.io/external-name: nhns-xyz-notification-my-sample
name: notification-my-sample-xyz-notificationhubnamespace
spec:
deletionPolicy: Delete
forProvider:
enabled: true
location: australiaeast
namespaceType: NotificationHub
resourceGroupNameRef:
name: notification-my-sample-xyz-resourcegroup
skuName: Free
providerConfigRef:
name: workload-identity-provider-config
---
apiVersion: notificationhubs.azure.upbound.io/v1beta1
kind: NotificationHub
metadata:
annotations:
crossplane.io/composition-resource-name: notification-my-sample-xyz-notificationhub-mysamplehub
crossplane.io/external-name: nh-mysamplehub-xyz-notification-my-sample
name: notification-my-sample-xyz-notificationhub-mysamplehub
spec:
deletionPolicy: Delete
forProvider:
gcmCredential:
- apiKeySecretRef:
key: SomeKey
name: SomeSecret
namespace: xyz
location: australiaeast
namespaceNameRef:
name: notification-my-sample-xyz-notificationhubnamespace
resourceGroupNameRef:
name: notification-my-sample-xyz-resourcegroup
providerConfigRef:
name: workload-identity-provider-config
Thank you for quick response and clear information @markphillips100 🙌
I can also reproduce the bug with local make run
:
goroutine 95293 [running]:
github.com/hashicorp/terraform-provider-azurerm/internal/services/notificationhub.expandNotificationHubsGCMCredentials(...)
github.com/hashicorp/terraform-provider-azurerm@v1.44.1-0.20230519070112-155958d2cb08/internal/services/notificationhub/notification_hub_resource.go:376
github.com/hashicorp/terraform-provider-azurerm/internal/services/notificationhub.resourceNotificationHubCreateUpdate(0x10cbf6000?, {0x10cb5b5e0?, 0x1401dfc6900?})
github.com/hashicorp/terraform-provider-azurerm@v1.44.1-0.20230519070112-155958d2cb08/internal/services/notificationhub/notification_hub_resource.go:181 +0xd80
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).create(0x10e21afb0?, {0x10e21afb0?, 0x140020a6620?}, 0xd?, {0x10cb5b5e0?, 0x1401dfc6900?})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.30.0/helper/schema/resource.go:766 +0x134
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).Apply(0x14002393340, {0x10e21afb0, 0x140020a6620}, 0x0, 0x14018488480, {0x10cb5b5e0, 0x1401dfc6900})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.30.0/helper/schema/resource.go:909 +0x860
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create(0x1401a6f08c0, {0x10e21afb0, 0x140020a6620}, {0x10e2e88e0?, 0x14002b40a00})
github.com/crossplane/upjet@v1.3.0/pkg/controller/external_tfpluginsdk.go:579 +0xb4
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
github.com/crossplane/upjet@v1.3.0/pkg/controller/external_async_tfpluginsdk.go:149 +0x120
created by github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create in goroutine 92396
github.com/crossplane/upjet@v1.3.0/pkg/controller/external_async_tfpluginsdk.go:145 +0x14c
make: *** [run] Error 2
I've investigated the root cause of the panic and my findings are as follows:
nil
[]byte
to string
here in upjet. For non-existent references, we just ignore the not-found error and continue to set the reference target to an empty string.nil
value as a map[string]any
, and hence the panic: interface conversion: interface {} is nil, not map[string]interface {}
panic message. In future reports, it would be great to have the panic message in addition to the stack trace mentioned. We could address this issue by doing a custom diff configuration for the NotificationHub.notificationhubs
resource, which filters an empty api_key
and thus prevents the panic. Or we could address this in the underlying Terraform resource implementation by checking whether the type assertion has succeeded.
But I feel like if a secret reference has not been found we had better not set an empty string
value as the resolved value because:
string
value for a unresolved reference is not the same as leaving it as unresolved. The empty string could in theory be a valid resolved value (probably this case is very rare). Or worse, we are just bypassing Terraform resource's schema checks: Normally, if the referenced secret does not exist, it means that a required Terraform configuration value is not available, which should fail Terraform resource's schema validation. What happens here is that if the referenced secret does not exist, we still supply a value for the configuration argument, which is enough to bypass Terraform's current schema validation for the resource but still fails its assumptions (the assumption that the supplied value is a map[string]any
. api_key
. While this should prevent the panic that we are observing here, we will probably hit this issue elsewhere in other resources.I interpret the root cause as a reference to a missing secret should be interpreted as not supplying the target Terraform configuration argument. This would be safer in regards of the underlying Terraform resource's implementation. Then we can start questioning whether we should just return the not-found error here.
After some further digging, it turns out that we had decided to set an empty string
for unresolved references to make it possible to delete the referencing resource even if the referenced secret is missing. This points to a deeper dependency issue here. Instead of tackling with this dependency issue here and considering there will already be clients adapted to this behavior, I believe it just makes much more sense to address this issue in the context of the NotificationHub.notificationhubs
resource. This is because, as discussed above, a more cautious Terraform resource implementation with less assumptions on the type assertion could have prevented this.
Is there an existing issue for this?
Affected Resource(s)
notificationhubs.azure.upbound.io/v1beta1 - NotificationHub
Resource MRs required to reproduce the bug
No response
Steps to Reproduce
Create a notificationhub with a gcmCredential.apiKeySecretRef specified but where the secret does not exist yet.
What happened?
Provider crashes. Pod restarts with crashLoopBackoff.
Relevant Error Output Snippet
Crossplane Version
1.15.0
Provider Version
1.1.0
Kubernetes Version
Client: 1.29.1, Server: 1.28.0
Kubernetes Distribution
KinD
Additional Info
No response