crossplane-contrib / provider-upjet-azure

Azure Provider for Crossplane.
https://marketplace.upbound.io/providers/upbound/provider-family-azure/
Apache License 2.0
65 stars 76 forks source link

[Bug]: notificationhubs crashes provider when gcmCredential secret not available #730

Closed markphillips100 closed 6 months ago

markphillips100 commented 7 months ago

Is there an existing issue for this?

Affected Resource(s)

notificationhubs.azure.upbound.io/v1beta1 - NotificationHub

Resource MRs required to reproduce the bug

No response

Steps to Reproduce

Create a notificationhub with a gcmCredential.apiKeySecretRef specified but where the secret does not exist yet.

What happened?

Provider crashes. Pod restarts with crashLoopBackoff.

Relevant Error Output Snippet

goroutine 10367 [running]:
github.com/hashicorp/terraform-provider-azurerm/internal/services/notificationhub.expandNotificationHubsGCMCredentials(...)
    github.com/hashicorp/terraform-provider-azurerm@v1.44.1-0.20230519070112-155958d2cb08/internal/services/notificationhub/notification_hub_resource.go:376
github.com/hashicorp/terraform-provider-azurerm/internal/services/notificationhub.resourceNotificationHubCreateUpdate(0xb40dc00?, {0xa261ba0?, 0xc002d43200?})
    github.com/hashicorp/terraform-provider-azurerm@v1.44.1-0.20230519070112-155958d2cb08/internal/services/notificationhub/notification_hub_resource.go:181 +0x1185
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).create(0xc76e6f0?, {0xc76e6f0?, 0xc00048f9d0?}, 0xd?, {0xa261ba0?, 0xc002d43200?})
    github.com/hashicorp/terraform-plugin-sdk/v2@v2.30.0/helper/schema/resource.go:766 +0x163
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).Apply(0xc0023c7ce0, {0xc76e6f0, 0xc00048f9d0}, 0x0, 0xc00187c500, {0xa261ba0, 0xc002d43200})
    github.com/hashicorp/terraform-plugin-sdk/v2@v2.30.0/helper/schema/resource.go:909 +0xa89
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create(0xc0018d8380, {0xc76e6f0, 0xc00048f9d0}, {0xc7fa620?, 0xc003709900})
    github.com/crossplane/upjet@v1.3.0/pkg/controller/external_tfpluginsdk.go:579 +0xe7
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
    github.com/crossplane/upjet@v1.3.0/pkg/controller/external_async_tfpluginsdk.go:149 +0x145
created by github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create in goroutine 4488
    github.com/crossplane/upjet@v1.3.0/pkg/controller/external_async_tfpluginsdk.go:145 +0x168

Crossplane Version

1.15.0

Provider Version

1.1.0

Kubernetes Version

Client: 1.29.1, Server: 1.28.0

Kubernetes Distribution

KinD

Additional Info

No response

turkenf commented 7 months ago

@markphillips100, could you please provide example MRs to reproduce the bug?

markphillips100 commented 6 months ago

@turkenf I'll try and come up with something today.

markphillips100 commented 6 months ago

Here's a sample that will create a resource group and a notification hub namespace successfully but fail to create the notificationhub. NOTE: replace workload-identity-provider-config with the name of a provider with access to Azure before applying to cluster please.

The sample references an arbitrary, non-existent secret for the purpose of the test. If you comment the gcmCredential property entirely then the hub is created so the error only occurs when specified and the secret does not exist yet.

---
apiVersion: azure.upbound.io/v1beta1
kind: ResourceGroup
metadata:
  annotations:
    crossplane.io/composition-resource-name: notification-my-sample-xyz-resourcegroup
    crossplane.io/external-name: rg-xyz-notification-my-sample
  name: notification-my-sample-xyz-resourcegroup
spec:
  deletionPolicy: Delete
  forProvider:
    location: australiaeast
  providerConfigRef:
    name: workload-identity-provider-config
---
apiVersion: notificationhubs.azure.upbound.io/v1beta1
kind: NotificationHubNamespace
metadata:
  annotations:
    crossplane.io/composition-resource-name: notification-my-sample-xyz-notificationhubnamespace
    crossplane.io/external-name: nhns-xyz-notification-my-sample
  name: notification-my-sample-xyz-notificationhubnamespace
spec:
  deletionPolicy: Delete
  forProvider:
    enabled: true
    location: australiaeast
    namespaceType: NotificationHub
    resourceGroupNameRef:
      name: notification-my-sample-xyz-resourcegroup
    skuName: Free
  providerConfigRef:
    name: workload-identity-provider-config
---
apiVersion: notificationhubs.azure.upbound.io/v1beta1
kind: NotificationHub
metadata:
  annotations:
    crossplane.io/composition-resource-name: notification-my-sample-xyz-notificationhub-mysamplehub
    crossplane.io/external-name: nh-mysamplehub-xyz-notification-my-sample
  name: notification-my-sample-xyz-notificationhub-mysamplehub
spec:
  deletionPolicy: Delete
  forProvider:
    gcmCredential:
    - apiKeySecretRef:
        key: SomeKey
        name: SomeSecret
        namespace: xyz
    location: australiaeast
    namespaceNameRef:
      name: notification-my-sample-xyz-notificationhubnamespace
    resourceGroupNameRef:
      name: notification-my-sample-xyz-resourcegroup
  providerConfigRef:
    name: workload-identity-provider-config
turkenf commented 6 months ago

Thank you for quick response and clear information @markphillips100 🙌

I can also reproduce the bug with local make run:

goroutine 95293 [running]:
github.com/hashicorp/terraform-provider-azurerm/internal/services/notificationhub.expandNotificationHubsGCMCredentials(...)
    github.com/hashicorp/terraform-provider-azurerm@v1.44.1-0.20230519070112-155958d2cb08/internal/services/notificationhub/notification_hub_resource.go:376
github.com/hashicorp/terraform-provider-azurerm/internal/services/notificationhub.resourceNotificationHubCreateUpdate(0x10cbf6000?, {0x10cb5b5e0?, 0x1401dfc6900?})
    github.com/hashicorp/terraform-provider-azurerm@v1.44.1-0.20230519070112-155958d2cb08/internal/services/notificationhub/notification_hub_resource.go:181 +0xd80
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).create(0x10e21afb0?, {0x10e21afb0?, 0x140020a6620?}, 0xd?, {0x10cb5b5e0?, 0x1401dfc6900?})
    github.com/hashicorp/terraform-plugin-sdk/v2@v2.30.0/helper/schema/resource.go:766 +0x134
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).Apply(0x14002393340, {0x10e21afb0, 0x140020a6620}, 0x0, 0x14018488480, {0x10cb5b5e0, 0x1401dfc6900})
    github.com/hashicorp/terraform-plugin-sdk/v2@v2.30.0/helper/schema/resource.go:909 +0x860
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create(0x1401a6f08c0, {0x10e21afb0, 0x140020a6620}, {0x10e2e88e0?, 0x14002b40a00})
    github.com/crossplane/upjet@v1.3.0/pkg/controller/external_tfpluginsdk.go:579 +0xb4
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
    github.com/crossplane/upjet@v1.3.0/pkg/controller/external_async_tfpluginsdk.go:149 +0x120
created by github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create in goroutine 92396
    github.com/crossplane/upjet@v1.3.0/pkg/controller/external_async_tfpluginsdk.go:145 +0x14c
make: *** [run] Error 2
ulucinar commented 6 months ago

I've investigated the root cause of the panic and my findings are as follows:

We could address this issue by doing a custom diff configuration for the NotificationHub.notificationhubs resource, which filters an empty api_key and thus prevents the panic. Or we could address this in the underlying Terraform resource implementation by checking whether the type assertion has succeeded.

But I feel like if a secret reference has not been found we had better not set an empty string value as the resolved value because:

I interpret the root cause as a reference to a missing secret should be interpreted as not supplying the target Terraform configuration argument. This would be safer in regards of the underlying Terraform resource's implementation. Then we can start questioning whether we should just return the not-found error here.

After some further digging, it turns out that we had decided to set an empty string for unresolved references to make it possible to delete the referencing resource even if the referenced secret is missing. This points to a deeper dependency issue here. Instead of tackling with this dependency issue here and considering there will already be clients adapted to this behavior, I believe it just makes much more sense to address this issue in the context of the NotificationHub.notificationhubs resource. This is because, as discussed above, a more cautious Terraform resource implementation with less assumptions on the type assertion could have prevented this.