Mongey / terraform-provider-confluentcloud

A Terraform provider for managing resources in confluent.cloud
MIT License

Second apply or plan of confluentcloud_schema_registry crashes the provider #179

Closed lorelei-rupp-imprivata closed 2 years ago

lorelei-rupp-imprivata commented 2 years ago

Starting last Friday, code that had worked for months began to fail and crash this provider.

This is the error we see with both TF 0.15.5 and TF 1.0.11 and the latest release (0.0.14) of this provider:

╷
│ Error: Plugin did not respond
│ 
│   with confluentcloud_schema_registry.the_registry,
│   on main.tf line 78, in resource "confluentcloud_schema_registry" "the_registry":
│   78: resource "confluentcloud_schema_registry" "the_registry" {
│ 
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ReadResource call. The plugin logs may contain more
│ details.
╵
Releasing state lock. This may take a few moments...

Stack trace from the terraform-provider-confluentcloud_v0.0.14 plugin:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xcb4096]

goroutine 88 [running]:
github.com/Mongey/terraform-provider-confluentcloud/ccloud.schemaRegistryRead(0xfd1068, 0xc000454c60, 0xc00046e500, 0xe80560, 0xc00032b630, 0xc00044ff90, 0xc0000c7878, 0x40e0f8)
    /home/runner/work/terraform-provider-confluentcloud/terraform-provider-confluentcloud/ccloud/resource_schema_registry.go:90 +0x2b6
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).read(0xc0001b4a80, 0xfd0ff8, 0xc000449740, 0xc00046e500, 0xe80560, 0xc00032b630, 0x0, 0x0, 0x0)
    /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.10.0/helper/schema/resource.go:358 +0x17f
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc0001b4a80, 0xfd0ff8, 0xc000449740, 0xc00045a4e0, 0xe80560, 0xc00032b630, 0xc000137520, 0x0, 0x0, 0x0)
    /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.10.0/helper/schema/resource.go:635 +0x1cb
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ReadResource(0xc00012b488, 0xfd0ff8, 0xc000449740, 0xc000449780, 0xea7fcd, 0x12, 0x0)
    /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.10.0/helper/schema/grpc_provider.go:576 +0x47d
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ReadResource(0xc000192500, 0xfd10a0, 0xc000449740, 0xc0004545a0, 0x0, 0x0, 0x0)
    /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-go@v0.5.0/tfprotov5/tf5server/server.go:553 +0x322
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ReadResource_Handler(0xe5a8e0, 0xc000192500, 0xfd10a0, 0xc00044c8a0, 0xc000454540, 0x0, 0xfd10a0, 0xc00044c8a0, 0xc000386180, 0xb5)
    /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-go@v0.5.0/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:344 +0x214
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0002c8a80, 0xfda9b8, 0xc0005e0900, 0xc0001a7c00, 0xc000235a40, 0x14ed830, 0x0, 0x0, 0x0)
    /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1210 +0x52b
google.golang.org/grpc.(*Server).handleStream(0xc0002c8a80, 0xfda9b8, 0xc0005e0900, 0xc0001a7c00, 0x0)
    /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1533 +0xd0c
google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0002fa130, 0xc0002c8a80, 0xfda9b8, 0xc0005e0900, 0xc0001a7c00)
    /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:871 +0xab
created by google.golang.org/grpc.(*Server).serveStreams.func1
    /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:869 +0x1fd

Error: The terraform-provider-confluentcloud_v0.0.14 plugin crashed!
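
The trace above reports a nil pointer dereference inside schemaRegistryRead at resource_schema_registry.go:90. The source of that function is not shown in this thread, so the following is only a minimal Go sketch of the general guard that avoids this class of crash: check both the error and the returned pointer before reading a field. Every name in it (SchemaRegistry, lookup, readEndpoint, errNotFound) is hypothetical and is not taken from the provider or its client library.

```go
package main

import (
	"errors"
	"fmt"
)

// SchemaRegistry is a hypothetical stand-in for an API client's response type.
type SchemaRegistry struct {
	ID       string
	Endpoint string
}

// lookup is a hypothetical client call. It may return (nil, nil) when the
// request succeeds but no registry matches what the caller expected.
type lookup func(envID string) (*SchemaRegistry, error)

var errNotFound = errors.New("schema registry not found")

// readEndpoint dereferences the result only after checking both the error
// and the pointer, so a missing registry becomes an error instead of a panic.
func readEndpoint(get lookup, envID string) (string, error) {
	reg, err := get(envID)
	if err != nil {
		return "", fmt.Errorf("reading schema registry: %w", err)
	}
	if reg == nil {
		return "", errNotFound
	}
	return reg.Endpoint, nil
}

func main() {
	// Simulate a lookup that succeeds but finds nothing.
	missing := func(string) (*SchemaRegistry, error) { return nil, nil }
	_, err := readEndpoint(missing, "env-example")
	fmt.Println(err)
}
```

In a Terraform read function, the not-found branch would typically surface as a diagnostic or clear the resource ID rather than crash the plugin; which of those the provider should do here is a design decision this sketch does not settle.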

Essentially, we run an apply to stand up our Confluent resources, including the registry. Then we run an apply or a plan again, and it crashes.

Our module just declares the following:

resource "confluentcloud_environment" "the_env" {
  name = var.kafka_cluster_name
}

resource "confluentcloud_kafka_cluster" "the_cluster" {
  depends_on = [confluentcloud_environment.the_env]

  name             = var.kafka_cluster_name
  service_provider = "aws"
  region           = var.kafka_cluster_region
  availability     = "LOW"

  environment_id = confluentcloud_environment.the_env.id
  deployment = {
    sku = "STANDARD"
  }
  #You must set these or it will want to destroy/recreate every time
  network_egress  = var.kafka_cluster_network_egress
  network_ingress = var.kafka_cluster_network_ingress
  storage         = var.kafka_cluster_storage
}

resource "confluentcloud_schema_registry" "the_registry" {
  environment_id   = confluentcloud_environment.the_env.id
  service_provider = "aws"
  region           = var.registry_region

  # Requires at least one kafka cluster to enable the schema registry in the environment.
  depends_on = [confluentcloud_kafka_cluster.the_cluster]
}

Any ideas would be greatly appreciated

lorelei-rupp-imprivata commented 2 years ago

Basically, any new env we stand up has this issue; older envs do not. Maybe something has changed on the Confluent side?

I also see that when I use the confluent CLI to describe the schema registry, it does find it, but it's oddly named:

| Name | Stream Governance Package |

lorelei-rupp-imprivata commented 2 years ago

Also, there appears to have been a new Confluent Cloud update on Friday, the day this started to happen: https://docs.confluent.io/cloud/current/release-notes/index.html#july-15-2022. The release notes don't mention anything that sounds remotely related, though.

afoley-st commented 2 years ago

Just noting that I see the same thing

The plugin.(*GRPCProvider).UpgradeResourceState request was cancelled.

Error: Request cancelled

  with confluentcloud_connector.salesforce_sobject_connectors[0],
  on connectors.tf line 37, in resource "confluentcloud_connector" "salesforce_sobject_connectors":
  37: resource "confluentcloud_connector" "salesforce_sobject_connectors" {

The plugin.(*GRPCProvider).UpgradeResourceState request was cancelled.

Error: Request cancelled

  with confluentcloud_schema_registry.registry,
  on schema_registries.tf line 1, in resource "confluentcloud_schema_registry" "registry":
   1: resource "confluentcloud_schema_registry" "registry" {

The plugin.(*GRPCProvider).ReadResource request was cancelled.

Stack trace from the terraform-provider-confluentcloud_v0.0.14 plugin:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xcb4096]

goroutine 20 [running]:
github.com/Mongey/terraform-provider-confluentcloud/ccloud.schemaRegistryRead(0xfd1068, 0xc000592960, 0xc0000c3880, 0xe80560, 0xc00007f0e0, 0xc000401270, 0xc000013878, 0x40e0f8)
        /home/runner/work/terraform-provider-confluentcloud/terraform-provider-confluentcloud/ccloud/resource_schema_registry.go:90 +0x2b6
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).read(0xc00018aa80, 0xfd0ff8, 0xc000202840, 0xc0000c3880, 0xe80560, 0xc00007f0e0, 0x0, 0x0, 0x0)
        /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.10.0/helper/schema/resource.go:358 +0x17f
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc00018aa80, 0xfd0ff8, 0xc000202840, 0xc000218a90, 0xe80560, 0xc00007f0e0, 0xc000508190, 0x0, 0x0, 0x0)
        /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.10.0/helper/schema/resource.go:635 +0x1cb
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ReadResource(0xc00000d470, 0xfd0ff8, 0xc000202840, 0xc000202880, 0xea7fcd, 0x12, 0x0)
        /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.10.0/helper/schema/grpc_provider.go:576 +0x47d
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ReadResource(0xc0000c2280, 0xfd10a0, 0xc000202840, 0xc000592180, 0x0, 0x0, 0x0)
        /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-go@v0.5.0/tfprotov5/tf5server/server.go:553 +0x322
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ReadResource_Handler(0xe5a8e0, 0xc0000c2280, 0xfd10a0, 0xc00043e1b0, 0xc000592120, 0x0, 0xfd10a0, 0xc00043e1b0, 0xc00003c480, 0xb7)
        /home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-go@v0.5.0/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:344 +0x214
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000298a80, 0xfda9b8, 0xc00037c300, 0xc000190000, 0xc0000a66f0, 0x14ed830, 0x0, 0x0, 0x0)
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1210 +0x52b
google.golang.org/grpc.(*Server).handleStream(0xc000298a80, 0xfda9b8, 0xc00037c300, 0xc000190000, 0x0)
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1533 +0xd0c
google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0000ae160, 0xc000298a80, 0xfda9b8, 0xc00037c300, 0xc000190000)
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:871 +0xab
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:869 +0x1fd

Error: The terraform-provider-confluentcloud_v0.0.14 plugin crashed!

This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.

lorelei-rupp-imprivata commented 2 years ago

@Mongey Not sure if you are aware of this, but it looks like the latest release of Confluent Cloud might have broken the provider.

afoley-st commented 2 years ago

The current workaround is to use an external data source with the confluent CLI, but it's not preferred:

data "external" "schema_registry" {
  program = [
    "confluent",
    "schema-registry",
    "--environment",
    "<environmentid>", # placeholder: replace with your environment ID
    "cluster",
    "enable",
    "--cloud",
    "gcp",
    "--geo",
    "us",
    "--output",
    "json"
  ]
}

lorelei-rupp-imprivata commented 2 years ago

Confluent doesn't seem to want to help us with this issue either because this is not their provider. :(

lorelei-rupp-imprivata commented 2 years ago

@afoley-st if you use the confluent CLI to get the registry that is created, do you see that it has a weird name, "Stream Governance Package"? Older environments we set up months ago have the name from here: https://github.com/cgroschupp/go-client-confluent-cloud/blob/master/confluentcloud/schema_registry.go#L10

Not sure why this name is no longer honored. Maybe Confluent changed something and the Go interface needs to be updated?

afoley-st commented 2 years ago

@lorelei-rupp-imprivata see below:

{
  "name": "Stream Governance Package",
  "cluster_id": "lsrc-...",
  "endpoint_url": "...",
  "used_schemas": "1",
  "available_schemas": "999",
  "global_compatibility": "\u003cRequires API Key\u003e",
  "mode": "\u003cRequires API Key\u003e",
  "service_provider": ""
}

But you can just enable the registry multiple times (that's what this provider does, I think) using the above Terraform data source. You can then reference the endpoint with an expression such as data.external.schema_registry.result.endpoint_url.

lorelei-rupp-imprivata commented 2 years ago

@afoley-st so I fail on a plan, while refreshing the schema registry resource. I cannot remove that resource or replace it with an external data source right now; I think it would destroy it unless I did some fancy Terraform. My JSON output looks similar to yours.

afoley-st commented 2 years ago

What worked for me:

I did a terraform state rm of the broken schema registry first. Then I ran a plan/apply for the schema registry. THIS IS WHAT WORKED FOR ME, BUT STATE MANAGEMENT MAY HAVE SIDE EFFECTS so please proceed safely :)

lorelei-rupp-imprivata commented 2 years ago

hahah yeah no i don't want to do that. I get it though. Hoping someone else can shed some light on this. Something def changed to break us :(

lorelei-rupp-imprivata commented 2 years ago

@afoley-st so Confluent refactored the APIs for the schema registry 6 days ago: https://github.com/confluentinc/schema-registry/pull/2332/files. This is likely the culprit. The Go library likely needs to be updated, and this provider then needs to consume the new version.

lorelei-rupp-imprivata commented 2 years ago

Current workaround is to use external data sources with confluent cli, but its not preferred

data "external" "schema_registry" {
  program = [
    "confluent",
    "schema-registry",
    "--environment", 
    <environmentid>, 
    "cluster", 
    "enable", 
    "--cloud",
    "gcp", 
    "--geo",
    "us",
    "--output",
    "json"
  ]
}

How do you keep confluent logged in with this block? I.e., how does it get the env vars necessary for confluent login?

afoley-st commented 2 years ago

In our case, we added a preflight with ansible to do a confluent login --save

afoley-st commented 2 years ago

I assume you could do the same with another external datasource and a depends_on clause.

lorelei-rupp-imprivata commented 2 years ago

Yeah, the problem with that is that the external then wants to run on every plan and shows drift, which in turn causes things that use its output to "force replace". Perhaps I need to figure out how to set these env vars somehow. It's not recommended to use depends_on with the data external block.

lorelei-rupp-imprivata commented 2 years ago

I wish I could pass an environment variable block to the data external, that would solve it
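
One possible way around the missing environment block, sketched here in Go under stated assumptions: the external data source sends its `query` map to the program as a JSON object on stdin and expects a JSON object of string values on stdout, so a small wrapper can turn query entries into environment variables for the real command. The wrapper itself and the helper names childEnv and stringify are hypothetical. Whether the confluent CLI authenticates from CONFLUENT_CLOUD_EMAIL and CONFLUENT_CLOUD_PASSWORD without a prior login is an assumption to verify, not something this thread confirms.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"sort"
)

// childEnv appends every query entry to the inherited environment as
// KEY=VALUE. When a key is duplicated, os/exec uses the last value, so
// query entries override inherited variables.
func childEnv(base []string, query map[string]string) []string {
	keys := make([]string, 0, len(query))
	for k := range query {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order
	env := append([]string{}, base...)
	for _, k := range keys {
		env = append(env, k+"="+query[k])
	}
	return env
}

// stringify converts a decoded JSON object into the flat map of strings
// that the external data source accepts. Non-string values are rendered
// with fmt.Sprint, which is lossy for nested objects.
func stringify(in map[string]interface{}) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if s, ok := v.(string); ok {
			out[k] = s
		} else {
			out[k] = fmt.Sprint(v)
		}
	}
	return out
}

func fail(msg string, err error) {
	fmt.Fprintln(os.Stderr, msg+":", err)
	os.Exit(1)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: wrapper <command> [args...]")
		os.Exit(1)
	}
	// Terraform writes the data source's query map to stdin as JSON.
	var query map[string]string
	if err := json.NewDecoder(os.Stdin).Decode(&query); err != nil {
		fail("decoding query", err)
	}
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = childEnv(os.Environ(), query)
	cmd.Stderr = os.Stderr
	raw, err := cmd.Output()
	if err != nil {
		fail("running command", err)
	}
	var result map[string]interface{}
	if err := json.Unmarshal(raw, &result); err != nil {
		fail("decoding command output", err)
	}
	if err := json.NewEncoder(os.Stdout).Encode(stringify(result)); err != nil {
		fail("encoding result", err)
	}
}
```

With a wrapper like this, the data source's program list would start with the wrapper binary followed by the confluent command and its arguments, and its query map would carry the credential variables. That removes the need for a depends_on on a login resource, though it does not by itself guarantee the data source reads the same values on every plan.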

afoley-st commented 2 years ago

you can with a null_resource:

resource "null_resource" "run_login" {
  triggers = {
    always_run = timestamp()
  }
  provisioner "local-exec" {
    command = "confluent login --save"
    environment = {
        SOME_FOO = var.SOME_BAR
    }
  }
}

afoley-st commented 2 years ago

and then set CONFLUENT_CLOUD_EMAIL and CONFLUENT_CLOUD_PASSWORD in the env block

lorelei-rupp-imprivata commented 2 years ago

you can with a null_resource:

resource "null_resource" "run_login" {
  triggers = {
    always_run = timestamp()
  }
  provisioner "local-exec" {
    command = "confluent login --save"
    environment = {
        SOME_FOO = var.SOME_BAR
    }
  }
}

Yeah, I had tried this, but when an external "depends on" a null resource like this, it creates a forever diff on both of them on every apply. That is OK, but things that use the output of the data external are then force-replaced. Essentially, every time I run, it creates new API keys for me, because the registry API keys depend on the external.

I'm going to play around with it though; I'm sure there is some trick I can use.

I'm basically switching fully over to the official Confluent provider, since that is the only one they support, with a null resource to create/get the schema registry, and then using this provider only for creating the schema registry API keys.

abhishekkh commented 2 years ago

Running into same error, removed schema-registry from remote state for now. Will self manage it until this is fixed

Mongey commented 2 years ago

👋 Hey all I've released v0.0.15 which should have a fix for this