confluentinc / terraform-provider-confluent

Terraform Provider for Confluent
Apache License 2.0

Terraform getting error reading resources with URLs containing !A(MISSING) #425

Open penicaudm opened 2 weeks ago

penicaudm commented 2 weeks ago

For the last few days we've been getting this error on resources such as Kafka ACLs, topics, or identity pools:

│ Error: error reading Kafka ACLs: Get "https://lkc-redacted.eastus.azure.glb.confluent.cloud:443/kafka/v3/clusters/lkc-nwmndk/acls?host=%!A(MISSING)&operation=READ&pattern_type=LITERAL&permission=ALLOW&principal=User%!A(MISSING)pool-v6zE&resource_name=%!A(MISSING)&resource_type=GROUP": GET https://lkc-redacted.eastus.azure.glb.confluent.cloud:443/kafka/v3/clusters/lkc-nwmndk/acls?host=%!A(MISSING)&operation=READ&pattern_type=LITERAL&permission=ALLOW&principal=User%!A(MISSING)pool-v6zE&resource_name=%!A(MISSING)&resource_type=GROUP giving up after 5 attempt(s)

I'm wondering whether the "User%!A(MISSING)" in the URL is causing a problem with the API, but I can't be certain.
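For reference, `%!A(MISSING)` is the marker Go's fmt package prints when a format string contains a verb (here `A`) with no matching operand. A percent-encoded `*` (`%2A`) or `:` (`%3A`) looks like such a verb if the encoded text is itself used as the format string. The following is a minimal sketch of that effect, assuming the encoded query ends up as a format string somewhere in the logging or error path; where that happens is not established in this thread, and `buildACLQuery`, `misformat`, and `safeFormat` are hypothetical names, not provider functions.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildACLQuery percent-encodes two of the ACL filters seen in the error
// above. Hypothetical helper, not taken from the provider.
func buildACLQuery(host, principal string) string {
	q := url.Values{}
	q.Set("host", host)
	q.Set("principal", principal)
	return q.Encode() // keys are emitted in sorted order
}

// misformat uses the already-encoded text as the format string, so fmt
// parses "%2A" and "%3A" as verbs that have no operand. The empty operand
// slice stands in for a logging helper that forwards its message this way.
func misformat(s string) string {
	var noOperands []any
	return fmt.Sprintf(s, noOperands...)
}

// safeFormat passes the text as an operand, so it is printed verbatim.
func safeFormat(s string) string {
	return fmt.Sprintf("%s", s)
}

func main() {
	q := buildACLQuery("*", "User:pool-v6zE")
	fmt.Println(q)             // encoded query
	fmt.Println(misformat(q))  // garbled with %!A(MISSING)
	fmt.Println(safeFormat(q)) // unchanged
}
```

If that assumption holds, the garbling would affect only the printed message, and the request itself would still carry `%2A` and `%3A`; the retries giving up would then have a separate cause.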

This is only happening for one of our 12 clusters, which makes it even weirder.

So far I have tried reverting the provider to 1.83, but we got the exact same error.

linouk23 commented 2 weeks ago

Thanks for creating this issue @penicaudm!

We have seen this issue reported by our users before. Could you check whether this comment helps to find the root cause? It seems like this issue can be caused by different factors.

linouk23 commented 2 weeks ago

@penicaudm could you also verify whether a direct API call from the same host works?
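One way to run that check from a pipeline agent is a small Go program that issues the same GET as in the error message. This is only a sketch under stated assumptions: the path and query parameter names are copied from the error above, HTTP Basic auth with a Kafka API key and secret is assumed, and the environment variable names and the `aclListURL` helper are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// aclListURL builds the ACL lookup URL in the shape shown in the error
// message. restEndpoint and clusterID are supplied by the caller.
func aclListURL(restEndpoint, clusterID string, filters map[string]string) string {
	q := url.Values{}
	for k, v := range filters {
		q.Set(k, v)
	}
	return fmt.Sprintf("%s/kafka/v3/clusters/%s/acls?%s",
		restEndpoint, url.PathEscape(clusterID), q.Encode())
}

func main() {
	// Hypothetical variable names; adjust to your pipeline.
	endpoint := os.Getenv("KAFKA_REST_ENDPOINT")
	cluster := os.Getenv("KAFKA_CLUSTER_ID")
	key := os.Getenv("KAFKA_API_KEY")
	secret := os.Getenv("KAFKA_API_SECRET")
	if endpoint == "" || cluster == "" {
		fmt.Println("set KAFKA_REST_ENDPOINT and KAFKA_CLUSTER_ID to send the request")
		return
	}

	u := aclListURL(endpoint, cluster, map[string]string{
		"host":          "*",
		"operation":     "READ",
		"pattern_type":  "LITERAL",
		"permission":    "ALLOW",
		"resource_type": "GROUP",
	})

	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	req.SetBasicAuth(key, secret) // assumption: Basic auth with a Kafka API key

	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		// A failure here points at connectivity from this host rather than Terraform.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

A transport-level failure from the affected agent would suggest a network or DNS problem on that host, while a normal HTTP status would point back at the provider or Terraform run.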

penicaudm commented 2 weeks ago

We're using Azure DevOps pipelines running on a virtual machine scale set. The VMs running our pipelines usually run for the day and are created at night. I just cleaned up our entire pool of VMs to create fresh ones, and now the pipeline is working again (at least for the plan command). So the only thing that really changed is the Terraform installation.

I'm trying to figure out whether anything has changed, but it's the same Terraform version, same provider, same tasks, etc.

penicaudm commented 2 weeks ago

So we got the error again. I reran the pipeline and it worked, which makes this pretty hard to troubleshoot.

I suspect that the Go issue you reported previously, with these topics:

is at play. So it could be in the fmt package or in Terraform directly. I'm not good enough at Go to provide an educated explanation.

penicaudm commented 2 weeks ago

Here is a sample of the formatting issue if needed:

provider.terraform-provider-confluent_2.1.0: Error reading Kafka ACLs "cluster/TOPIC#global.it4it.payload-message.payload-store-error-europe.v1#LITERAL#User:pool-xlg5#*#READ#ALLOW" this resulted in the following URL:

https://cluster-g0yq0p.westeurope.azure.glb.confluent.cloud:443/kafka/v3/clusters/cluster/acls?host=%!A(MISSING)&operation=READ&pattern_type=LITERAL&permission=ALLOW&principal=User%!A(MISSING)pool-xlg5&resource_name=global.it4it.payload-message.payload-store-error-europe.v1&resource_type=TOPIC

I'm not sure, but perhaps some formatting fixes or tests are needed around this kind of function:

https://github.com/confluentinc/terraform-provider-confluent/blob/d22889b6ebc65b4fa6cdde4ab6ce03de73366b42/internal/provider/resource_kafka_acl.go#L230C1-L232C2