@bogdaniordache thanks for opening an issue!
We are trying to upgrade the provider from version 0.2.0 to 0.4.0 because we are getting `Error: 429 Too Many Requests`.
That's a great idea 👍
Could you confirm you're following our upgrade guide?
If yes, we'd be interested in taking a look at the redacted output from every command of that guide; you could send it to this email so we can investigate.
On apply we get `Plan: 15 to add, 0 to change, 0 to destroy.`
That is definitely very weird. I'm wondering whether you got this message when running `terraform plan` using `0.2.0` or `0.4.0`.
Until we figure this out, one easy fix could be to import these 15 topics.
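As a rough sketch of what that could look like, assuming the import ID matches the `<cluster ID>/<topic name>` resource ID format shown in the refresh output later in this thread (the resource address, cluster ID, and topic name here are placeholders; check the resource docs for the exact format):

```
terraform import 'confluentcloud_kafka_topic.topics["orders"]' 'lkc-abc123/orders'
```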
Could you confirm you're following our upgrade guide?
Yes, I am following the guide. This is related to topics, and I'm using a fresh state at every switch of provider version. The clusters remain the same, isolating the problem to topic creation.
On apply we get `Plan: 15 to add, 0 to change, 0 to destroy.`
That is definitely very weird. I'm wondering whether you got this message when running `terraform plan` using `0.2.0` or `0.4.0`.
The plans were executed on `0.4.0`; running plan, apply, and destroy (with `-parallelism=2`) works fine on `0.2.0`.
Until we figure this out, one easy fix could be to import these 15 topics.
It can be a solution, but not a reliable one, especially within larger deployments.
I see the same issue when creating topics with the Confluent provider. It seems to appear about 65% of the time with clean `terraform apply`s. It seems we both create topics in a `for_each` block; I'm not sure if that has any impact.
To follow up on the above: if I add a long sleep (we create an API key with automation), then it seems to work. I think it stems from the fact that keys are not immediately active. Here's my Terraform:
```hcl
# You need to wait for a large amount of time until key is active, unfortunately
resource "time_sleep" "wait_600_seconds" {
  # An automated API Key for the new cluster
  depends_on      = [module.confluent_cluster.cluster_service_account_api_key]
  create_duration = "600s"
}

resource "confluentcloud_kafka_topic" "topics" {
  depends_on       = [time_sleep.wait_600_seconds]
  for_each         = local.topics
  kafka_cluster    = module.confluent_cluster.cluster_id
  topic_name       = each.key
  partitions_count = each.value.partitions
  http_endpoint    = module.confluent_cluster.cluster_http_endpoint

  credentials {
    key    = module.confluent_cluster.cluster_service_account_api_key
    secret = module.confluent_cluster.cluster_service_account_api_secret
  }
}
```
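(Note that `time_sleep` comes from the separate `hashicorp/time` provider, so it has to be declared alongside the Confluent provider in `required_providers`.)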
I hope this helps someone!
Edit: it also helped me to scale down the parallelism in Terraform.
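For example, lowering it from the default of 10:

```
terraform apply -parallelism=2
```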
Having the same issue (topic creation succeeds, but the Terraform state doesn't show all of the topics); below are some (sanitized) trace logs for one topic.
```
2022-02-23T12:02:02.642+0100 [INFO] provider.terraform-provider-confluentcloud_0.4.0: 2022/02/23 12:02:02 [DEBUG] Created Kafka topic <CLUSTERID>/<TOPICNAME>: timestamp=2022-02-23T12:02:02.642+0100
2022-02-23T12:02:02.642+0100 [INFO] provider.terraform-provider-confluentcloud_0.4.0: 2022/02/23 12:02:02 [INFO] Kafka topic read for <CLUSTERID>/<TOPICNAME>: timestamp=2022-02-23T12:02:02.642+0100
2022-02-23T12:02:02.642+0100 [DEBUG] provider.terraform-provider-confluentcloud_0.4.0: 2022/02/23 12:02:02 [DEBUG] GET https://<ENDPOINT>.aws.confluent.cloud:443/kafka/v3/clusters/<CLUSTERID>/topics/<TOPICNAME>
2022-02-23T12:02:02.881+0100 [INFO] provider.terraform-provider-confluentcloud_0.4.0: 2022/02/23 12:02:02 [WARN] Kafka topic get failed for id <TOPICNAME>, &{404 Not Found 404 HTTP/2.0 2 0 map[Content-Type:[application/json] Date:[Wed, 23 Feb 2022 11:02:02 GMT]] {0x14000b63760} -1 [] false false map[] 0x14000174a00 0x140000ad3f0}, 404 Not Found: timestamp=2022-02-23T12:02:02.881+0100
2022-02-23T12:02:02.881+0100 [INFO] provider.terraform-provider-confluentcloud_0.4.0: 2022/02/23 12:02:02 [WARN] Kafka topic with id=<CLUSTERID>/<TOPICNAME> is not found: timestamp=2022-02-23T12:02:02.881+0100
2022-02-23T12:02:02.882+0100 [TRACE] maybeTainted: module.applications_kafka_topics.confluentcloud_kafka_topic.kafka_topics["<TOPICNAME>"] encountered an error during creation, so it is now marked as tainted
```
I think the issue is that this part is no longer checking whether there is a response (maybe the topic is still being created at that point?) and just assumes the creation failed even though it actually succeeded.
@charlottemach @afoley-st @bogdaniordache thanks for doing the investigation, that's very insightful!
Looks like we need to add a timeout after `POST` (create a topic) and before `GET` (read a topic) requests.
To follow up on the above: if I add a long sleep (we create an API key with automation), then it seems to work. I think it stems from the fact that keys are not immediately active. Here's my Terraform:
Thanks for sharing your experience! FWIW if the Kafka API Key is not active you should be getting `401`, so it might be a bit different.
Update: we're hoping to release `0.5.0` on Friday or next week, which will include a fix for it 🤞: a 30-second sleep after `POST` (create a topic) and before `GET` (read a topic) requests.
That said, we found it difficult to reproduce the issue: namely, we were able to create 200 topics for a Standard Kafka cluster on AWS (TF logs), so we are wondering whether it might be connected to other factors such as cluster type or cloud provider. Could you confirm what type of cluster and cloud provider you are using?
@charlottemach @afoley-st @bogdaniordache
The issue was found while trying to set up topics on an AWS Basic Kafka cluster.
Update: I managed to reproduce the issue for a Basic cluster on `0.4.0` 😮 (and the Confluent Cloud Console does show all 100 topics). I will now try to create 100 topics for a Basic cluster using an unreleased `0.5.0` version of the TF Provider for Confluent Cloud 🤞.
Another update:
Great news: the unreleased `0.5.0` version managed to create 200 topics for the same Basic cluster without printing out `404` or any other errors:
```
➜ test2 git:(master) ✗ terraform apply
...
Apply complete! Resources: 198 added, 0 changed, 0 destroyed.

➜ test2 git:(master) ✗ terraform plan
...
confluentcloud_kafka_topic.orders_177: Refreshing state... [id=lkc-q21x0p/orders_177]
confluentcloud_kafka_topic.orders_37: Refreshing state... [id=lkc-q21x0p/orders_37]
...
confluentcloud_kafka_topic.orders_113: Refreshing state... [id=lkc-q21x0p/orders_113]
confluentcloud_kafka_topic.orders_168: Refreshing state... [id=lkc-q21x0p/orders_168]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
```
We are facing this issue with a Standard cluster, where we get either a 400 error or a Terraform "root resource missing" error. We started with an attempt to create 200+ topics, but this never worked. We even tried 10 topics and still hit the same issue. It only works with 3 topics for a Standard cluster using version `0.4.0` of the provider.
@AAhmed84 could you open a separate issue for the `400` error (or is it a typo and you meant to write `404`)? I think it might be a different issue.
It would also help if you could include & share the sanitized debug logs.
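For example, one way to capture them is via Terraform's standard logging environment variables (the log file name here is just an example):

```
TF_LOG=DEBUG TF_LOG_PATH=terraform-debug.log terraform apply
```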
@linouk23 yes, indeed it was a typo and we get 404. Here is the snippet of the two errors we get when we try to create 200+ topics on a Standard cluster using Terraform.
@AAhmed84 gotcha 👍, then updating to `0.5.0` should help; the ETA is early next week most likely.
Update: the ETA for releasing `0.5.0` is next Wednesday.
Check out our most recent release of the TF Provider for Confluent Cloud, `v0.5.0`, where we fixed the issue!
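To pick it up, bump the version constraint in your configuration and re-initialize; a minimal sketch, assuming the `confluentinc/confluentcloud` registry source address:

```hcl
terraform {
  required_providers {
    confluentcloud = {
      source  = "confluentinc/confluentcloud"
      version = "0.5.0"
    }
  }
}
```

followed by `terraform init -upgrade`.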
cc @AAhmed84 @bogdaniordache @charlottemach @afoley-st
We are trying to upgrade the provider from version 0.2.0 to 0.4.0 because we are getting `Error: 429 Too Many Requests`. On apply we get `Plan: 15 to add, 0 to change, 0 to destroy.`
When confirming the apply, the configuration starts to drift: in Confluent Cloud all the resources are created, but some of them are not reflected in the state file.
Within the process we get the following errors:
What can we do to get around this issue?