Remarks regarding the investigation

Kubernetes/Helm Provider (do-kubernetes/)

Interesting. Yes it does seem like it does not understand the provider change.

I think the issue is that it first destroys the cluster and then attempts to destroy the resource in the state that still uses the old provider.

Please try to make the provider passing more explicit.

provider "kubernetes" {
  host                   = digitalocean_kubernetes_cluster.main.endpoint
  cluster_ca_certificate = base64decode(digitalocean_kubernetes_cluster.main.kube_config.0.cluster_ca_certificate)
  token                  = digitalocean_kubernetes_cluster.main.kube_config.0.token
+  alias = "digital_ocean_cluster"
}

resource "kubernetes_secret" "test" {
-  depends_on = [digitalocean_kubernetes_cluster.main]
+
+ provider = kubernetes.digital_ocean_cluster
+

  metadata {
    name = "test"
  }
  data = {
    "hello" : "world"
  }
}

Refaktory MinIO Provider (vultr-refaktory-minio/)

There is information missing to draw conclusions.

Endpoint: does not follow ip address or domain name standards. could indicate that vultr_object_storage.main.s3_hostname. Is not the correct host name expected by this provider. I don't see how this relates to the resource not existing.

Are you sure it is the right one? The value does not exist: "(known after apply)" so I would have guessed that terraform needs to wait for it to be available.

Can you output the vultr_object_storage.main.s3_hostname and see if it would work in a 2-step process?

aminueza MinIO Provider (vultr-aminueza-minio/)

Same as for the previous point. If the host name is not correct the provider will not matter.

Don't have a vultr account to test myself.

Interesting. Yes it does seem like it does not understand the provider change.

I'm not sure what you mean with "provider change". The provider stays the same, but it doesn't get credentials because the cluster is going to be recreated and in that case the digitalocean_kubernetes_cluster resource doesn't return valid credentials.

Because resources with that provider already exist, Terraform wants to read the current resource state, but it fails because the provider doesn't get the right credentials. This would not be an issue if Terraform knew the resources shouldn't be read (which I assume delays the initialization of the provider up until the point when the resources need to be created which is after the cluster was created and the credentials exist; however that appears to be an implementation detail of the kubernetes provider).

Please try to make the provider passing more explicit.

Tbh. I don't see what that would change; but I'll give it a try.

Are you sure it is the right one? The value does not exist: "(known after apply)" so I would have guessed that terraform needs to wait for it to be available.

I will retry it to make sure, but this is how I did it in separate workspaces. I can simply test it by doing -target applies. The error message is consistent with the error message it would produce if you give it an empty string as the MinIO server URL, but I agree that it would likely also be the same error message if the s3_hostname was missing the URL scheme.

In any case, I don't think it matters that much because even if the URL value it got was valid, the behavior here shows that the provider is trying to do something with its credentials before a resource needs to be read or created. This behavior is different from the kubernetes provider.

It's possible that the provider could be happy with a valid URL and doesn't actually try to connect to the server before a resource must be read or created; however that means we would need to know the S3 endpoint in advance.

If the underlying bucket gets recreated, we'd also run into the same issue as above (e.g. Terraform trying to read the minio_s3_bucket resource but the S3 bucket is being destroyed and most likely the Vultr resource doesn't return valid credentials anymore).

Anyway, happy to give it a try again. I'll report back,.

Not sure if I can follow here. I am not sure when terraform tries to do what with which credentials and the logs to not clarify it.

in that case the digitalocean_kubernetes_cluster resource doesn't return valid credentials.

How do you get to this conclusion?

From the logs it is not clear to me which credentials are used. They could be the old ones pointing to a now non-existent cluster (therefore "connection refused").
Pointing into the new cluster should be fine because recreate will also create new credentials?

invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable gives me hope that passing the provider explicitly helps here.

In the case the cluster is recreated terraform either needs to:

Remove the secrets inside the cluster before recreating the cluster, then create both again.
Know that by deleting the kubernetes cluster the resources are automatically deleted.

I have the strong suspicion that there are cases here that terraform cannot cover and thus doing both of these things is in single module a bad idea. This might explain the general best practice to separate infra provisioning from deployment into it.

In any case, I don't think it matters that much because even if the URL value it got was valid, the behavior here shows that the provider is trying to do something with its credentials before a resource needs to be read or created. This behavior is different from the kubernetes provider.

Because there is not "resource created" line in the logs?

It's possible that the provider could be happy with a valid URL and doesn't actually try to connect to the server before a resource must be read or created; however that means we would need to know the S3 endpoint in advance.

Even if we knew it in advance we have the same issue for the access credentials, which we don't know.
Btw. we know the S3 endpoint in advance in many scenarios, because this does not change and the bucket is identified by the "accesskey". https://s3.region-code.amazonaws.com

How do you get to this conclusion?

From the logs it is not clear to me which credentials are used. They could be the old ones pointing to a now non-existent cluster (therefore "connection refused"). Pointing into the new cluster should be fine because recreate will also create new credentials?

I think it is a logical conclusion. The error occurs during the plan phase, thus the new cluster doesn't exist yet, which means there can be no credentials for it. It also means the old cluster hasn't been destroyed yet, so it also can't be that it returns the credentials of the old cluster.

I have the strong suspicion that there are cases here that terraform cannot cover and thus doing both of these things is in single module a bad idea. This might explain the general best practice to separate infra provisioning from deployment into it.

I agree. It means we actually do have to think about a higher granularity of workspaces and potentially splitting components that logically belong together into multiple stages of deployment as I had originally anticipate. 😄

Because there is not "resource created" line in the logs?

Actually there is, for the vultr_object_storage_cluster, but when it then tries to plan for the buckets that are supposed to be using the credentials for that cluster, it fails. Just to make sure the context is given in this discussion as the README may change:

> VULTR_API_KEY=... terraform apply
data.vultr_object_storage_cluster.ams: Reading...
data.vultr_object_storage_cluster.ams: Read complete after 1s

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform planned the following actions, but then encountered a problem:

  # vultr_object_storage.main will be created
  + resource "vultr_object_storage" "main" {
      + cluster_id    = 6
      + date_created  = (known after apply)
      + id            = (known after apply)
      + label         = "test-storage"
      + location      = (known after apply)
      + region        = (known after apply)
      + s3_access_key = (sensitive value)
      + s3_hostname   = (known after apply)
      + s3_secret_key = (sensitive value)
      + status        = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
╷
│ Error: Endpoint:  does not follow ip address or domain name standards.
│ 
│   with provider["registry.terraform.io/refaktory/minio"],
│   on main.tf line 25, in provider "minio":
│   25: provider "minio" {
│ 
╵

Even if we knew it in advance we have the same issue for the access credentials, which we don't know. Btw. we know the S3 endpoint in advance in many scenarios, because this does not change and the bucket is identified by the "accesskey". https://s3.region-code.amazonaws.com

That is true, but I would rather not rely on what appears to be very a inconsistent implementation detail of providers that doesn't even work in all scenarios for our primary provider (i.e. the Kubernetes provider). If tomorrow they change something about the Kubernetes provider that will have it try to use the provider credentials earlier than it does not, e.g. before the cluster is created, then we have to re-architect all our modules.

So in my opinion it's better we stick to what we said before: Providers should not be initialized from the outputs of a resource in the same workspace.

Plan: 1 to add, 0 to change, 1 to destroy.

in the logs made be think that you already clicked apply. But makes sense. The secrets would need to be added again so "1 to add" is not correct.

Have you tried the "explicit provider" already?

I agree. It means we actually do have to think about a higher granularity of workspaces and potentially splitting components that logically belong together into multiple stages of deployment as I had originally anticipate. 😄

Good that we are already doing that ;)

Actually there is, for the vultr_object_storage_cluster, but when it then tries to plan for the buckets that are supposed to be using the credentials for that cluster, it fails.

This is strange. The cluster IS already created from a previous run. Why is there now valid connection to it in the state?

So in my opinion it's better we stick to what we said before: Providers should not be initialized from the outputs of a resource in the same workspace.

Yes, and this means that setting up a product is always a multi-terraform-project/state process.

You might wanna have a look at this thread here (and the last answer): https://github.com/hashicorp/terraform-provider-postgresql/issues/152#issuecomment-714616030 It indicates that the "host" is the issue.

That being said, this whole "all-in-one" approach seems to go into the flaky territory of terraform, and is likely better to be avoided.

NiklasRosenstein / terraform-delayed-provider-initialization-test