confluentinc / terraform-provider-confluent

Terraform Provider for Confluent
Apache License 2.0

resources incorrectly expecting `http_endpoint` in `confluent_kafka_cluster` resource #49

Closed komal-SkyNET closed 2 years ago

komal-SkyNET commented 2 years ago

The cluster's REST http_endpoint is incorrectly expected as a required argument by several resources, which hurts usability. Examples: confluent_api_key, confluent_kafka_acl

╷
│ Error: error fetching Kafka Cluster "abc-cluster"'s "http_endpoint" attribute: http_endpoint is nil or empty for Kafka Cluster "abc-cluster"
│    8: resource "confluent_api_key" "cluster-api-key" {
│ 
╵
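
For context, cluster-scoped resources at the time had to be given the cluster's REST endpoint and credentials explicitly. As a rough sketch (hypothetical names, following the same pattern as the topic example further down), a confluent_kafka_acl of that era looked something like:

resource "confluent_kafka_acl" "app-consumer-read-orders" {
  kafka_cluster {
    id = confluent_kafka_cluster.basic-cluster.id
  }
  resource_type = "TOPIC"
  resource_name = "orders"
  pattern_type  = "LITERAL"
  principal     = "User:${confluent_service_account.app-consumer.id}"
  host          = "*"
  operation     = "READ"
  permission    = "ALLOW"

  # This is the attribute the error above complains about: it stays empty
  # until the cluster's REST endpoint has been provisioned in the region.
  http_endpoint = confluent_kafka_cluster.basic-cluster.http_endpoint

  credentials {
    key    = confluent_api_key.app-manager-kafka-api-key.id
    secret = confluent_api_key.app-manager-kafka-api-key.secret
  }
}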

From TF Docs:

[screenshot]

From API Docs:

[screenshot]
linouk23 commented 2 years ago

@komal-SkyNET thanks for reporting the issue!

We're thinking about supporting a configuration where these attributes are moved to the provider block (alongside the credentials block), for example:

provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key
  cloud_api_secret = var.confluent_cloud_api_secret
  kafka_http_endpoint = ...
  kafka_api_key = ...
  kafka_api_secret = ...
}

such that a user won't need to provide http_endpoint and credentials for confluent_kafka_topic and confluent_kafka_acl:

# before

resource "confluent_kafka_topic" "orders" {
  kafka_cluster {
    id = confluent_kafka_cluster.basic-cluster.id
  }
  topic_name         = "orders"
  partitions_count   = 4
  http_endpoint      = confluent_kafka_cluster.basic-cluster.http_endpoint
  config = {
    "cleanup.policy"    = "compact"
    "max.message.bytes" = "12345"
    "retention.ms"      = "67890"
  }
  credentials {
    key    = confluent_api_key.app-manager-kafka-api-key.id
    secret = confluent_api_key.app-manager-kafka-api-key.secret
  }
}

# after

resource "confluent_kafka_topic" "orders" {
  kafka_cluster {
    id = confluent_kafka_cluster.basic-cluster.id
  }
  topic_name         = "orders"
  partitions_count   = 4
  config = {
    "cleanup.policy"    = "compact"
    "max.message.bytes" = "12345"
    "retention.ms"      = "67890"
  }
}

Do you think it'd be useful?

Marcus-James-Adams commented 2 years ago

@linouk23 As you have just introduced a fix to make these attributes available as they are, and customers are now using the code as it stands, moving them would be a breaking change. Could I suggest a period of stability, concentrating on other issues (e.g. schema registry), before changing what now works (even though the fix may not have reached every region yet)?

linouk23 commented 2 years ago

@Marcus-James-Adams no worries, we'll continue to support current configurations too!

komal-SkyNET commented 2 years ago
  1. @linouk23 Moving these keys to the provider block definitely makes sense. With respect to the naming, my suggestion would be to call kafka_api_key and kafka_api_secret something different, perhaps admin_api_key or manager_api_key, to give them more context and distinction.

  2. The main issue raised here, however, is the bug wherein resources incorrectly expect http_endpoint as a mandatory argument, e.g. the confluent_kafka_acl resource.

  3. With respect to the code block refactoring, I agree with @Marcus-James-Adams that features such as support for schema registry are a higher priority (we need it too). But this should not discount the defect mentioned in point 2.

linouk23 commented 2 years ago

@komal-SkyNET thanks for the comment!

  1. That's a great idea. That said, we're planning to support a similar setup for SR / ksqlDB clusters (sr_api_key / ksql_api_key), so admin_api_key might not carry great context for new users.
  2. That's expected: once we complete #1, we'll make it optional, but it's required for now.

Let me know if that makes sense!

komal-SkyNET commented 2 years ago

Thanks.

  1. It would still be useful to distinguish this privileged key from any other API key created for SR by the user in their TF code. We could prefix these with some word (say admin, master, or manager to reflect the higher privilege), e.g. admin_sr_api_key. Otherwise it's a bit ambiguous which keys these refer to, since a user could have multiple api_key resources in code for these very same objects.
linouk23 commented 2 years ago

I see. Well, theoretically a user could pass any kind of Kafka API Key there -- even one without any privileges -- so calling it an admin might be a little bit ambitious.

komal-SkyNET commented 2 years ago

@linouk23 Right, my understanding was that these were intended to be privileged keys; I'm quite sure I read something along those lines in the docs too (will try to find it). The way I understood it: a user could create multiple confluent_service_account and confluent_api_key resources (say at DeveloperRead level) in their TF code, but would ideally supply the higher-privileged key for creating resources such as acls, service_accounts, etc.
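
For illustration only, here's a rough sketch of that distinction with hypothetical names (the confluent_service_account / confluent_api_key schema shown follows the current provider docs and may differ slightly from the pre-1.0 one discussed in this thread):

# Privileged service account + key: this is the one you would hand to the
# provider / credentials blocks to manage ACLs, topics, etc.
resource "confluent_service_account" "app-manager" {
  display_name = "app-manager"
  description  = "Service account that manages the Kafka cluster"
}

resource "confluent_api_key" "app-manager-kafka-api-key" {
  display_name = "app-manager-kafka-api-key"
  description  = "Kafka API key owned by the 'app-manager' service account"
  owner {
    id          = confluent_service_account.app-manager.id
    api_version = confluent_service_account.app-manager.api_version
    kind        = confluent_service_account.app-manager.kind
  }
  managed_resource {
    id          = confluent_kafka_cluster.basic-cluster.id
    api_version = confluent_kafka_cluster.basic-cluster.api_version
    kind        = confluent_kafka_cluster.basic-cluster.kind
    environment {
      id = confluent_environment.staging.id
    }
  }
}

# A second, limited service account (e.g. one that only gets DeveloperRead
# on a few topics) would get its own key the same way; only the privileged
# key above plays the "admin"/"manager" role being discussed here.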

muresan commented 2 years ago

@komal-SkyNET thanks for reporting the issue!

We're thinking about supporting a configuration where these attributes are moved to a providers block (alongside credentials block), for example:

provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key
  cloud_api_secret = var.confluent_cloud_api_secret
  kafka_http_endpoint = ...
  kafka_api_key = ...
  kafka_api_secret = ...
}

This, I think, introduces a loop when someone wants to create a cluster, admin, and topic in one Terraform module, as you cannot have kafka_api_key defined until you create the cluster, the admin user, and the key. I think a clearer option would be to go the way Databricks did, with two different providers:

provider "confluent" {
   alias = "kafka"
   kafka_http_endpoint = confluent_kafka_cluster_v2.standard.rest_endpoint
   kafka_api_key = confluent_api_key_v2.app-manager-kafka-api-key.id
   kafka_api_secret = confluent_api_key_v2.app-manager-kafka-api-key.secret
}

here cloud* would be optional, and you could have a separate Terraform module that does not have access to cloud* credentials and still works. The example above will not work as written, since Terraform cannot evaluate data at the time it instantiates a provider, but those are the values you'd pass. Then you use that provider for Kafka resources:

resource "confluent_kafka_topic" "orders" {
  provider = confluent.kafka
  kafka_cluster {
    id = confluent_kafka_cluster.basic-cluster.id
  }
  topic_name         = "orders"
  partitions_count   = 4
  config = {
    "cleanup.policy"    = "compact"
    "max.message.bytes" = "12345"
    "retention.ms"      = "67890"
  }
}
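
One way to sidestep the instantiation limitation mentioned above (a sketch with hypothetical variable names, not something the provider prescribes) is to create the cluster and the admin key in one root module and feed the resulting endpoint and key into a second root module as plain input variables, so the provider block never references a resource created in the same apply:

# Second root module / workspace, applied after the cluster and key exist.
# Attribute names follow the proposal above; the released provider ended up
# calling the endpoint attribute kafka_rest_endpoint.
variable "kafka_http_endpoint" {
  type = string
}

variable "kafka_api_key" {
  type      = string
  sensitive = true
}

variable "kafka_api_secret" {
  type      = string
  sensitive = true
}

provider "confluent" {
  alias               = "kafka"
  kafka_http_endpoint = var.kafka_http_endpoint
  kafka_api_key       = var.kafka_api_key
  kafka_api_secret    = var.kafka_api_secret
}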

Do you think it'd be useful?

linouk23 commented 2 years ago

@muresan thanks for such a detailed reply!

Could you elaborate on

I think a clearer option would be to go the way Databricks did with two different providers

I can find just one Databricks provider: https://registry.terraform.io/namespaces/databrickslabs

Regarding

provider "confluent" {
   alias = "kafka"
   kafka_http_endpoint = confluent_kafka_cluster_v2.standard.rest_endpoint
   kafka_api_key = confluent_api_key_v2.app-manager-kafka-api-key.id
   kafka_api_secret = confluent_api_key_v2.app-manager-kafka-api-key.secret
}

that's actually a really neat idea. It might not work right now because of some technical limitations: we do a hacky conversion of the service account ID in confluent_kafka_acl_v3.principal to the service account integer ID that our Kafka REST API expects, by calling the Cloud API (so a Cloud API Key is required for now).

We did make a design decision to have a unified TF provider (so that users won't have to deal with versioning issues), but having a oneOf for (Kafka creds / Cloud creds) sounds very reasonable!

This I think introduces a loop when someone wants to create a cluster, admin and topic in one terraform module as you cannot have the kafka_api_key defined until you create the cluster and the admin user and the key.

What do you think about our newest examples:

muresan commented 2 years ago

@muresan thanks for such a detailed reply!

Could you elaborate on

I think a clearer option would be to go the way Databricks did with two different providers

I can find just one Databricks provider: https://registry.terraform.io/namespaces/databrickslabs

Yes, that's the one. If you look at any of the examples you can see that the databricks_mws_* resources have provider = databricks.mws set. Similar to how you offer an API to manage Kafka clusters and another API to manage One Kafka cluster, Databricks have one API to manage workspaces and another API to manage One workspace. Multiple "personalities" on the same provider would help by avoiding a split into multiple providers (for example, Elastic Cloud has ec to manage Elastic Cloud Elasticsearch clusters and elasticstack to manage individual clusters).

Regarding

provider "confluent" {
   alias = "kafka"
   kafka_http_endpoint = confluent_kafka_cluster_v2.standard.rest_endpoint
   kafka_api_key = confluent_api_key_v2.app-manager-kafka-api-key.id
   kafka_api_secret = confluent_api_key_v2.app-manager-kafka-api-key.secret
}

that's actually a really neat idea. It might not work right now because of some technical limitations: we do a hacky conversion of the service account ID in confluent_kafka_acl_v3.principal to the service account integer ID that our Kafka REST API expects, by calling the Cloud API (so a Cloud API Key is required for now).

We did make a design decision to have a unified TF provider (so that users won't have to deal with versioning issues), but having a oneOf for (Kafka creds / Cloud creds) sounds very reasonable!

This I think introduces a loop when someone wants to create a cluster, admin and topic in one terraform module as you cannot have the kafka_api_key defined until you create the cluster and the admin user and the key.

What do you think about our newest examples:

Those examples fit cases with two teams; there are cases with three tiers as well: networking, backing services/Kafka admin, and application/product team. I know it's more complicated, but think also of the lifecycle of the various resources:

  • longest living: VPC peering / environments
  • next in order: Kafka cluster / admin credentials
  • next in order: Kafka topics / credentials

I think in enterprise most of the time there will be a networking team, a services team and an application team that each need specific access to parts of this setup: the Networking team to provision the VPC resources (in Confluent and other clouds) and peerings, the Services team to bring up the Kafka cluster and manage it, create accounts and ACLs, and the Application team to connect to the cluster and manage resources inside the cluster (topics, cluster accounts, auth, etc.). This also means 3 different deploy pipelines and lifecycles with 3 different access levels. Whatever can be done to support such a configuration would be best.
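
To make the three-tier split a bit more concrete, here's a rough sketch of how each team's root module might configure the provider under the proposal discussed above (variable names and per-team key scoping are hypothetical, and each block would normally live in its own module and pipeline rather than side by side):

# Networking team: Cloud API key scoped to network/peering work
provider "confluent" {
  alias            = "networking"
  cloud_api_key    = var.networking_cloud_api_key
  cloud_api_secret = var.networking_cloud_api_secret
}

# Services / Kafka admin team: Cloud API key for one environment
provider "confluent" {
  alias            = "services"
  cloud_api_key    = var.services_cloud_api_key
  cloud_api_secret = var.services_cloud_api_secret
}

# Application team: Kafka credentials only, no Cloud credentials at all
provider "confluent" {
  alias               = "app"
  kafka_http_endpoint = var.kafka_http_endpoint
  kafka_api_key       = var.kafka_api_key
  kafka_api_secret    = var.kafka_api_secret
}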

linouk23 commented 2 years ago

@muresan

Similar to how you offer an API to manage Kafka clusters and another API to manage One Kafka cluster, Databricks have one API to manage workspaces and another API to manage One workspace.

That's actually super insightful! I wonder which is more convenient:

# DB API #1 (one workspace)?
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.mws
  account_id                   = var.databricks_account_id
  private_access_settings_name = "Private Access Settings for ${local.prefix}"
  region                       = var.region
  public_access_enabled        = true
}

# DB API #2 (many workspaces)?
resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  autoscale {
    min_workers = 1
    max_workers = 50
  }
}

vs 

provider "confluent" {
  alias = "one"
  kafka_rest_endpoint = "..."
  kafka_api_key = "..."
  kafka_api_secret = "..."
}

resource "confluent_kafka_topic" "orders" {
  provider = confluent.one
  kafka_cluster {
    id = confluent_kafka_cluster.basic-cluster.id
  }
  topic_name         = "orders"
  partitions_count   = 4
  config = {
    "cleanup.policy"    = "compact"
    "max.message.bytes" = "12345"
    "retention.ms"      = "67890"
  }
}

# and 

provider "confluent" {
  alias = "many"
  cloud_api_key = "..."
  cloud_api_secret = "..."
}

resource "confluent_environment_v2" "prod" {
  provider = confluent.many
  display_name = "Production"
}

Multiple "personalities" on the same provider would help by avoiding to split in multiple providers (like for example Elastic Cloud has ec to manage Elastic Cloud Elasticsearch clusters and elasticstack to manage individual clusters.)

I don't think we'll follow their path (the biggest reason was that multiple customers mentioned potential versioning issues), but is that something you'd prefer over aliasing?

I think in enterprise most of the time there will be a networking team, a services team and an application team that each need specific access to parts of this setup.

That's super helpful to hear! We'll try to come up with a few more examples for a setup with 3 teams that would feature the NetworkAdmin role.
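
As a rough sketch of the networking-team piece (assuming the confluent_role_binding resource and that the NetworkAdmin role can be bound at the environment scope; names are hypothetical):

resource "confluent_service_account" "network-admin" {
  display_name = "network-admin"
  description  = "Service account for the networking team's pipeline"
}

resource "confluent_role_binding" "network-admin-binding" {
  principal   = "User:${confluent_service_account.network-admin.id}"
  role_name   = "NetworkAdmin"
  crn_pattern = confluent_environment.staging.resource_name
}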

linouk23 commented 2 years ago

Those examples fit cases with two teams, there are cases with three tiers as well: networking, backing services/kafka admin, application/product team. I know it's more complicated but think also of the lifecycle of various resources: longest living: VPC peering/environments; next in order: Kafka cluster/admin credentials; next in order: Kafka topics/credentials. I think in enterprise most of the time there will be a networking team, a services team and an application team that each need specific access to parts of this setup. Networking team to provision the VPC resources (in Confluent and other clouds) and peerings, Services team to bring up the Kafka cluster and manage it, create accounts and ACLs, and Application team to connect to the cluster and manage resources from the cluster (topics, cluster accounts, auth, etc). And this also means 3 different deploy pipelines and life-cycles with 3 different access levels. Whatever can be done to support such a configuration would be best.

@muresan by the way, could you list the expected roles for each team? For example, it looks like

  • Kafka Ops team (manage environments -- OrganizationAdmin role)
  • Networking team (manage networks across all environments -- NetworkAdmin role)
  • Services team (manage a specific environment -- EnvironmentAdmin role)
  • Application team (manage a specific cluster -- CloudClusterAdmin role)

muresan commented 2 years ago

@muresan by the way, could you list the expected roles for each team? For example, it looks like

  • Kafka Ops team (manage environments -- OrganizationAdmin role)
  • Networking team (manage networks across all environments -- NetworkAdmin role)
  • Services team (manage a specific environment -- EnvironmentAdmin role)
  • Application team (manage a specific cluster -- CloudClusterAdmin role)

In my example the Kafka Ops and Services teams are one team: they engage with the vendor (Confluent), manage the organization, approve Kafka cluster creation, etc., then pass credentials on to the Application/product team and the Networking team. The Networking team only cares about the connectivity between the services, in our specific case GCP peering/routes/DNS (not needed for GCP since Private Service Connect is not supported). I think 4 teams is not realistic.

muresan commented 2 years ago

@muresan

Similar to how you offer an API to manage Kafka clusters and another API to manage One Kafka cluster, Databricks have one API to manage workspaces and another API to manage One workspace.

That's actually super insightful! I wonder which is more convenient:

# DB API #1 (one workspace)?
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.mws
  account_id                   = var.databricks_account_id
  private_access_settings_name = "Private Access Settings for ${local.prefix}"
  region                       = var.region
  public_access_enabled        = true
}

# DB API #2 (many workspaces)?
resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  autoscale {
    min_workers = 1
    max_workers = 50
  }
}

vs 

provider "confluent" {
  alias = "one"
  kafka_rest_endpoint = "..."
  kafka_api_key = "..."
  kafka_api_secret = "..."
}

resource "confluent_kafka_topic" "orders" {
  provider = confluent.one
  kafka_cluster {
    id = confluent_kafka_cluster.basic-cluster.id
  }
  topic_name         = "orders"
  partitions_count   = 4
  config = {
    "cleanup.policy"    = "compact"
    "max.message.bytes" = "12345"
    "retention.ms"      = "67890"
  }
}

# and 

provider "confluent" {
  alias = "many"
  cloud_api_key = "..."
  cloud_api_secret = "..."
}

resource "confluent_environment_v2" "prod" {
  provider = confluent.many
  display_name = "Production"
}

It's one option and in the end it depends on how easy it is to implement and maintain and how well Terraform supports this.

Multiple "personalities" on the same provider would help by avoiding to split in multiple providers (like for example Elastic Cloud has ec to manage Elastic Cloud Elasticsearch clusters and elasticstack to manage individual clusters.)

I don't think we'll follow their path (the biggest reason was that multiple customers mentioned potential versioning issues), but is that something you'd prefer over aliasing?

I don't have a specific preference; it's up to Confluent to support the provider. If there is a single API endpoint for both Confluent resources and Kafka resources, then it makes sense to go the single-provider route with "multiple personalities". I think it makes sense for Elastic to split because you can use the elasticstack provider to manage an Elasticsearch instance that is not running in Elastic Cloud -- you can manage an on-premises instance. I did not go deep enough into the Confluent docs to find out whether this is the case with Confluent Cloud Kafka clusters versus non-cloud Confluent Kafka clusters or even pure Apache Kafka clusters.

I think in enterprise most of the time there will be a networking team, a services team and an application team that each need specific access to parts of this setup.

That's super helpful to hear! We'll try to come up with a few more examples for a setup with 3 teams that would feature the NetworkAdmin role.

linouk23 commented 2 years ago

@komal-SkyNET regarding the original issue, we continued the conversation in https://github.com/confluentinc/terraform-provider-confluent/issues/14#issuecomment-1157141796

linouk23 commented 2 years ago

Closing the issue since our backend team rolled out a fix for:

│ Error: error fetching Kafka Cluster "abc-cluster"'s "http_endpoint" attribute: http_endpoint is nil or empty for Kafka Cluster "abc-cluster"
│    8: resource "confluent_api_key" "cluster-api-key" {

Also, check out our latest 1.0.0 release:

The Confluent Terraform Provider is now Generally Available and recommended for use in production workflows.
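
For anyone moving to the GA release, a typical way to adopt it is to pin the provider in required_providers; the source address below is the published one, but treat the exact version constraint as an example:

terraform {
  required_providers {
    confluent = {
      source  = "confluentinc/confluent"
      version = "~> 1.0" # the GA release mentioned above
    }
  }
}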

Marcus-James-Adams commented 2 years ago

Now the bug is fixed and rolling through the regions, why break what's working?

At the moment I can create a cluster, accounts, and topics in a single layer of code with no complexity.

This provider has been a long time coming, but now it's here it's working well.

Don't change a thing on this side of things; just finish rolling the fix for the http bug through the regions.

linouk23 commented 2 years ago

@Marcus-James-Adams sure thing, we don't want to break anything!

For context, we did add support for both scenarios:

# Option #1 when managing multiple clusters in the same Terraform workspace
provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key    # optionally use CONFLUENT_CLOUD_API_KEY env var
  cloud_api_secret = var.confluent_cloud_api_secret # optionally use CONFLUENT_CLOUD_API_SECRET env var
}

# Option #2 when managing a single cluster in the same Terraform workspace
# See https://github.com/confluentinc/terraform-provider-confluent/tree/master/examples/configurations/managing-single-cluster for more details
provider "confluent" {
  cloud_api_key       = var.confluent_cloud_api_key    # optionally use CONFLUENT_CLOUD_API_KEY env var
  cloud_api_secret    = var.confluent_cloud_api_secret # optionally use CONFLUENT_CLOUD_API_SECRET env var

  kafka_rest_endpoint = var.kafka_rest_endpoint        # optionally use KAFKA_REST_ENDPOINT env var 
  kafka_api_key       = var.kafka_api_key              # optionally use KAFKA_API_KEY env var
  kafka_api_secret    = var.kafka_api_secret           # optionally use KAFKA_API_SECRET env var
}

as described in our docs.