confluentinc / terraform-provider-confluentcloud

Confluent Cloud Terraform Provider is deprecated in favor of Confluent Terraform Provider
https://registry.terraform.io/providers/confluentinc/confluentcloud/latest/docs

v0.4.0/v0.5.0 - Unable to create ACL after creating topic (401 error) - basic cluster #51

Closed bluedog13 closed 2 years ago

bluedog13 commented 2 years ago
  1. Create the service accounts using the Cloud Keys. - WORKS AS EXPECTED
  2. Create a topic with the cluster key. - WORKS AS EXPECTED

However, when I try to add an ACL on the topic for the created service account, using the same cluster key, I get the error below:

401 Unauthorized
│ 
│   with confluentcloud_kafka_acl.mynamespace-myapp-sample-private-producer,
│   on topic-sample.tf line 23, in resource "confluentcloud_kafka_acl" "mynamespace-myapp-sample-private-producer":
│   23: resource "confluentcloud_kafka_acl" "mynamespace-myapp-sample-private-producer" {

Below is the setup

resource "confluentcloud_kafka_topic" "mynamespace-myapp-sample-private" {
  kafka_cluster    = var.azure_sandbox_cluster_id
  topic_name       = var.mynamespace-myapp-sample-private_topic
  partitions_count = 3
  http_endpoint    = var.azure_sandbox_http_endpoint
  config = {
    "cleanup.policy"      = var.topic_delete_cleanup_policy,
    "max.message.bytes"   = var.topic_max_message_size_bytes,
    "retention.ms"        = var.topic_retention_time_day_ms,
    "min.insync.replicas" = var.topic_min_insync_replicas
  }

  credentials {
    key    = var.cluster_api_key
    secret = var.cluster_api_secret
  }
}

## Producers
#  --------------------------------------------------------------
# ACL (WRITE) for Producer
resource "confluentcloud_kafka_acl" "mynamespace-myapp-sample-private-producer" {
  kafka_cluster = var.azure_sandbox_cluster_id
  resource_type = "TOPIC"
  resource_name = confluentcloud_kafka_topic.mynamespace-myapp-sample-private.topic_name
  pattern_type  = "LITERAL"
  principal     = "User:${var.mynamespace-myproducerapp-sa_id}"
  host          = "*"
  operation     = "WRITE"
  permission    = "ALLOW"
  http_endpoint = var.azure_sandbox_http_endpoint

  credentials {
    key    = var.cluster_api_key
    secret = var.cluster_api_secret
  }
}

What is causing the "401" error during the creation of ACL when the cluster is the same and the topic was provisioned using the same cluster keys?

Much appreciated

ConfluentSpencer commented 2 years ago

@bluedog13 what version of the Provider are you using? Can you also share the format of this Service Account ID:

principal = "User:${var.mynamespace-myproducerapp-sa_id}"

Is it a series of numbers, or is it in SA-1234 format?

Can you also try removing this line and seeing if it works?

host = "*"

bluedog13 commented 2 years ago

Thank you for the quick reply.

Below are the details of the provider I am using

terraform {
  required_providers {
    confluentcloud = {
      source  = "confluentinc/confluentcloud"
      version = "0.5.0"
    }
  }
}

The principal is of the format (this is a change from v0.3.0)

+ principal     = "User:sa-yo8***"

I still get the 401 error after commenting out the host line:

#host          = "*"

ConfluentSpencer commented 2 years ago

Can you email me your Org ID at sshumway@confluent.io? Instructions for finding your Org ID are here: https://docs.confluent.io/cloud/current/client-apps/cloud-basics.html#view-organization-bill-and-id

Also, can you follow these steps to see if the TF logs have any useful info? Email me what you find there as well.

Enable debug logging in Terraform. Debug logging will give you detailed error messages that you can use to investigate issues. Use these commands to enable debug logging:

export TF_LOG_PATH="./full_log.txt"
export TF_LOG=debug
terraform apply --auto-approve

You should then open the log file at the TF_LOG_PATH path to see the full logs.

bluedog13 commented 2 years ago

I have sent the org ID to the above email. I'll share the logs once I have set them up. Thanks.

PS: I am using a "basic" cluster to create the topics and ACLs. This worked previously in v0.3.0.

linouk23 commented 2 years ago

@bluedog13 just to double-check, could you share the roles / ACLs of the service account that owns the corresponding Kafka API Key (I know you'd likely see a 403 error rather than a 401 if that were the cause, but still):

# ACLs:
confluent kafka acl list --service-account sa-1a2b3c
# Roles:
# https://confluent.cloud/settings/org/assignments

bluedog13 commented 2 years ago

The service account was created using "confluentcloud_service_account" resource and has been manually assigned a key/secret that is restricted to the environment:cluster the topic resides in.

# --resource is the topic's cluster; --environment is the topic's environment
$ ccloud api-key create \
    --resource lkc-**** \
    --environment env-**** \
    --service-account sa-yo8***

The ACL details are as follows.

$ ccloud kafka acl list --service-account sa-yo8***j

UserId | ServiceAccountId | Permission | Operation | Resource | Name | Type
---------+------------------+------------+-----------+----------+------+-------

The generated API key details are as follows

$ ccloud api-key list -o json | jq '.[] | select(.key == "7ZXHHAPNE*********")'

{
  "created": "2022-03-**T02:03:56Z",
  "description": "",
  "key": "7ZXHHAPNE*******",
  "owner": "561****",
  "owner_email": "<service account>",
  "owner_resource_id": "sa-yo8***",
  "resource_id": "lkc-****",
  "resource_type": "kafka"
}

The service account details are as follows

$  ccloud service-account list -o json | jq '.[] | select(.resource_id == "sa-yo8**")'

{
  "description": "Service account for `mynamespace` namespace and `myapp` application",
  "id": "561***",
  "name": "mynamespace-myproducerapp-sa",
  "resource_id": "sa-yo8***"
}

linouk23 commented 2 years ago

Thanks for the quick reply and for attaching all the commands you ran!

Could you also open https://confluent.cloud/settings/org/assignments and confirm that the sa-yo8** service account has the CloudClusterAdmin, EnvironmentAdmin, or OrganizationAdmin role in the target environment?

The idea is that a Kafka API Key inherits the permissions of its owner ("sa-y****"). That service account appears to have no ACLs attached, and it may also have no role assigned in the target environment, in which case the 401 would effectively be a 403.

For reference, here's our sample configuration file. You can see there's an app-manager with the CloudClusterAdmin role that creates topics and ACLs.
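For readers following along, a role binding of the kind being checked for here might look roughly like the sketch below. It assumes the provider's confluentcloud_role_binding resource takes principal, role_name, and crn_pattern arguments (verify this against the docs for your provider version). The variable names api_key_owner_sa_id and cluster_rbac_crn are hypothetical placeholders, not names used in this thread.

```hcl
# Hypothetical input: the "sa-..." ID of the service account that owns the Kafka API key.
variable "api_key_owner_sa_id" {
  type = string
}

# Hypothetical input: the RBAC CRN of the target Kafka cluster.
variable "cluster_rbac_crn" {
  type = string
}

# Sketch only: grants the key's owner a cluster-level admin role so that
# requests made with its Kafka API key are authorized on that cluster.
resource "confluentcloud_role_binding" "api-key-owner-cluster-admin" {
  principal   = "User:${var.api_key_owner_sa_id}"
  role_name   = "CloudClusterAdmin"
  crn_pattern = var.cluster_rbac_crn
}
```

CloudClusterAdmin is the narrowest of the three roles named above, which is why the sketch uses it rather than EnvironmentAdmin or OrganizationAdmin.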

Luckily, we'll release an api_key resource with corresponding examples soon, so this problem will disappear.

bluedog13 commented 2 years ago

The account "sa-yo8**" has no role assigned in the target environment.

The question I now have: I created the service account using the resource below. Comparing with the shared configuration file, the part I missed is the "confluentcloud_role_binding" for the service account. Could this be my issue?

resource "confluentcloud_service_account" "mynamespace-myproducerapp-sa" {
  display_name = var.mynamespace-myproducerapp-sa
  description  = "Service account for `mynamespace` namespace and `myapp` application"
}

Does this mean it's not sufficient to create a service account, and one also has to assign it a role on the required cluster before granting ACLs? (Is this a change from v0.3.0?)

Also, if I need to use the service account across all environments (dev/test/uat/prod), do I need to assign multiple role bindings to the service account, one for each environment?

Lastly, I am a bit confused about the purpose of the API key/secret assigned to the service account, because it does not seem to help if the service account lacks the right role binding, even though the key/secret was assigned to a given environment and cluster. Service accounts are not environment-specific, from what I understand.

Thanks.

linouk23 commented 2 years ago

Could this be my issue?

It's a good idea to create that role binding and then try to recreate the ACL that was failing with 401 before. If that does fix the 401, let us know and we'll answer the other questions from your comment once we figure out the exact issue.

bluedog13 commented 2 years ago

I was able to resolve this without adding a role binding to the service account.

The resolution: the cloud key/secret is required somewhere (I'm not sure where), even though we specify the "cluster_api_key" and "cluster_api_secret" while provisioning the topic and ACLs. In my setup, I have a separate folder for creating topics (using workspaces for the different environments).

Adding the api_key and api_secret in the 2 lines below fixed it:

provider "confluentcloud" {
  api_key    = var.cloud_api_key
  api_secret = var.cloud_api_secret
}

Are the cloud keys required to map to the org, or what is their purpose when provisioning topics and ACLs in a given cluster?

Thanks.

linouk23 commented 2 years ago

That's great to hear!

Are the cloud keys required to map to the org, or what is their purpose when provisioning topics and ACLs in a given cluster?

They are required for ACLs because the TF Provider converts the service account ID behind the scenes (sa-abc123 -> 56789) using an API that requires a token generated from a Cloud API Key.
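A minimal sketch of how the two kinds of credentials fit together, based on the explanation above: the Cloud API Key goes on the provider block (where it is used for the service account ID lookup), while the Kafka cluster API key stays in each resource's credentials block. The resource arguments are copied from the configuration earlier in this thread. The example-producer-write label and the variable names cluster_id, http_endpoint, topic_name, and producer_sa_id are hypothetical, and `sensitive = true` assumes Terraform 0.14 or later.

```hcl
terraform {
  required_providers {
    confluentcloud = {
      source  = "confluentinc/confluentcloud"
      version = "0.5.0"
    }
  }
}

# Cloud API key/secret: organization-level credentials for the provider itself.
variable "cloud_api_key" {
  type      = string
  sensitive = true
}

variable "cloud_api_secret" {
  type      = string
  sensitive = true
}

# Kafka API key/secret: scoped to one cluster, passed per resource.
variable "cluster_api_key" {
  type      = string
  sensitive = true
}

variable "cluster_api_secret" {
  type      = string
  sensitive = true
}

# Hypothetical inputs standing in for the values used earlier in the thread.
variable "cluster_id" {
  type = string
}

variable "http_endpoint" {
  type = string
}

variable "topic_name" {
  type = string
}

variable "producer_sa_id" {
  type = string # expected in the "sa-..." format
}

# Without these two arguments the ACL resource below failed with 401 in this thread.
provider "confluentcloud" {
  api_key    = var.cloud_api_key
  api_secret = var.cloud_api_secret
}

resource "confluentcloud_kafka_acl" "example-producer-write" {
  kafka_cluster = var.cluster_id
  resource_type = "TOPIC"
  resource_name = var.topic_name
  pattern_type  = "LITERAL"
  principal     = "User:${var.producer_sa_id}"
  host          = "*"
  operation     = "WRITE"
  permission    = "ALLOW"
  http_endpoint = var.http_endpoint

  credentials {
    key    = var.cluster_api_key
    secret = var.cluster_api_secret
  }
}
```

When topics and ACLs live in their own folder, as in this thread, that folder needs its own provider block with the Cloud API Key, since each root module configures the provider separately.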

bluedog13 commented 2 years ago

Thanks for the clarification. I believe this is the change in v0.4.0/v0.5.0, where the "sa-****" format is used for ACLs instead of integer values as in earlier versions.

This also explains why it worked in v0.3.0 and earlier without the cloud key.

linouk23 commented 2 years ago

Thanks for figuring it out on your own @bluedog13, great job!

ConfluentSpencer commented 2 years ago

Nice work @bluedog13! Thanks for helping @linouk23!

ConfluentSpencer commented 2 years ago

@bluedog13 would you be willing to share more about how you structure your TF workspace? I'm wondering if we will need to call this out somewhere in our docs for people not doing monolithic deployments.

bluedog13 commented 2 years ago

Sure, I can share the structure. I am also using workspaces to map to different environments for clusters and topics. Service accounts are tricky since they are not tied to an environment.

.
├── README.md
└── resources
    ├── 1_kafka-cluster
    │   ├── README.md
    │   ├── clusters.tf
    │   ├── images
    │   │   └── cluster-provisioning.png
    │   ├── main.tf
    │   ├── non-prod
    │   │   └── terraform.tfvars
    │   ├── outputs.tf
    │   ├── prod
    │   │   └── terraform.tfvars
    │   ├── sandbox
    │   │   └── terraform.tfvars
    │   ├── terraform.tfstate
    │   ├── terraform.tfstate.d
    │   │   ├── nonprod
    │   │   ├── prod
    │   │   └── sandbox
    │   │       ├── terraform.tfstate
    │   │       └── terraform.tfstate.backup
    │   ├── tfplan_create_cluster_sandbox
    │   └── variables.tf
    ├── 2_service-accounts
    │   ├── 0.example
    │   │   ├── accounts.tf
    │   │   ├── main.tf
    │   │   ├── outputs.tf
    │   │   ├── terraform.tfvars
    │   │   └── variables.tf
    │   ├── README.md
    │   ├── namespace1
    │   │   ├── main.tf
    │   │   ├── outputs.tf
    │   │   ├── accounts.tf
    │   │   ├── terraform.tfvars
    │   │   └── variables.tf
    │   ├── images
    │   │   └── service-accounts-provisioning.png
    │   ├── namespace2
    │   │   ├── main.tf
    │   │   ├── outputs.tf
    │   │   ├── accounts.tf
    │   │   ├── terraform copy.tfvars
    │   │   ├── terraform.tfvars
    │   │   └── variables.tf
    │   ├── sce
    │   └── vdi
    └── 3_kafka-topics
        ├── README.md
        ├── dev
        │   └── terraform.tfvars
        ├── full_log.txt
        ├── images
        │   └── topic-provisioning.png
        ├── main.tf
        ├── outputs.tf
        ├── prod
        │   └── terraform.tfvars
        ├── terraform.tfstate
        ├── terraform.tfstate.d
        │   ├── dev
        │   │   ├── terraform.tfstate
        │   │   └── terraform.tfstate.backup
        │   ├── prod
        │   ├── test
        │   │   ├── terraform.tfstate
        │   │   └── terraform.tfstate.backup
        │   └── uat
        │       └── terraform.tfstate
        ├── test
        │   └── terraform.tfvars
        ├── tfplan_create_topic_sandbox_dev
        ├── tfplan_create_topic_sandbox_test
        ├── tfplan_create_topic_sandbox_uat
        ├── topic-sample.tf
        ├── uat
        │   └── terraform.tfvars
        ├── variables-sample-topic.tf
        └── variables.tf