confluentinc / terraform-provider-confluent

Terraform Provider for Confluent
Apache License 2.0

Allow managing tags and metadata for Topics/Schemas #174

Closed S1M0NM closed 1 year ago

S1M0NM commented 1 year ago

The Confluent Cloud Stream Catalog allows the management of tags and metadata for Schemas and Topics. These tags and metadata can then be used for searching data/topics/schemas.

Currently it's not possible to manage tags and/or metadata using Terraform.

It would be nice if we were able to create tags and metadata using Terraform, apply them to topics/schemas, and use them to grant or deny access to topics based on the tags.

linouk23 commented 1 year ago

That sounds like a great idea @S1M0NM!

Do you have a TF configuration snippet in mind for how the UX could look? For example, how a tags / metadata resource definition might look (roughly) given confluent_schema.example and confluent_kafka_topic.orders.

S1M0NM commented 1 year ago

Tags

I would suggest a confluent_tag resource, which uses the /catalog/v1/types/tagdefs API endpoint:

resource "confluent_tag" "example" {
  name = "<TAG_NAME>"
  description = "<TAG_DESCRIPTION>"
}

Example POST request payload for new tag:

[
  {
    "name": "example",
    "description": "example tag",
    "entityTypes": [
      "cf_entity"
    ]
  }
]

To assign it to a topic you would have to add another resource, maybe something like confluent_topic_tag_config (I'm not sure about the name):

resource "confluent_topic_tag_config" "example" {
  topic_id = confluent_kafka_topic.example.id
  tag_name = confluent_tag.example.name
}

The topic_id and tag_name attributes should be enough, since the request payload only contains the topic ID (called entityName), the entityType (which would always be "kafka_topic" in this case), and the tag name (called typeName). Example POST request payload to /catalog/v1/entity/tags for a topic:

[
  {
    "entityName": "<Cluster-ID>:<Topic-Name>",
    "entityType": "kafka_topic",
    "typeName": "<Tag-Name>"
  }
]

For a confluent_schema resource there could be a confluent_schema_tag_config resource:

resource "confluent_schema_tag_config" "example" {
  schema_identifier = confluent_schema.example.schema_identifier
  tag_name = confluent_tag.example.name
}

For a schema the payload uses the Schema Registry cluster ID and the unique schema ID as entityName, entityType is "sr_schema", and typeName is again the tag name. (Not sure why there is the "." between the two ":".)

Example POST request payload for schema:

[
  {
    "entityName": "<SR-Cluster-ID>:.:<Unique-Schema-ID>",
    "entityType": "sr_schema",
    "typeName": "<Tag-Name>"
  }
]

Metadata

For the metadata I would suggest a confluent_metadata resource (it's called businessmetadata in the docs, but maybe metadata is enough here):

resource "confluent_metadata" "example" {
  name = "<Metadata-Name>"
  description = "<Metadata-Description>"
  attributes = ["attribute1", "attribute2"] #Array/List of Attribute Strings
}

An example POST request to /catalog/v1/types/businessmetadatadefs contained the following payload:

[
  {
    "name": "example",
    "description": "example-metadata",
    "attributeDefs": [
      {
        "isOptional": true,
        "options": {
          "applicableEntityTypes": "[\"cf_entity\"]"
        },
        "typeName": "string"
      },
      {
        "isOptional": true,
        "name": "attribute1",
        "options": {
          "applicableEntityTypes": "[\"cf_entity\"]"
        },
        "typeName": "string"
      },
      {
        "isOptional": true,
        "name": "attribute2",
        "options": {
          "applicableEntityTypes": "[\"cf_entity\"]"
        },
        "typeName": "string"
      }
    ]
  }
]

resource "confluent_topic_metadata_config" "example" {
  topic_id = confluent_kafka_topic.example.id
  tag_name = confluent_tag.example.name
  attribute_values = [] # Not sure how an implementation could look here, since you can add values to the attributes you defined in the confluent_metadata resource earlier
}

An example POST request to /catalog/v1/entity/businessmetadata contained the following payload:

[
  {
    "attributes": {
      "attribute1": "foo",
      "attribute2": "bar"
    },
    "entityType": "kafka_topic",
    "entityName": "<Cluster-ID>:<Topic-Name>",
    "typeName": "<Metadata-Name>"
  }
]

resource "confluent_schema_metadata_config" "example" {
  schema_identifier = confluent_schema.example.schema_identifier
  tag_name = confluent_tag.example.name
  attribute_values = [] # Not sure how an implementation could look here, since you can add values to the attributes you defined in the confluent_metadata resource earlier
}

Example POST request payload for schema:

[
  {
    "attributes": {
      "attribute1": "foo",
      "attribute2": "bar"
    },
    "entityType": "sr_schema",
    "entityName": "<SR-Cluster-ID>:.:<Unique-Schema-ID>",
    "typeName": "<Metadata-Name>"
  }
]
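
One way the open attribute_values question could be approached is a map that mirrors the "attributes" object in the payloads above. This is only a sketch: the attributes argument, its map type, and the metadata_name argument are assumptions made for illustration, and the resource name is the hypothetical one proposed in this comment.

```hcl
# Hypothetical sketch, not an existing provider resource.
resource "confluent_topic_metadata_config" "example" {
  topic_id = confluent_kafka_topic.example.id

  # Assumed argument: corresponds to "typeName" in the POST payload.
  metadata_name = confluent_metadata.example.name

  # Assumed argument: keys are the attribute names declared in
  # confluent_metadata.example, values are the strings set on this topic.
  attributes = {
    attribute1 = "foo"
    attribute2 = "bar"
  }
}
```

A map keeps the attribute names and values together, so a typo in a name shows up next to the value it was meant for.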

@linouk23 I hope I did not miss anything here :D

Noel-Jones commented 1 year ago

Apologies for what may be a silly question but I've spent a couple of hours looking through APIs and trying some calls. Would any of this allow me to set the description against a topic? I didn't want to create a new issue if this covers it. Thx.

S1M0NM commented 1 year ago

My suggestions above initially only related to tags and metadata, but I guess you could also add a resource for descriptions.

Maybe something like this:

resource "confluent_topic_description" "example" {
  description = "<description>"
  topic_id = confluent_kafka_topic.example.id
  rest_endpoint = confluent_schema_registry_cluster.example.rest_endpoint # Tags/descriptions/metadata are all created using the SR cluster endpoint
}

Editing a description sends the following payload to the /catalog/v1/entity API endpoint of the Schema Registry cluster:

{
  "entity": {
    "typeName": "kafka_topic",
    "attributes": {
      "description": "<description>",
      "qualifiedName": "<Cluster-ID>:<Topic-Name>"
    }
  }
}

And for schemas something like this:

resource "confluent_schema_description" "example" {
  description = "<description>"
  schema_identifier = confluent_schema.example.schema_identifier

  schema_registry_cluster {
    id = confluent_schema_registry_cluster.example.id
    rest_endpoint = confluent_schema_registry_cluster.example.rest_endpoint
  }
}

Payload:

{
    "entity": {
        "typeName": "sr_schema",
        "attributes": {
            "description": "<description>",
            "qualifiedName": "<SR-Cluster-ID>:.:<Schema-ID>"
        }
    }
}

cc @linouk23 @Noel-Jones

avgalani commented 1 year ago

We're also facing this issue. Would be great to see it shipped!

zhenli00 commented 1 year ago

Hi @S1M0NM, we are working on tags and business metadata in Terraform. Is there a reason why you want a new resource for each tag config (confluent_schema_tag_config, confluent_topic_tag_config)? We are thinking of having a generic resource where you can specify the type in the config, like:

resource "confluent_tag" "main" {
  name = "test_tag"
  description = "t12345 des updated"
  entity_types = ["sr_schema"]
}

resource "confluent_tag_binding" "main" {
  tag_name = confluent_tag.main.name
  entity_name = "100002"
  entity_type = "sr_schema"
}

By the way, the "." is not required in entityName; you can just use the schema ID here.
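
For a topic, the same generic binding could build the entity name from the cluster ID and topic name, following the "<Cluster-ID>:<Topic-Name>" format from the payloads earlier in the thread. A rough sketch, assuming the confluent_tag and confluent_tag_binding arguments proposed in this comment, and assuming confluent_kafka_cluster.example and confluent_kafka_topic.orders are defined elsewhere; the format a released provider expects may differ.

```hcl
# Sketch only: argument names follow the proposal in this comment.
resource "confluent_tag" "topic_tag" {
  name         = "test_topic_tag"
  description  = "Tag for Kafka topics"
  entity_types = ["kafka_topic"]
}

resource "confluent_tag_binding" "orders" {
  tag_name = confluent_tag.topic_tag.name

  # "<Cluster-ID>:<Topic-Name>", as in the UI payload quoted earlier.
  entity_name = "${confluent_kafka_cluster.example.id}:${confluent_kafka_topic.orders.topic_name}"
  entity_type = "kafka_topic"
}
```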

S1M0NM commented 1 year ago

Hi @zhenli00,

> is there a reason why you need a new resource for each tag configuration (confluent_schema_tag_config, confluent_topic_tag_config)?

No, not really. My suggestion above came from looking at the API calls the web UI made at the time when creating and assigning tags. So your suggestion with the confluent_tag_binding resource definitely makes more sense.

> by the way, the "." is not required in entityName, you can just use schema Id here.

Good to know. The payloads I listed above also all originated from the Confluent Cloud UI and the API calls made there.

> We are thinking to have a generic resource that you can specify the type in config

Would it then make sense to add an entity_type attribute to the confluent_topic and confluent_schema resources so that you can reference it when creating the confluent_tag_binding?

linouk23 commented 1 year ago

Thanks for the insightful comment @S1M0NM! Could you provide an example for

> add an entity_type attribute to the confluent_topic and confluent_schema resources so that you can reference it when creating the confluent_tag_binding?

I'm not aware of a smart way to reference a resource type without hardcoding it 🤔

S1M0NM commented 1 year ago

That would indeed probably be a hard-coded attribute. I'm not sure how useful that is; it was just an idea so that you could use it in for_each constructs.
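
To illustrate the for_each idea without any new attribute on the topic or schema resources, the entity type can simply be hard-coded next to each entity in a local map. This is a sketch under assumptions: it uses the confluent_tag_binding arguments proposed earlier in the thread, the local name tagged_entities is made up, and the entity names are placeholders.

```hcl
locals {
  # The entity type is hard-coded per entry; entity names are placeholders.
  tagged_entities = {
    orders_topic  = { entity_name = "<Cluster-ID>:<Topic-Name>", entity_type = "kafka_topic" }
    orders_schema = { entity_name = "<Unique-Schema-ID>", entity_type = "sr_schema" }
  }
}

# One binding per map entry, all using the same tag.
resource "confluent_tag_binding" "all" {
  for_each = local.tagged_entities

  tag_name    = confluent_tag.main.name
  entity_name = each.value.entity_name
  entity_type = each.value.entity_type
}
```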

linouk23 commented 1 year ago

@S1M0NM check out our latest release where we:

and share the feedback here if possible, thanks!

S1M0NM commented 1 year ago

Hi, I looked at the feature today and now I'm wondering whether it's really necessary for the service account that creates the tags to have EnvironmentAdmin as a RoleBinding.

In the StreamCatalog example, the SA gets this rolebinding. However, based on the RBAC documentation, DataSteward should also be sufficient in this case.

Although this RoleBinding should be sufficient, I get a 403 Forbidden error:

│ Error: error creating Tag 403 Forbidden
│ 
│   with confluent_tag.tag["TLP:GREEN"],
│   on test_tags.tf line 1, in resource "confluent_tag" "tag":
│    1: resource "confluent_tag" "tag" {
│ 

Resource definition:

resource "confluent_tag" "tag" {
  for_each = {
    "TLP:CLEAR" = "Unlimited sharing",
    "TLP:GREEN" = "Cross-Organizational Sharing",
  }

  schema_registry_cluster {
    id = confluent_schema_registry_cluster.test_schema_registry.id
  }
  rest_endpoint = confluent_schema_registry_cluster.test_schema_registry.rest_endpoint
  credentials {
    key    = confluent_api_key.test_cluster_1_cluster_manager_schema_registry_api_key.id
    secret = confluent_api_key.test_cluster_1_cluster_manager_schema_registry_api_key.secret
  }

  name        = each.key
  description = each.value

  depends_on = [confluent_role_binding.test_cluster_1_sa_cluster_manager_rolebinding_datasteward]
}

Edit: https://docs.confluent.io/cloud/current/stream-governance/stream-catalog.html#create-tags

> If you chose free-form, provide a tag name and optional description. Tag names must start with an alphabetic character, and can then include alpha characters, numbers and underscores.

So again the error message is misleading, as it is a bad tag name and not a permissions issue.

Edit 2: Okay, although the DataSteward Role binding should have sufficient permissions, I still get 403 errors with corrected tag names.

Edit 3: Definitely my fault, just discovered that auto-complete added the wrong API key in the resource. Nevertheless, I think that checking for correctness of the tag names should take place in the plan step.
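
Until the provider checks names itself, a configuration-side workaround is to validate the names in a variable, so that a bad name is rejected before anything is created. A minimal sketch: the variable name tags is made up, and the regular expression is an assumption derived from the documentation sentence quoted above (a letter first, then letters, digits, or underscores), not a pattern published by Confluent.

```hcl
variable "tags" {
  description = "Map of tag name to tag description."
  type        = map(string)

  validation {
    # Assumed pattern derived from the quoted docs: a letter first,
    # then letters, digits, or underscores.
    condition = alltrue([
      for name in keys(var.tags) : can(regex("^[A-Za-z][A-Za-z0-9_]*$", name))
    ])
    error_message = "Tag names must start with a letter and contain only letters, digits, and underscores."
  }
}

resource "confluent_tag" "tag" {
  for_each = var.tags

  name        = each.key
  description = each.value

  # schema_registry_cluster, rest_endpoint, and credentials as in the
  # resource definition above.
}
```

With this in place, a name such as "TLP:GREEN" fails the validation with the custom error message instead of reaching the API.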

linouk23 commented 1 year ago

Thanks for sharing your feedback @S1M0NM!

Could you create separate issues for tracking

linouk23 commented 1 year ago

Closing this thread as it seems like we resolved the issues but feel free to reopen it if there's something we didn't address.