1123 / confluent-cloud-service-broker

Service Broker for Apache Kafka

Revamp the Service Instance lifecycle implementations #6

Open daniellavoie opened 4 years ago

daniellavoie commented 4 years ago

Context

I find the actions executed by create-service and bind-service a bit counter-intuitive. Having to create a separate service and a separate binding for each individual topic is, in my opinion, awkward. For each binding and each topic, the service broker ends up providing separate credentials and discovery details.

Proposed changes

Service Plans

The Service Broker should support custom plan definitions so that Platform Operators can control what kind of Kafka service is provided to the application teams.

Plans could contain parameters such as quotas, target cluster (existing or on-demand), etc.
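As a rough illustration of what such a plan definition could hold, here is a minimal sketch. The class name, field names, and the two example plans are all hypothetical; the broker has no such model today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServicePlan:
    """Hypothetical plan definition owned by the Platform Operator."""
    name: str
    target_cluster: Optional[str]  # None means "provision a cluster on demand"
    max_topics: int                # quota: topics per service instance
    max_partitions_per_topic: int  # quota: partitions per topic

    @property
    def on_demand(self) -> bool:
        return self.target_cluster is None


# Example catalog (made-up names and numbers).
PLANS = {
    "dev-shared": ServicePlan("dev-shared", "dev-cluster", 20, 6),
    "prod-dedicated": ServicePlan("prod-dedicated", None, 100, 48),
}
```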

Create Service

The create-service operation should allow application operators to specify topics and ACL rules for other spaces and organizations. On the Kafka side, nothing is executed. ACLs would be applied only during a bind-service. The motivation for holding an ACL metadata object is to let the service instance owner define who may or may not consume their topics. The service broker would use that authoritative information during a bind-service initiated by a client application to accept or refuse configuring ACLs on the topics owned by the service instance owner. Within Cloud Foundry, the service instance will not be visible to other spaces until a share-service command is initiated.
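To make the ACL metadata idea concrete, here is a minimal sketch of what the create-service parameters might look like and how the broker could decide whether to accept a bind request. The JSON shape (`topics`, `acl`, `org`, `space`, `operations`) and the `is_bind_allowed` helper are assumptions for illustration, not an existing broker API.

```python
import json

# Hypothetical payload, e.g. passed with `cf create-service ... -c '<json>'`.
PARAMS = json.loads("""
{
  "topics": [
    {
      "name": "orders",
      "acl": [
        {"org": "retail", "space": "billing", "operations": ["READ"]},
        {"org": "retail", "space": "checkout", "operations": ["READ", "WRITE"]}
      ]
    }
  ]
}
""")


def is_bind_allowed(params, topic, org, space, operation):
    """Return True if the owner's ACL metadata grants `operation` on `topic`."""
    for entry in params["topics"]:
        if entry["name"] != topic:
            continue
        for rule in entry["acl"]:
            if (rule["org"] == org and rule["space"] == space
                    and operation in rule["operations"]):
                return True
    # Default deny: anything the owner did not list is refused.
    return False
```

Nothing here touches Kafka; the metadata is only consulted later, when a client application binds.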

Once on-demand clusters are supported by Confluent Cloud, this operation would also handle provisioning a new cluster dynamically (through CCloud or Operator). If the selected service plan is configured to target an existing cluster, it would skip provisioning entirely during create-service. Service plans are assignable to spaces and organizations (or k8s namespaces). This gives Platform Operators a control mechanism to define who may or may not create new clusters on demand.
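The decision described above could be sketched roughly as follows. The function name and its parameters are hypothetical, and `provision` is a stand-in for whatever CCloud or Operator call would eventually create a cluster.

```python
def resolve_cluster(target_cluster, plan_spaces, requesting_space, provision):
    """Pick the cluster for a new service instance.

    target_cluster: cluster id configured on the plan, or None for on-demand.
    plan_spaces: spaces the Platform Operator assigned the plan to.
    provision: zero-argument callable returning a new cluster id (stand-in).
    """
    if requesting_space not in plan_spaces:
        raise PermissionError(
            f"plan is not available to space {requesting_space!r}")
    if target_cluster is not None:
        # Existing cluster: skip provisioning entirely.
        return target_cluster
    return provision()
```

Because the plan assignment is checked first, only spaces that were given an on-demand plan can ever reach the provisioning call.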

Update Service

An update-service operation should update the ACL rule definition and also update the ACLs for any existing binding. create-service does not need to go through this routine since no application is bound to the service yet.
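One way to think about the update routine is as a diff between the old and new rule sets, restricted to spaces that already have a binding. This is only a sketch; representing rules as `(topic, space, operation)` tuples is an assumption made for brevity.

```python
def plan_acl_update(old_rules, new_rules, bound_spaces):
    """Return (to_grant, to_revoke) for existing bindings.

    old_rules / new_rules: sets of (topic, space, operation) tuples.
    bound_spaces: spaces that currently hold a binding to the instance.
    """
    to_grant = new_rules - old_rules
    to_revoke = old_rules - new_rules
    grant = sorted(r for r in to_grant if r[1] in bound_spaces)
    revoke = sorted(r for r in to_revoke if r[1] in bound_spaces)
    return grant, revoke
```

With no bindings, both lists come back empty, which matches the point that create-service has nothing to apply.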

Bind Service

Credential management

The bind-service command could be refactored to handle dynamic credential generation for plans configured against CCloud or an existing cluster. This dynamic behavior could be configurable per service plan. Depending on the target plan, we could support different account provisioning mechanisms. Until a full-fledged control plane API is available on CP and CC, we can prototype our way forward by supporting token delegation through the AdminClient.

Topic management

The ACL rules are defined through create-service and update-service. Whenever a bind-service operation is triggered, the ACL update routine should run based on those definitions. This would provide a true self-service and decentralized topic management story.
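A minimal sketch of the bind-time routine, assuming the instance stores its rules as a list of dicts with `topic`, `space`, and `operations` keys (a made-up shape). It only computes which ACL entries the broker would then apply for the binding's principal; the actual Kafka call is left out.

```python
def acl_entries_for_binding(rules, space, principal):
    """List (principal, topic, operation) entries to apply for one binding."""
    entries = []
    for rule in rules:
        if rule["space"] != space:
            continue  # rule targets another space
        for operation in rule["operations"]:
            entries.append((principal, rule["topic"], operation))
    return entries


# Hypothetical rules as stored by create-service / update-service.
RULES = [
    {"topic": "orders", "space": "billing", "operations": ["READ"]},
    {"topic": "orders", "space": "checkout", "operations": ["READ", "WRITE"]},
]
```

An empty result would mean the owner has not shared anything with that space, so the bind could be refused.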

1123 commented 4 years ago

Great feedback, Daniel. I agree this would be a very valuable improvement. Currently I am busy with other things, but I have added you as a collaborator, in case you want to contribute.

I will have to look into the details on how this would work with token delegation.

One thought regarding topic management: I agree that allowing users to create any number of topics might cause trouble. Still, limiting the number of topics or partitions a user can create within a predefined timeframe (e.g. one week) might resolve this. Not allowing topic creation at all via the service broker takes away a large part of the self-service functionality... what do you think?
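Such a limit could be sketched as a sliding-window counter per user. The class and method names are hypothetical, and the state lives in memory only, so a real broker would need to persist it.

```python
from collections import defaultdict, deque


class TopicCreationLimiter:
    """Allow at most `max_topics` creations per user within `window_seconds`."""

    def __init__(self, max_topics, window_seconds):
        self.max_topics = max_topics
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user -> creation timestamps

    def try_create(self, user, now):
        events = self._events[user]
        # Drop creations that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_topics:
            return False
        events.append(now)
        return True
```

For a weekly limit, `window_seconds` would be `7 * 24 * 3600` and `now` something like `time.time()`.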

daniellavoie commented 4 years ago

I'm likely to go ahead and contribute myself. Thanks for adding me as a collaborator. My goal is to reach consensus before PRing anything :)

So, I consider topic management similar to schema management on a database. These are the tasks of an administrator because there are a lot of settings involved. Just like for db schema migration, you are likely to want auto schema management in dev, but production deployments might be rolled out by DBAs or data engineers. I don't have an opinion about who does what. I only think that making this behavior configurable per plan would be a powerful feature. Self-service on dev environments, with prefixed topics: you do what you want, you are king. But for production environments, it's a bit more locked down.

I think the power of a service broker is not to automate everything, but to give you self-service if you are ALLOWED to. This is why I think some kind of rule engine defined by the service broker administrator could auto-create topics, or even allow binding to existing topics, when the application space or namespace matches some rules. What do you think?
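To give an idea of what I mean by a rule engine, here is a tiny sketch using glob patterns on the space (or namespace) and topic names. The rule format, the action names, and the `dev-*` / `prod-*` naming are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical rules maintained by the service broker administrator.
RULES = [
    {"space": "dev-*", "topic": "*", "actions": {"create_topic", "bind"}},
    {"space": "prod-*", "topic": "*", "actions": {"bind"}},
]


def is_allowed(rules, space, topic, action):
    """Return True if any rule matches the space and topic and grants the action."""
    return any(
        fnmatchcase(space, rule["space"])
        and fnmatchcase(topic, rule["topic"])
        and action in rule["actions"]
        for rule in rules
    )
```

With these rules a `dev-*` space can create topics on its own, while a `prod-*` space can only bind to topics that already exist.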

daniellavoie commented 4 years ago

Alright, I did some brainstorming and updated the issue description. I figured out that service owners could actually specify the ACL rules they wish to share with other teams through the service instance configuration JSON. The service instance configuration would implicitly contain the topics and their ACLs. The update-service command can be used to add topics or modify the ACLs. In that sense, @1123 is right: the less friction there is, the better. You create a topic, you own it. A topic prefix mechanism based on the space name would probably help with topic ownership and avoid conflicts.
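The prefix mechanism could be as simple as the sketch below. The `.` separator and both helper names are assumptions; spaces containing the separator are rejected so that one space's prefix can never be a prefix of another's.

```python
SEPARATOR = "."


def qualified_topic(space, topic):
    """Build the Kafka topic name for a topic owned by `space`."""
    if SEPARATOR in space:
        raise ValueError(f"space name must not contain {SEPARATOR!r}")
    return f"{space}{SEPARATOR}{topic}"


def owns_topic(space, kafka_topic):
    """Return True if `kafka_topic` sits under the prefix of `space`."""
    return kafka_topic.startswith(space + SEPARATOR)
```

Two spaces can then both declare an `orders` topic without colliding on the cluster.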

1123 commented 4 years ago

That sounds great. I hope I will find some time to try out the token delegation thing soon, and also the provisioning of clusters via the operator. I haven't found any documentation of the operator API yet, and I guess we don't want to do this via helm :-)