devshawn / kafka-gitops

🚀Manage Apache Kafka topics and generate ACLs through a desired state file.
https://devshawn.github.io/kafka-gitops
Apache License 2.0
317 stars 71 forks

How to use with shared environments? #24

Open AntAreS24 opened 4 years ago

AntAreS24 commented 4 years ago

Hi,

We're using Kafka as a shared cluster and as such, we have multiple projects with multiple teams working in parallel. We are planning on having 1 repo per project/system, with their own states and plans, but during our initial testing, we realised that one project is wiping the config of another project since we don't have an overall state file.

Is that the intended purpose? Is there a way to prevent that?

AntAreS24 commented 4 years ago

It even deletes its own user permissions (assuming we've applied this: https://devshawn.github.io/kafka-gitops/#/permissions?id=manually-add-acls)


/gitops$ ./kafka-gitops apply
Executing apply...

Applying: [DELETE]

- [ACL] Unnamed ACL
         - resource_name: *
         - resource_type: TOPIC
         - resource_pattern: LITERAL
         - resource_principal: User:gitops-user
         - host: *
         - operation: DELETE
         - permission: ALLOW

Successfully applied.

....

[SUCCESS] Apply complete! Resources: 0 created, 0 updated, 7 deleted.
devshawn commented 4 years ago

Hey,

The intended design of this project was to have a single state file per cluster, with all teams and projects contributing to it. This was to allow security & operations teams to review access control. The workflow in my organization goes like this:

The intended design without configuration will delete anything not defined in the state file. For example, if you manually create the gitops user ACLs, you still need to define them if you don't want them deleted, like this.

If you'd like to stick to a state file per team, you can run the plan/apply process with the --no-delete flag. This will stop one team's plan/apply from deleting other teams' resources. However, it will not clean up any of your own deleted topics/services either. I think the only way to have multiple state files and run with delete would be to merge them all together before running, something we've been talking about in #19. A current workaround would be to build a single pipeline that grabs the latest state file from each team's repository and merges them with some external YAML merging tool.
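
For anyone trying the merge workaround, here is a minimal sketch of what the merge step could look like. It is not part of kafka-gitops. It assumes each team's state file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that every top-level section is a mapping from resource name to definition. The function name `merge_states` and the sample topics and services are made up for illustration.

```python
from typing import Any, Dict, Iterable

# Assumption: a parsed state file is {section: {resource_name: definition}}.
State = Dict[str, Dict[str, Any]]


def merge_states(states: Iterable[State]) -> State:
    """Merge per-team state dicts into one cluster-wide state dict.

    Raises ValueError if two teams define the same resource differently,
    so that conflicts are reviewed instead of silently overwritten.
    """
    merged: State = {}
    for state in states:
        for section, entries in state.items():
            target = merged.setdefault(section, {})
            # An empty YAML section parses to None, so fall back to {}.
            for name, definition in (entries or {}).items():
                if name in target and target[name] != definition:
                    raise ValueError(
                        f"conflicting definition for {section}/{name}"
                    )
                target[name] = definition
    return merged


# Hypothetical per-team states, as they might look after parsing.
team_a = {"topics": {"orders": {"partitions": 6, "replication": 3}}}
team_b = {
    "topics": {"payments": {"partitions": 3, "replication": 3}},
    "services": {"payments-app": {"type": "application"}},
}

merged = merge_states([team_a, team_b])
```

The merged dict would then be written back out as a single state file and passed to plan/apply, so that deletes are computed against everything every team has declared.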

Hope that helps!

drewwells commented 3 years ago

Distributed configuration is a useful and scalable model. We have quite a few teams using Kafka. One of the main driving forces for adoption of Kafka is to have loosely coupled systems.

One way to decouple systems is to delegate ownership of configuration to individual groups. Say a topic is owned by the Logging publishing team. The topic would be created with additional metadata, so that other workers fetching the topic could see that it is owned by some group. The YAML managing that group would reconcile the topic, check who the owner is, and delete or update it as needed. This pattern would make for an excellent Kubernetes operator to manage topics, something we are starting to look at internally.
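
A rough sketch of that ownership check, purely in memory and under stated assumptions: the `Topic` type, its `owner` field, and `plan_for_owner` are hypothetical names, and how the owner metadata would actually be stored alongside a Kafka topic is left open.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Topic:
    name: str
    owner: str        # hypothetical ownership metadata
    partitions: int


def plan_for_owner(
    owner: str,
    desired: Dict[str, Topic],
    actual: Dict[str, Topic],
) -> List[Tuple[str, str]]:
    """Compute (action, topic_name) pairs for a single owning group.

    Topics owned by other groups are never updated or deleted.
    """
    plan: List[Tuple[str, str]] = []
    for name, topic in desired.items():
        current = actual.get(name)
        if current is None:
            plan.append(("CREATE", name))
        elif current.owner != owner:
            raise PermissionError(f"{name} is owned by {current.owner}")
        elif current != topic:
            plan.append(("UPDATE", name))
    for name, current in actual.items():
        # Only clean up topics this group owns and no longer declares.
        if current.owner == owner and name not in desired:
            plan.append(("DELETE", name))
    return plan


actual = {
    "logs.app": Topic("logs.app", "logging", 3),
    "logs.old": Topic("logs.old", "logging", 3),
    "payments": Topic("payments", "billing", 6),
}
desired = {
    "logs.app": Topic("logs.app", "logging", 6),
    "logs.audit": Topic("logs.audit", "logging", 3),
}
plan = plan_for_owner("logging", desired, actual)
```

In this example the billing team's `payments` topic is left alone even though the logging group's desired state does not mention it.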

devshawn commented 3 years ago

@drewwells - definitely agree with you here. I especially like the idea of a Kubernetes operator. In that scenario, how would you manage security?

drewwells commented 3 years ago

Depends on how the YAML is stored. It could be in S3, in which case S3 ACLs would manage access to the YAML.

Our desired state would be K8s. For that example, we write a K8s Operator that is responsible for taking state from Custom Resources (CRs) and persisting that desired state to Kafka. Users would not require access to create topics in Kafka. Users would have access to a CR with the configuration YAML in it. Access to CRs would be managed by K8s RBAC. The Operator would be responsible for reconciling changes to the CRs and creating/updating/deleting Kafka topics. The users here do not need access to Kafka or knowledge of how to update it; they only need to understand how the CR is created.

wolever commented 3 years ago

I'm having a similar issue… and while a Correct Fix would be ideal, a pragmatic fix - at least for me - would be an option to ignore users, topics, ACLs, etc. Something like:

ignore:
  topics:
    - some_other_app.*
  users:
    - alex
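
To make the idea concrete, here is a small sketch of how such an ignore list might be applied to the names a plan wants to delete. This option does not exist in kafka-gitops. `filter_ignored` is a hypothetical helper, and treating the patterns as regular expressions matched against the full name is an assumption.

```python
import re
from typing import Iterable, List


def filter_ignored(names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the names that match none of the ignore patterns.

    Assumption: each pattern is a regular expression that must match
    the whole resource name.
    """
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if not any(c.fullmatch(n) for c in compiled)]


# Hypothetical list of topics a plan would otherwise delete.
to_delete = ["some_other_app.events", "my_app.old_topic"]
kept = filter_ignored(to_delete, ["some_other_app.*"])
```

Only `my_app.old_topic` stays in the delete list here; anything belonging to the other app is skipped.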
brunosaboia commented 3 years ago

So there is no plan to be able to split the topology into multiple files?

AntAreS24 commented 3 years ago

We've resolved that by running with the --no-delete flag. In our environment, deleting something should almost never happen anyway.