bakdata / kpops

Deploy Kafka pipelines to Kubernetes
https://bakdata.github.io/kpops
MIT License

Deploy multiple pipelines #22

Closed · raminqaf closed this 3 months ago

raminqaf commented 1 year ago

The KPOps deploy, destroy, delete, and clean commands should be able to take multiple pipeline files. This improves performance because files are loaded only once and Helm repo commands are run only once.
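For illustration, such an invocation might look like this (hypothetical syntax and example paths, since the commands currently take a single pipeline file):

kpops deploy pipelines/a/pipeline.yaml pipelines/b/pipeline.yaml pipelines/c/pipeline.yaml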

philipp94831 commented 1 year ago

What about not specifying a path to a pipeline.yaml file but rather a folder, so that KPOps discovers all pipeline.yaml files inside this folder and runs them?
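For illustration, a folder-based call might look like this (hypothetical syntax, with pipelines/examples as a stand-in path; KPOps would recursively pick up every pipeline.yaml underneath it):

kpops deploy pipelines/examples/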

disrupted commented 5 months ago

I gave this some thought and I believe this kind of functionality adds unnecessary complexity and doesn't follow the paradigm for scriptable CLIs, especially since it can be solved quite easily with existing tools, e.g.

fd pipeline.yaml pipelines/examples | xargs -L1 kpops generate

will run

kpops generate pipelines/examples/example1/pipeline.yaml
kpops generate pipelines/examples/example2/pipeline.yaml
kpops generate pipelines/examples/example3/pipeline.yaml
...
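The same works with plain find if fd isn't installed (a sketch, assuming an xargs that supports -L):

find pipelines/examples -name pipeline.yaml | xargs -L1 kpops generate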

Especially in CI we already use matrix jobs to parallelize execution. Are there any other use cases where you would really need this?

philipp94831 commented 5 months ago

I think there would be multiple advantages to allowing this:

disrupted commented 5 months ago
  • caching of defaults can speed up execution

That's true; however, we're talking about shaving off a couple of ms at best. A deploy operation usually takes several minutes, so that's pretty negligible. Now that defaults can be distributed, I'd argue it's even less of a factor, since only the shared defaults of multiple pipelines (i.e. the top-level ones) would be cached.

In terms of performance, you would lose parallelism, which imo is a much more important factor for speed.
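If parallelism is the concern when scripting it this way, it can be kept from the shell as well (a sketch, assuming an xargs with -P support, as in GNU or BSD xargs):

fd pipeline.yaml pipelines/examples | xargs -L1 -P 4 kpops deploy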

  • inter-pipeline dependencies: currently you can only reference components from the same pipeline, but referencing other pipelines would be easier to do, especially wrt execution order and parallelization

That is a separate feature and out of scope for this ticket, which is about passing multiple pipelines into a KPOps operation.

  • this issue was actually brought back to the table because we are using matrix jobs. There were too many matrix jobs and GitHub Actions is really bad at handling those. Therefore, related pipelines can be grouped by specifying a top folder or a list of pipelines, and the KPOps action can still be used without needing to write a custom bash wrapper

Hm, I know that GitHub has some weaknesses and their CI is not fully where we'd want it to be. Are there any tickets about this on their issue tracker? It seems more like a UI/UX issue with their web frontend than a KPOps concern.