Closed by raminqaf 3 months ago
What about accepting a folder instead of a path to a single pipeline.yaml file? KPOps would then discover all pipeline.yaml files inside that folder and run them.
I gave this some thought, and I believe this kind of functionality adds unnecessary complexity and doesn't follow the paradigm of scriptable CLIs, especially since it can be solved quite easily with existing tools, e.g.
fd pipeline.yaml pipelines/examples | xargs -L1 kpops generate
will run
kpops generate pipelines/examples/example1/pipeline.yaml
kpops generate pipelines/examples/example2/pipeline.yaml
kpops generate pipelines/examples/example3/pipeline.yaml
...
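For anyone without fd installed, the same discovery can be sketched with POSIX find. This is only an illustration: it builds a throwaway copy of the pipelines/examples layout and substitutes echo for the real kpops binary, so the commands are printed rather than executed.

```shell
# Sketch: pipeline.yaml discovery with POSIX find instead of fd.
# "echo kpops generate" stands in for the real kpops call.
tmp=$(mktemp -d)
mkdir -p "$tmp/pipelines/examples/example1" "$tmp/pipelines/examples/example2"
touch "$tmp/pipelines/examples/example1/pipeline.yaml" \
      "$tmp/pipelines/examples/example2/pipeline.yaml"

# One "kpops generate <path>" line per discovered pipeline.yaml
cmds=$(find "$tmp/pipelines/examples" -name pipeline.yaml | sort | xargs -L1 echo kpops generate)
echo "$cmds"
rm -rf "$tmp"
```

Replace echo kpops generate with kpops generate to actually run the discovered pipelines.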
Especially in CI we already use matrix jobs to parallelize execution. Are there any other use cases where you would really need this?
I think there would be multiple advantages when allowing this:
- caching of defaults can speed up execution
That's true; however, we're talking about shaving off a couple of milliseconds at best. A deploy operation usually takes several minutes, so that's pretty negligible. Now that defaults can be distributed, I'd argue it's even less of a factor, since only the defaults shared between multiple pipelines (i.e. at the top level) would be cached.
In terms of performance, you would lose parallelism, which in my opinion is a much more important factor.
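The parallelism point can stay in shell as well: xargs -P runs one process per pipeline concurrently. Again a sketch only, with echo standing in for kpops so nothing is deployed; the directory layout is created here for the demonstration.

```shell
# Sketch: one kpops process per pipeline, up to two at a time (-P2),
# one path per invocation (-L1). "echo kpops deploy" is a stand-in.
tmp=$(mktemp -d)
mkdir -p "$tmp/p1" "$tmp/p2" "$tmp/p3"
touch "$tmp/p1/pipeline.yaml" "$tmp/p2/pipeline.yaml" "$tmp/p3/pipeline.yaml"

out=$(find "$tmp" -name pipeline.yaml | xargs -P2 -L1 echo kpops deploy)
echo "$out"
rm -rf "$tmp"
```

A single kpops invocation that loops over the files internally would serialize this unless KPOps itself implemented concurrency.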
- inter-pipeline dependencies. Currently you can only reference components from the same pipeline, but referencing other pipelines would become easier, especially with regard to execution order and parallelization
This is a separate feature and out of scope for this ticket. This ticket is about passing multiple pipelines to a KPOps operation.
- this issue was actually brought back to the table because we are using matrix jobs. There were too many matrix jobs, and GitHub Actions is really bad at handling those. If related pipelines could be grouped by specifying a top-level folder or a list of pipelines, the KPOps action could still be used without needing a custom bash wrapper
Hm, I know that GitHub has some weaknesses and their CI is not fully where we'd want it to be. Are there any tickets about this on their issue tracker? It seems more like a UI/UX issue of their web frontend than a KPOps concern.
The KPOps deploy, destroy, delete, and clean operations should be able to take multiple pipeline files. This improves performance because shared files are loaded only once and helm repo commands are only run once.
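To make the performance argument concrete, here is a toy sketch of why batching helps. The names are illustrative, not the real KPOps internals: "setup" stands for work repeated per invocation (loading shared defaults, helm repo add/update), which a multi-file invocation would pay only once.

```shell
# Illustrative only: count how often per-invocation setup runs.
pipelines="p1/pipeline.yaml p2/pipeline.yaml p3/pipeline.yaml"

per_file=0
for p in $pipelines; do
  per_file=$((per_file + 1))   # setup redone for every separate invocation
done

batched=1                       # setup done once for the whole batch
echo "setup runs: per-file=$per_file batched=$batched"
```

With N pipelines, per-file invocation pays the setup cost N times; a batched invocation pays it once.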