SUSE / catapult

SCF and KubeCF CI implementation
Apache License 2.0

Catapult 2.0 design #146

Open viovanov opened 4 years ago

viovanov commented 4 years ago

We should discuss how v2.0 should be implemented:

viccuad commented 4 years ago

As a starting point on goals and architecture, see the current ones from October's workshop: https://github.com/SUSE/catapult/wiki/Architecture-and-implementation#goals

mudler commented 4 years ago

A few thoughts on the goal: I would like to keep it KISS to hack on. @manno had a nice idea to split the core functionality into another language, but this needs to be well thought out from the beginning so we don't raise a barrier to contributions. I really like that the simplicity of bash has allowed us to hack on it quite easily.

I'm a bit against using tools like Ansible, Chef, and the like for this. They just build up silos, and you then need to figure out support for different Linux distributions (yes, they advertise that they support all of the families, but that's not true).

That said, shooting out some ideas:

Proposal 1: what about a Golang core that reads "recipes" which can be written in different formats, e.g. Gherkin or YAML? In the end we just need to feed helm and kubectl different instructions, so it might be a good way to simplify things.
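
A minimal sketch of Proposal 1, assuming a completely made-up recipe format (one tool plus args per step) just to illustrate a small Go core driving helm/kubectl; nothing here reflects an agreed-upon format:

```go
// recipe.go: hypothetical sketch of a Go core that reads a YAML "recipe"
// and feeds helm/kubectl the listed instructions. The recipe format is
// invented for illustration only.
package main

import (
	"fmt"
	"os"
	"os/exec"

	"gopkg.in/yaml.v3"
)

// Step is one helm/kubectl invocation described by the recipe.
type Step struct {
	Tool string   `yaml:"tool"` // "helm" or "kubectl"
	Args []string `yaml:"args"`
}

// Recipe is a named list of steps.
type Recipe struct {
	Name  string `yaml:"name"`
	Steps []Step `yaml:"steps"`
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: catapult-core <recipe.yaml>")
		os.Exit(1)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var r Recipe
	if err := yaml.Unmarshal(data, &r); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Run each step, streaming output to the user.
	for _, s := range r.Steps {
		cmd := exec.Command(s.Tool, s.Args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "step %q failed: %v\n", s.Tool, err)
			os.Exit(1)
		}
	}
}
```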

Proposal 2: We could follow the Unix philosophy here and treat the problem like git does with external extensions. E.g. if we assume that a folder is just our testing environment, containing all the tools needed to deploy, test, and manage a cluster, then we can have N extensions that interact with the core but are decoupled from it (think of each extension as being installed as a binary catapult-extension1 and then being available to the user as catapult extension1 [...]). The core at that point would just be responsible for holding the metadata in a nice format (we could even just use git annotations for this).
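
And a minimal sketch of the dispatch part of Proposal 2, where catapult <name> ... just execs a catapult-<name> binary found on PATH (the same trick git uses for git-foo); the extension itself can be written in any language:

```go
// catapult.go: sketch of git-style subcommand dispatch. "catapult foo bar"
// looks for a "catapult-foo" binary on PATH and runs it with the remaining
// arguments.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: catapult <extension> [args...]")
		os.Exit(1)
	}
	// e.g. "catapult cap deploy" -> run "catapult-cap deploy"
	bin, err := exec.LookPath("catapult-" + os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "unknown extension %q: %v\n", os.Args[1], err)
		os.Exit(1)
	}
	cmd := exec.Command(bin, os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```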

mudler commented 4 years ago

By the way, formally this would be a v3; there was already a rewrite to v2 quite some time ago: https://github.com/SUSE/catapult/pull/46

viccuad commented 4 years ago

By the way, formally this would be a v3; there was already a rewrite to v2 quite some time ago: #46

I agree with it being v3, but then we should change the v1 tag: https://github.com/SUSE/catapult/releases

f0rmiga commented 4 years ago

+1 on the UNIX philosophy proposed by @mudler. It should be easy to implement the catapult binary in Go while the extensions can still be written in another language whenever needed.

jimmykarily commented 4 years ago

I like number 2, plus @f0rmiga's suggestion to be able to use whatever language for the subcommands. So introduce some convention on where the executable of the subcommand should live and then let it be whatever it wants (possibly keeping all files related to a subcommand under the same directory). :+1:

viccuad commented 4 years ago

+1 on option 2, for the core: FSM, options handling, etc.

It would be nice not to lose the functionality that is already there. For that, we could move the "header" that all the scripts {module,backend}/foo/*.sh operating on the buildfolder have into the core. E.g.: https://github.com/SUSE/catapult/blob/2192d5731a2cf4dc94f49abc482fd024d9efc25f/modules/tests/kubecf-test.sh#L1-L7

That way they would be plain scripts, without sourcing or using bash functions: they should work when run outside of catapult against whatever helm, kubeconfig, etc. Then we can drop the "header" later and start consuming only executables. The scripts consume env vars as options to branch their logic. It would be nice to provide an executable that type-checks options, provides correct defaults, is able to change the value of options for that specific buildfolder, etc.
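
A rough sketch of what such an options executable could look like; the option names, defaults, and the catapult-options binary name below are made up for illustration. It warns on unknown settings instead of silently ignoring them, and prints export lines that a plain script could eval:

```go
// options.go: sketch of the "options executable" idea. It knows the valid
// option names and their defaults, warns about unknown settings, and prints
// export lines for plain scripts. Names below are placeholders.
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaults maps every known option to its default value (hypothetical names).
var defaults = map[string]string{
	"BACKEND":       "kind",
	"CLUSTER_NAME":  "test",
	"ENABLE_EIRINI": "false",
}

func main() {
	// Start from the defaults, then apply KEY=VALUE overrides from the args.
	opts := map[string]string{}
	for k, v := range defaults {
		opts[k] = v
	}
	for _, arg := range os.Args[1:] {
		parts := strings.SplitN(arg, "=", 2)
		if len(parts) != 2 {
			continue
		}
		if _, known := defaults[parts[0]]; !known {
			fmt.Fprintf(os.Stderr, "warning: unknown option %q ignored\n", parts[0])
			continue
		}
		opts[parts[0]] = parts[1]
	}
	// Emit export lines a script can eval.
	for k, v := range opts {
		fmt.Printf("export %s=%q\n", k, v)
	}
}
```

A script could then do something like `eval "$(catapult-options BACKEND=aks)"` before running, assuming the binary were called catapult-options.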

mook-as commented 4 years ago

Looking at #160, it would be nice if the configs could be passed in via something other than environment variables. They are typo-prone (they have to be set repeatedly, and there is no feedback if there's a mistake, as they are just silently ignored).

I have no opinion on whether the individual modules take configs in environment variables; I only ask that the input to the system (from the user) be in some other format.

jimmykarily commented 4 years ago

@mook-as it already supports the CONFIG env variable (https://github.com/SUSE/catapult/blob/master/include/common.sh#L9) but it's not well documented (if at all). It could also be improved a lot. E.g. it could show a warning or an error when an unknown setting is provided (to avoid typos that would otherwise lead to ignored settings).

mudler commented 4 years ago

Looking at #160, it would be nice if the configs could be passed in via something other than environment variables. They are typo-prone (they have to be set repeatedly, and there is no feedback if there's a mistake, as they are just silently ignored).

I have no opinion on whether the individual modules take configs in environment variables; I only ask that the input to the system (from the user) be in some other format.

It's documented here https://github.com/SUSE/catapult/wiki/Deployment-file

viccuad commented 4 years ago

I totally agree on moving away from env vars as options. I'm just proposing keeping them in the interim for the internal logic, so as not to rewrite all the scripts. But maybe it's moot, and it's easier to rewrite or refactor using an executable that queries for options.

viccuad commented 4 years ago

I would be happy to come up with requirements that could go into a production lifecycle manager for CAP.

Another feature request in that vein: as a user/customer, I would like to have a default values.yaml for the shipped version (e.g.: https://github.com/cloudfoundry-incubator/kubecf/issues/711), and have catapult provide a changed-values.yaml after selecting some specific catapult options/profiles. That changed-values.yaml would be automatically added to a local git repo with the history of that cluster. That makes it easy to check, revert, and upgrade CAP deployments.
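
A rough sketch of the git-tracking part of that request, assuming a hypothetical per-buildfolder history repo (the paths and layout are invented for illustration):

```go
// trackvalues.go: sketch of recording the generated changed-values.yaml in a
// per-cluster git repo so deployments can be inspected and reverted.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run executes a git command in the given directory, streaming its output.
func run(dir string, args ...string) error {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	repo := "./buildfoo/history" // hypothetical per-buildfolder repo
	if _, err := os.Stat(repo + "/.git"); os.IsNotExist(err) {
		if err := run(".", "init", repo); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	// changed-values.yaml is assumed to have been written there by catapult.
	if err := run(repo, "add", "changed-values.yaml"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Commit may fail if nothing changed or git identity is unset; ignored here.
	_ = run(repo, "commit", "-m", "record changed-values.yaml for this deploy")
}
```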

prabalsharma commented 4 years ago

We need to first list out the current problems before brainstorming on how to solve them. "What are we trying to solve?" is a very important question, to avoid over-engineering our way to the basic goal of a helm install with different helm parameters based on the scenario. I don't mind bash.

A few of the issues with the current catapult that come to my mind:

  1. env vars
  2. too many defaults*.sh
  3. similar names make it less intuitive: https://github.com/SUSE/catapult/issues/123
  4. too many folders with no README, which does not help: https://github.com/SUSE/catapult/issues/124

There should be one file to handle all values. Proper restructuring of the folders and intuitive file names are important.

I am not too concerned with supporting old functionality (as long as it does not hamper us). We can just use versioning and move on to improve the tool. After all, it's not a product.

Our most important goal is "quick iterations during the release cycle" (the more important goals are already part of catapult).

There has also been some talk about changing the definition of catapult, in terms of running tests. We should have that discussion as well, because limiting catapult's definition relative to its current one will drive the decision on the engineering effort for v3 and whether it's worth it. (@satadruroy)

Proposal 2 from @mudler sounds interesting. It would be nice to see some example of this in our upcoming meeting, or some discussion on:

* What will the core look like?
* What will the extensions look like?
* Can we repurpose present scripts to form the core?
* How will this proposal look from CI's point of view?
* How often will we have to make changes to the core?
* What will define extensions? Will they be based on CAP versions or IaaS? Won't there be a lot of duplication, as the changes to CAP deployments are trivial and the K8s deployment is almost static?

satadruroy commented 4 years ago

With regards to running tests, the broader question is around the separation of concerns between CI and Catapult. It seems that at this point,

Catapult is responsible for:

CI can use the interface to a) ask for a specific backend, and b) deploy KubeCF in a specific configuration and get a kubeconfig back. At that point it is, or it should be, CI's responsibility to run the various test suites (which may differ depending on whether it's the KubeCF CI or the CAP CI) and, at the end of the test run, ask Catapult to tear down the cluster. IoW, if setting up and running the tests is also part of Catapult now, perhaps it should not be.

I’m sure I missed other stuff, so feel free to add to the above.

mudler commented 4 years ago

A few of the issues with the current catapult that come to my mind:

1. env vars

2. too many defaults*.sh

3. similar names make it less intuitive. #123

4. too many folders with no README, which does not help. #124

There should be one file to handle all values. Proper restructuring of the folders and intuitive file names are important.

:+1:

I am not too concerned with supporting old functionality (as long as it does not hamper us). We can just use versioning and move on to improve the tool. After all, it's not a product.

:+1: we also have a v1.0.0 tag; let's just start tagging and cleaning up

Our most important goal is "quick iterations during the release cycle" (the more important goals are already part of catapult).

There has also been some talk about changing the definition of catapult, in terms of running tests. We should have that discussion as well, because limiting catapult's definition relative to its current one will drive the decision on the engineering effort for v3 and whether it's worth it. (@satadruroy)

Proposal 2 from @mudler sounds interesting. It would be nice to see some example of this in our upcoming meeting, or some discussion on:

* What will the core look like?

* What will the extensions look like?

* Can we repurpose present scripts to form the core?

* How will this proposal look from CI's point of view?

* How often will we have to make changes to the core?

* What will define extensions? Will they be based on CAP versions or IaaS? Won't there be a lot of duplication, as the changes to CAP deployments are trivial and the K8s deployment is almost static?

I was thinking of a Golang core which glues the various extensions together (but even bash is fine, as it should have a very restricted scope). This way we can also have a "legacy" extension that acts as a bridge from the old version to the new one, keeping the same feature set intact.

We could then iterate on the development of the new version while still being able to run the same catapult commands as we do now.

The extensions could be written in any language, as the "core" would just accept inputs - let's say YAMLs - that define your environment. Those are then passed on to the extensions in the required form (env? arguments?).

The only prerequisite is that the extensions have to be discoverable - or even installed - the way git handles extensions (e.g. catapult cap deploy would need catapult-cap as an installable binary that is run by catapult, so that catapult cap deploy works).

mudler commented 4 years ago

The only prerequisite is that the extensions have to be discoverable - or even installed - the way git handles extensions (e.g. catapult cap deploy would need catapult-cap as an installable binary that is run by catapult, so that catapult cap deploy works).

Small step in that direction; I created https://github.com/mudler/cobra-extensions to discover extensions in a git-like way for Golang cobra-based projects :slightly_smiling_face:
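
Not the cobra-extensions API itself (see the repo for that), but a rough illustration, assuming the catapult-<name> naming convention, of how discovered binaries could be registered as cobra subcommands so they show up under the root command:

```go
// Rough illustration of exposing discovered catapult-<name> binaries as
// cobra subcommands (this is not the cobra-extensions API).
package main

import (
	"os"
	"os/exec"
	"path/filepath"
	"strings"

	"github.com/spf13/cobra"
)

func main() {
	root := &cobra.Command{Use: "catapult"}

	// Scan PATH for binaries named catapult-<extension>.
	for _, dir := range filepath.SplitList(os.Getenv("PATH")) {
		entries, err := os.ReadDir(dir)
		if err != nil {
			continue
		}
		for _, e := range entries {
			name := e.Name()
			if !strings.HasPrefix(name, "catapult-") {
				continue
			}
			ext := strings.TrimPrefix(name, "catapult-")
			bin := filepath.Join(dir, name)
			root.AddCommand(&cobra.Command{
				Use:                ext,
				Short:              "external extension " + name,
				DisableFlagParsing: true, // pass flags through untouched
				RunE: func(cmd *cobra.Command, args []string) error {
					c := exec.Command(bin, args...)
					c.Stdin, c.Stdout, c.Stderr = os.Stdin, os.Stdout, os.Stderr
					return c.Run()
				},
			})
		}
	}

	if err := root.Execute(); err != nil {
		os.Exit(1)
	}
}
```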

viccuad commented 4 years ago

I would like to use a k8s configmap as catapult's config DB.

Such a Catapult configmap, in YAML or JSON, would be a superset of KubeCF's default values.yaml, adding the needed catapult options to re-evaluate the correct values.yaml used to deploy KubeCF.

Ideally this would go into KubeCF for those options that are API-stable, yet not all of them are. Also, Catapult's superset of options includes external things outside of KubeCF's helm chart and at the kube level: domain, setup for the k8s backend if needed, LB vs ingress, different pieces for other helm charts (Stratos, minibroker, future charts).

This solves and simplifies guessing the state of the system. Because of Concourse's need for every job to run on a clean cluster, there's too much repetition of catapult's config vars, which hinders both refactoring and simplification via sane defaults that we could automate on top of.

It also means that whatever cluster is given to catapult (from CI, from sharing clusters between devs, or from handing one over from one person to another), if already deployed by it, would work flawlessly without a mismatch of options (eirini/passwords/namespaces/etc.). And we would have a wrapper to simplify CI and automate on top of.
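
A rough sketch of what reading/writing such a config DB could look like with client-go; the namespace, ConfigMap name, and keys are all made up for illustration:

```go
// configdb.go: sketch of using a ConfigMap as catapult's config DB.
// The "catapult" namespace is assumed to exist already.
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Store the superset of chart values plus catapult-level options.
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "catapult-config", Namespace: "catapult"},
		Data: map[string]string{
			"values.yaml": "ingress:\n  enabled: true\n",
			"domain":      "example.com",
		},
	}
	if _, err := client.CoreV1().ConfigMaps("catapult").Create(context.TODO(), cm, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// Any later catapult invocation can read the config back from the cluster.
	got, err := client.CoreV1().ConfigMaps("catapult").Get(context.TODO(), "catapult-config", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(got.Data["domain"])
}
```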

viccuad commented 4 years ago

CI can use the interface to a) ask for a specific backend, and b) deploy KubeCF in a specific configuration and get a kubeconfig back. At that point it is, or it should be, CI's responsibility to run the various test suites (which may differ depending on whether it's the KubeCF CI or the CAP CI) and, at the end of the test run, ask Catapult to tear down the cluster. IoW, if setting up and running the tests is also part of Catapult now, perhaps it should not be.

I've been thinking about this for a long time, and I believe it is counterproductive.

For deploying kubecf and running tests on it, one needs to know specifics: whether to use LB or Ingress, the namespace of kubecf, Diego or Eirini, settings of the test suite (which can be tuned per k8s backend), cf passwords, etc. The same goes for all the loosely coupled things such as Stratos, minibroker, and possible upcoming features and charts.

Taking all of that out of Catapult's responsibility doesn't make it disappear.

All of this can be implemented as scripts directly in the CI, but then we give up easy debugging of the CI, and we keep needing a developer/QA setup assistant, which allows you to have correct and complete setups even if you don't know the full picture of how all the charts and options fit together, or if you inherit a cluster from a different person/CI run and you weren't the one who set it up.

I firmly believe that something like Catapult should be the CI implementation, and Concourse et al. should only provide scheduling, triggering, and juggling of resources such as clusters. For complex products such as ours (as opposed to a simple application), the CI scripts become complex enough themselves. In that case it really pays off to separate out the CI implementation so everyone can run it and benefit from the man-hours. The CI pipelines themselves should not contain branching code directly.

Catapult's CI implementation is pretty simple (see for example how it creates the values.yaml for kubecf). The complicated task is just having a framework for correct options, depending on the kind of deployment you are using. We need to simplify the former and nail the latter.

viccuad commented 4 years ago

I would like to use a k8s configmap as catapult's config DB.

For deploying/testing KubeCF (or CAP) we need a harness. The harness consists of:

All of those are coupled (e.g. you need to know specifics about KubeCF to deploy Stratos successfully). Those coupled bits can be removed with conventions, or by querying at the kube level.

In cf-ci, the harness was specified by the implicit state set by the testing scripts run inside Concourse, with options passed as helm args in those scripts. This complicated iterating on development, and there wasn't a way to recreate the harness locally.

In catapult, the harness is specified by the implicit state of the Kubernetes cluster, with options passed as env vars. This allows for deploying/testing locally, but there's no discoverability, and if one doesn't run the make targets consecutively, one needs to re-specify the harness when calling the next target; both of these are big failings.

Harness data structure

Given that helm options are normally in YAML, that is the natural format. It would allow us to save the full values.yaml of KubeCF, Stratos, etc., and operate on them. Sadly, YAML doesn't preserve comments when parsed; if it did, merging the default values.yaml and our deployment's YAML would be trivial. We face this problem in KubeCF too, for default.yaml and values.yaml. More info here.

Harness implementation and usage

I propose calculating the harness options by passing yaml to catapult.

  1. Catapult computes the needed options (e.g.: do a kubecf ingress deployment in the foo namespace, hence configure Stratos the same way too); see the sketch after this list.
  2. Catapult saves the options somewhere at the kube level, maybe as a configmap or in etcd. This creates the harness. I believe the kube level is the most useful one: it is the closest level that is not under test, yet it is needed for the test and travels with it everywhere. Better than saving it locally or having it float in the execution of a Concourse container.
  3. The system is ready to be deployed/tested. From now on, one can call any and all targets without any more configuring. The target reads the saved config values and executes the needed steps (one can always reconfigure the harness midway through testing, but it shouldn't be needed at all).
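
A rough sketch of step 1, overlaying the user-supplied YAML on top of the defaults to compute the options (the file names are placeholders; the real computation would also know about catapult-specific keys):

```go
// mergeopts.go: sketch of computing harness options by overlaying the user's
// YAML on top of the default values.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// merge overlays src on top of dst, recursing into nested maps.
func merge(dst, src map[string]interface{}) map[string]interface{} {
	for k, v := range src {
		if sm, ok := v.(map[string]interface{}); ok {
			if dm, ok := dst[k].(map[string]interface{}); ok {
				dst[k] = merge(dm, sm)
				continue
			}
		}
		dst[k] = v
	}
	return dst
}

// load reads a YAML file into a generic map.
func load(path string) map[string]interface{} {
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	out := map[string]interface{}{}
	if err := yaml.Unmarshal(data, &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	opts := merge(load("default-values.yaml"), load("user.yaml"))
	// The result would then be stored at the kube level (step 2) and read
	// back by every later target (step 3).
	out, err := yaml.Marshal(opts)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```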