AICoE / meteor


add "what Thoth knows about you project" teaser feature #92

Closed goern closed 2 years ago

goern commented 3 years ago

As a user of the Meteor web site, I want to check a box, so that a Pull Request is created on the repository I submitted, and so that the PR gives interesting insights into my project.

Acceptance Criteria

durandom commented 3 years ago

This should be "just another tekton pipeline" for the controller. I.e. it'll trigger a pipeline that creates a PR as implemented in https://github.com/thoth-station/core/issues/314

Maybe this is an opportunity to refactor the pipeline management in the operator, so that it reuses as much code as possible.

I.e. suppose we add thoth_pr to the CR now:

apiVersion: meteor.operate-first.cloud/v1alpha1
kind: Meteor
metadata:
  name: demo
spec:
  url: github.com/aicoe-aiops/meteor-demo
  ref: main
  ttl: 100000 # Time to live in seconds, defaults to 24h
  thoth_pr: true 

We should only "configure" a mapping of thoth_pr to a pipeline that can handle the request. I.e. in the future we can extend the controller to handle new pipelines without changing the pipeline managing code.

goern commented 3 years ago

I wonder about the 'configuration' here. As I understand the operator pattern, it is the controller's responsibility to make sure that thoth_pr will always be in state true, so we need a bit of logic in the controller to reconcile a TektonPipelineRun until it yields true. Which pipeline to use, and with what parameters to create the TektonPipelineRun, is hardcoded within the controller; the same goes for the reconciliation logic. I don't know if it is good practice to have a ConfigMap for the controller, so that we can assign whatever Tekton Pipeline should be executed if a given attribute is set/present on the CR object.
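A minimal sketch of what such a ConfigMap could look like, assuming the controller reads it at reconcile time (all names here are hypothetical):

apiVersion: v1
kind: ConfigMap
metadata:
  name: meteor-pipeline-mapping  # hypothetical name
data:
  # maps a CR attribute to the Tekton Pipeline that handles it
  thoth_pr: thoth-pr-pipeline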

tumido commented 3 years ago

So... I think there are two areas in which we want to extend the CRD. This may be too broad an answer to your issue, but I think it's necessary to outline it anyway:

  1. Let's have a secondary CRD called Shower that will deploy the shower web UI and control it for us.

  2. Shower CRD will also host meteorDefaults, which will be sourced/merged into each Meteor if not specified otherwise (in the Meteor itself). (Similar to what Argo Workflows does with a ConfigMap and workflowDefaults.) This will allow us to specify global behavior like push-to-quay or workspace allocations:

    apiVersion: meteor.operate-first.cloud/v1alpha1
    kind: Shower
    metadata:
      name: default
    spec:
      meteorDefaults:
        params:
          - name: reportToThoth
            value: true
        workspaces:
          - name: data  # Defines the space used by the pipelines to do work on repositories during pipeline runtime
            volumeClaimTemplate:
              spec:
                accessModes:
                  - "ReadWriteOnce"
                resources:
                  requests:
                    storage: 500Mi
          - name: quay  # May be pipeline specific and can pass credentials or other things to individual pipelines, like quay credentials
            secret:
              secretName: my-secret
      externalServices:
        jupyterHub:
          namespace: opf-jupyterhub
        kubeflow:
          namespace: kubeflow
        thothStation:
          host: api.thoth-station.ninja
  3. The Meteor resource would be enhanced with a pipelines property that specifies which pipelines should be run. This may initially be specified via the UI (checkboxes or whatnot):

    apiVersion: meteor.operate-first.cloud/v1alpha1
    kind: Meteor
    metadata:
      name: demo
    spec:
      url: github.com/aicoe-aiops/meteor-demo
      ref: main
      ttl: 100000
      pipelines:
        - jupyterhub
        - jupyterlab
        - something
        - else
      workspaces:  # either defined manually or sourced from Shower
  4. Now, the operator would use this pipelines spec and submit the given pipelines.

  5. Additionally, instead of defining these pipelines manually when creating the Meteor resource, we can default to a new (to-be-created) analyze/discovery pipeline. This special pipeline would analyze the repo and figure out which scenarios this particular repository supports. The operator would then collect outputs from this pipeline and reconcile them back into the pipelines list. This triggers another round of reconciliation, making the operator trigger these newly added pipelines.

That way we can achieve dynamic pipeline provisioning/creation, and all the logic will be stored in Tekton Pipelines.
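As a sketch of how the discovery pipeline could surface its findings, a Tekton task can emit a result that the operator then reads back into spec.pipelines (the task name, result name, and analysis logic are hypothetical):

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: discover  # hypothetical discovery task
spec:
  results:
    - name: pipelines
      description: Comma-separated list of pipelines the repository supports
  steps:
    - name: analyze
      image: registry.access.redhat.com/ubi8/ubi-minimal
      script: |
        # hypothetical analysis; writes the discovered list as the task result
        echo -n "jupyterhub,jupyterlab" > $(results.pipelines.path)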

Future extensions of this can include things like allowing pipelines to be blacklisted via the Shower resource, for all Meteors submitted via that particular Shower, etc.

tumido commented 3 years ago

Naturally, we're not pursuing most of this for DevConf.US. For that event we need the bare minimum to be functional, and if that means hardcoding stuff for the time being, so be it. 😉

goern commented 3 years ago
  1. sounds like a Helm with no CRD ;)

  2. I am unsure about the purpose of 2. One thing that I would see as a ConfigMap/Secret is pushing; it's not the job of the controller to push anything, nor to hold a configuration for pushing.

  3. I object to having a configurable workflow engine for Tekton pipelines implemented as a controller. This sounds a lot like Argo Workflows to me.

  4. (again) sounds like a discovery task in a parent Argo Workflow kicking off Tekton Pipelines... ;)

Anyway, all these thoughts should be talked through by more people, and we are hijacking this issue ;)

tumido commented 3 years ago
  1. sounds like a Helm with no CRD ;)

That's precisely the pattern why operators exist, though. Examples: the Grafana operator, Openshift Gitops, Keycloak, Openshift Pipelines, Openshift Serverless, ODF... all of them use a CR to deploy "the thing", usually a deployment+service+route+extras, and maintain it.

  2. I am unsure about the purpose of 2. One thing that I would see as a ConfigMap/Secret is pushing; it's not the job of the controller to push anything, nor to hold a configuration for pushing.

That's a pattern used in most operators as well. The CR holds configuration that is passed to the reconciled resources. This is a simplification that allows for easier propagation to the descendant resources. The controller is not "pushing anything"; it just passes access to this data, configs, and secrets to the pipelines. https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#should-i-use-a-configmap-or-a-custom-resource
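To make this concrete, a secret configured in the Shower defaults would surface to a pipeline as an ordinary Tekton workspace binding; the controller only wires the reference through (the pipeline name here is hypothetical):

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: demo-
spec:
  pipelineRef:
    name: push-to-quay  # hypothetical pipeline consuming the credentials
  workspaces:
    - name: quay  # sourced from the Shower meteorDefaults; the controller never reads the secret itself
      secret:
        secretName: my-secret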

  3. I object to having a configurable workflow engine for Tekton pipelines implemented as a controller. This sounds a lot like Argo Workflows to me.

Why? It's not Argo Workflows by any means. We want the controller to initiate a certain set of Tekton Pipelines per specification, correct? I think this is the way to achieve that.

  4. (again) sounds like a discovery task in a parent Argo Workflow kicking off Tekton Pipelines... ;)

Not at all. In the case described above it can be dynamic, because we have this operator/controller. If the list of workflows grows over the lifespan of the Meteor, we can initiate just the missing pipelines, because we have now established a relation to the existing pipelines. This is not easy to implement without a controller that can reconcile, especially in Argo Workflows, which is a "kick off and forget" kind of controller.
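One possible shape for that relation, assuming a status subresource on the Meteor CR (the status fields here are hypothetical): the controller would diff spec.pipelines against the status on each reconcile and submit only the missing runs.

apiVersion: meteor.operate-first.cloud/v1alpha1
kind: Meteor
metadata:
  name: demo
spec:
  pipelines:
    - jupyterhub
    - jupyterlab  # newly added; only this one gets a fresh PipelineRun
status:
  pipelines:
    - name: jupyterhub
      pipelineRunName: demo-jupyterhub  # already submitted, left untouched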

Anyway, all these thoughts should be talked through by more people, and we are hijacking this issue ;)

Definitely. Adjustments and refinement are definitely needed. What I wanted to say is that with this approach, Thoth can have its own pipeline running independently alongside all the other pipelines.

fridex commented 3 years ago

That's precisely the pattern why operators exist, though. Examples: the Grafana operator, Openshift Gitops, Keycloak, Openshift Pipelines, Openshift Serverless, ODF...

Operators are an interesting and widely adopted piece of technology. Their use spans multiple projects that employ them in various ways depending on their needs. No doubts about that. All the things you mentioned can be implemented in a golang operator. The real question here is the overall cost of such a solution, its maintainability, its extensibility, and how it fits into a bigger picture.

It is well known that application development is just a small fraction of the whole application lifetime; application maintenance is a significantly larger effort than developing the application. Maintaining golang sources might be much more expensive and error-prone than maintaining YAML files that configure a workflow-controller. At some point the implementation of a golang operator will no longer scale to 1-2 engineers and will require more human resources to keep the project up and running (also, take a look at the effort and time required on your side to introduce a new feature, as opposed to the Argo Workflows alternative). The number of contributors in all the big projects you mentioned above might be a hint supporting this statement. Now the question might be whether project Meteor is supposed to become another Grafana, another Keycloak, or any of the projects you stated.

If you take a look at golang operators, most of them reimplement the same pattern all over the place: based on an event that happened in the cluster, do something. This can be modeled in a workflow. Moreover, Argo Workflows provides an application-agnostic workflow controller (as opposed to meteor-operator, which is an application-specific workflow controller). Writing a YAML file that configures a workflow-controller to behave based on cluster events requires far less effort and allows others to contribute to the project without needing to know all the underlying Kubernetes/OpenShift platform nuances used to develop a proper operator. If the YAML syntax and Argo Workflows are known to a contributor, he/she can contribute to other Argo Workflows-based projects very easily. Compare this to the CR design and golang code changes required for meteor-operator.

all of them use a CR to deploy "the thing", usually a deployment+service+route+extras, and maintain it.

OpenShift/Kubernetes provides a way to run and manage objects in a cluster. Sooner or later, applications will require a higher level of abstraction to build more sustainable and maintainable solutions. Writing operators each time there is a need to perform actions in a certain order will become mundane work that can be abstracted away, so that developers focus on the deliverables and provide functionality at a faster pace with a higher maintenance index for the solutions delivered. The same thing has repeated itself in IT history in different areas (assembler code -> C code -> higher-level programming languages; OpenShift v1 -> OpenShift v2 based on shell scripts (?) -> OpenShift v3 based on Kubernetes configured via YAML files -> ...).

If Meteor were based on a workflow engine such as Argo Workflows, it would be much easier to extend the project and integrate it with other backend parts we already use. Other developers could test changes more easily, and thus it would become easier to contribute to the project. Moreover, we could then speak about creating an AICoE platform made out of multiple projects that cooperate well together and can be provisioned and run in a cluster based on customer/user needs. Customers could then learn from such a platform, and a solution for repeatable patterns in OpenShift clusters could be offered to customers to scale up their business as well. One day OpenShift will need a workflow engine: check how Argo Workflows is involved in CNCF, the fact that companies use it, and that we have already proven its scalability and usability on OpenShift.

tumido commented 3 years ago

I think we're kidnapping this issue again, but why not in the end...

I think there are certain areas of this project in which you simply need the operator. For example:

  • We should be able to garbage collect resources deployed by Meteor and be able to extend/modify each meteor's lifespan. Using Argo Workflows we can't do that cross-namespace, simply because you can't tie owner references across namespaces; each ownerReference needs to be local to the namespace.

  • We should be able to easily enrich existing meteors with additional pipelines without re-runs of the already processed pipelines.

If you take a look at golang operators, most of them reimplement the same pattern all over the place: based on an event that happened in the cluster, do something. This can be modeled in a workflow.

Modeled as an Argo Workflow? Sure, but you don't want to. You don't want to trigger a workflow per event on every descendant resource in Kubernetes. You don't want to be spinning up pods just to compare the spec and status of a resource and update the status in a way that can trigger a status update, which means another event, etc. Instead, you want a single minimal pod watching for those changes and processing them quickly and timely. You don't want to wait for pods to start. It can be modeled as a workflow, sure; that's why you can make Ansible-based operators for simplicity, but golang gives you more power, which we want to have.

Another aspect of this is ease of use. Reiterating my previous arguments against Argo Workflows for Meteor, let's enumerate what you need to do to deploy Meteor via Argo:

  • Subscribe to Openshift Pipelines
  • Deploy Argo Workflows in a compatible version via kubectl, oc, helm, kustomize, argocd, ODH. You name it. Many things can differ from setup to setup.
  • Deploy Meteor Shower manifests
  • Deploy API manifests
  • Deploy cleanup cronjob manifests
  • Deploy Workflow template manifests

On the other hand, for the operator:

  • Subscribe to Openshift Pipelines and the Meteor operator
  • Deploy a single CR

In the latest PRs we're moving away from having most Meteor-related things managed via golang; we're pivoting towards having everything handled in Tekton pipelines and backed by CRDs provided by Meteor, allowing proper cross-namespace behavior, external service discovery, and so on.

Also Openshift already has a workflow engine - Openshift Pipelines.

When it comes to extending beyond Kubernetes and whatnot, I think the operator also makes sense in that case. Now we're far, far away in the future of Meteor, but let's continue there: Tekton is not Kubernetes-only, it's meant to be deployed on the edge and elsewhere as well. The operator is just a single small golang binary that can easily be adapted to run against Tekton on the edge as well. I don't think we could ever achieve that, and that easily, via Argo Workflows, simply because of too many moving parts.

fridex commented 3 years ago

I think there are certain areas of this project in which you simply need the operator. For example:

  • We should be able to garbage collect resources deployed by Meteor and be able to extend/modify each meteor's lifespan. Using Argo Workflows we can't do that cross-namespace, simply because you can't tie owner references across namespaces; each ownerReference needs to be local to the namespace.

You mentioned the cleanup cronjob below. Its setup (role bindings and deployment requirements), as well as its logic, is pretty much the same as in the case of meteor-operator.
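For reference, this is what the namespace-local ownership under discussion looks like on a reconciled resource; an ownerReference cannot point across namespaces, which is why cross-namespace cleanup needs either a controller or a cronjob (the UID below is a placeholder):

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: demo-jupyterhub
  ownerReferences:  # garbage collected together with the owning Meteor
    - apiVersion: meteor.operate-first.cloud/v1alpha1
      kind: Meteor
      name: demo
      uid: 00000000-0000-0000-0000-000000000000  # placeholder
      controller: true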

  • We should be able to easily enrich existing meteors with additional pipelines without re-runs of the already processed pipelines.

This can be modeled in a workflow. One can also provide multiple workflows that can be spawned based on conditions.
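A hedged sketch of such conditional spawning in Argo Workflows terms (the template names, images, and condition are hypothetical):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: meteor-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: discover
            template: discover
        - - name: enrich
            template: enrich
            when: "{{steps.discover.outputs.result}} == jupyterhub"  # spawned only when the condition holds
    - name: discover
      container:
        image: registry.access.redhat.com/ubi8/ubi-minimal
        command: [sh, -c, "echo jupyterhub"]  # hypothetical discovery; stdout becomes outputs.result
    - name: enrich
      container:
        image: registry.access.redhat.com/ubi8/ubi-minimal
        command: [sh, -c, "echo enriching"]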

If you take a look at golang operators, most of them reimplement the same pattern all over the place: based on an event that happened in the cluster, do something. This can be modeled in a workflow.

Modeled as an Argo Workflow? Sure, but you don't want to. You don't want to trigger a workflow per event on every descendant resource in Kubernetes. You don't want to be spinning up pods just to compare the spec and status of a resource and update the status in a way that can trigger a status update, which means another event, etc. Instead, you want a single minimal pod watching for those changes and processing them quickly and timely. You don't want to wait for pods to start. It can be modeled as a workflow, sure; that's why you can make Ansible-based operators for simplicity, but golang gives you more power, which we want to have.

Instead, you want a single minimal pod

What if I do not want that? What if I would rather invest in a new node in the cluster that fits the resource needs you mentioned earlier than pay golang developers to produce a very resource-efficient golang binary for one specific use case? Compare the cost and reusability. I'm unsure about the proposed optimizations and their cost in this case.

Another aspect of this is ease of use. Reiterating my previous arguments against Argo Workflows for Meteor, let's enumerate what you need to do to deploy Meteor via Argo:

  • Subscribe to Openshift Pipelines
  • Deploy Argo Workflows in a compatible version via kubectl, oc, helm, kustomize, argocd, ODH. You name it. Many things can differ from setup to setup.
  • Deploy Meteor Shower manifests
  • Deploy API manifests
  • Deploy cleanup cronjob manifests
  • Deploy Workflow template manifests

On the other hand, for the operator:

  • Subscribe to Openshift Pipelines and the Meteor operator
  • Deploy a single CR

That's where the reference to the AICoE platform comes in. In such a case, Argo Workflows would not be provisioned only for project Meteor but for the AICoE platform, to run workloads for Meteor, AIOps, Thoth, ...

Also Openshift already has a workflow engine - Openshift Pipelines.

These are pipelines, not a workflow engine. They do not provide the expressive power needed for workflows and are suited to other types of applications.

When it comes to extending beyond Kubernetes and whatnot, I think the operator also makes sense in that case. Now we're far, far away in the future of Meteor, but let's continue there: Tekton is not Kubernetes-only, it's meant to be deployed on the edge and elsewhere as well. The operator is just a single small golang binary that can easily be adapted to run against Tekton on the edge as well. I don't think we could ever achieve that, and that easily, via Argo Workflows, simply because of too many moving parts.

I do not see where this originated. I understood that OpenShift is the target platform of Meteor.

Anyway, the discussion has happened here and the points were raised. If they do not look relevant for Meteor and for integrating other applications with Meteor, it might be considerably harder for us to integrate in an effective way.

durandom commented 2 years ago

closing in favor of https://github.com/AICoE/meteor/issues/92