AcalephStorage / kontinuous

The Kubernetes Continuous Integration & Delivery Platform (CI/CD) :arrows_counterclockwise:
Apache License 2.0

API refactor #37

Open hunter opened 8 years ago

hunter commented 8 years ago

Now that we have an initial prototype built, there are a few areas of the API that can be improved.

As mentioned in #15, ThirdPartyResources may be a better approach to integrating the API. We get the benefits of Kube API auth along with the use of etcd - https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/design/extending-api.md
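As a rough illustration only, registering a Pipeline TPR against a 1.2-era API server could look like the sketch below. The `kontinuous.io` group and the `kubectl proxy` setup are assumptions, not decisions.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// Hypothetical TPR definition; "kontinuous.io" is an assumed group name.
// Once registered, Pipeline objects would live under
// /apis/kontinuous.io/v1/namespaces/{namespace}/pipelines.
const pipelineTPR = `{
  "apiVersion": "extensions/v1beta1",
  "kind": "ThirdPartyResource",
  "metadata": {"name": "pipeline.kontinuous.io"},
  "description": "A kontinuous CI/CD pipeline",
  "versions": [{"name": "v1"}]
}`

func main() {
	// Assumes `kubectl proxy` is running locally so the API server handles auth.
	resp, err := http.Post(
		"http://127.0.0.1:8001/apis/extensions/v1beta1/thirdpartyresources",
		"application/json",
		bytes.NewBufferString(pipelineTPR),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```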

(Currently a stub... updates to come)

hunter commented 8 years ago

Updated with some info on ThirdPartyResources

darkcrux commented 8 years ago

This would probably tie in with the Controller Refactor. It also removes the need for us to start our own etcd instance.

hunter commented 8 years ago

My current understanding of ThirdPartyResources is that only the resource itself is stored in etcd; the rest of the associated data would have to be stored outside of that (so, I assume, our own etcd).

darkcrux commented 8 years ago

Yes. The thinking is that pipelines, stages, and build info would be stored as ThirdPartyResources, and it's up to the controllers to monitor the API for changes to start builds, send notifications, etc.

hunter commented 8 years ago

Ah yeah, I hadn't considered storing more than pipelines in there.

One thing we might need to check is whether TPR is enabled in GKE.

darkcrux commented 8 years ago

I don't think it's enabled in our latest installation.

darkcrux commented 8 years ago

A few things I think we need to improve in the API:

  1. Create separate models for the API and the controllers. Right now the API, the controller, and the datastore all share the same models, which makes things pretty complex; we need to simplify them.
  2. Break down the models from one big structure (a pipeline contains builds, which contain stages) into several entries in etcd. This would map well to TPR: the idea is that when TPR is available, we can just run kubectl get pipelines, kubectl get builds, kubectl get stages, etc. It also makes querying easier on the API side (see the sketch after this list).
  3. Remove references to the datastore and SCM from the API. This is tech debt; these belong on the controller side, and removing them also makes it easier to refactor the controller afterwards.
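A rough sketch of what the breakdown in (2) could look like on the model side. The types and field names just mirror the etcd layout proposed later in this thread and are illustrative, not final.

```go
// Package name is illustrative. Sketch only: separate top-level types instead
// of one nested pipeline -> builds -> stages structure, so each maps to its
// own etcd entry (or TPR object later).
package api

type Pipeline struct {
	UUID         string `json:"uuid"`
	Name         string `json:"name"`
	Created      int64  `json:"created"`
	CurrentBuild int    `json:"current_build"` // -1 if no builds yet
}

type Build struct {
	UUID         string `json:"uuid"`
	PipelineUUID string `json:"pipeline-uuid"`
	Number       int    `json:"number"`
	Status       string `json:"status"` // success, fail, waiting, pending
	Created      int64  `json:"created"`
	Started      int64  `json:"started"`
	Finished     int64  `json:"finished"`
}

type Stage struct {
	UUID         string `json:"uuid"`
	PipelineUUID string `json:"pipeline-uuid"`
	BuildUUID    string `json:"build-uuid"`
	Status       string `json:"status"` // pending, success, fail, waiting, skip, skipped
}
```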
hunter commented 8 years ago

How will users be managed in this API? Per-user pipelines (with different auth flows for different services)? Can we use the K8s user (which would probably fit the TPR model)?

hunter commented 8 years ago

Would it make sense to use gRPC, which seems to be the trend for newer K8s Go projects (Helm, etcd)?

darkcrux commented 8 years ago
> Break down the models from one big structure (a pipeline contains builds, which contain stages) into several entries in etcd. This would map well to TPR: the idea is that when TPR is available, we can just run kubectl get pipelines, kubectl get builds, kubectl get stages, etc. It also makes querying easier on the API side.

This would relate to the controllers too. I think it would be better for the API to have separate data structures for the pipelines, builds, and stages. The API just creates/updates them, then a separate controller watches for changes, runs the builds, and calls the API to update the status.

When TPR is available, it'd be easier to just say kubectl get {pipeline,builds,stages}, etc. Not sure if kubectl create -f ... would work too.
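Roughly, the controller side could be a watch loop over an etcd prefix. A minimal sketch using the etcd v2 client; the /kontinuous/builds prefix follows the layout proposed further down, and the handling here is illustrative only.

```go
package main

import (
	"fmt"
	"log"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

func main() {
	c, err := client.New(client.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
	})
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)

	// Watch every build entry under the (assumed) /kontinuous/builds prefix.
	// The controller reacts to create/update events, runs the builds, and
	// calls the API back to update the status.
	w := kapi.Watcher("/kontinuous/builds", &client.WatcherOptions{Recursive: true})
	for {
		resp, err := w.Next(context.Background())
		if err != nil {
			log.Println("watch error:", err)
			continue
		}
		fmt.Printf("%s %s -> %s\n", resp.Action, resp.Node.Key, resp.Node.Value)
		// ...decide whether to start a build, send notifications, etc.
	}
}
```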

> How will users be managed in this API? Per-user pipelines (with different auth flows for different services)? Can we use the K8s user (which would probably fit the TPR model)?

We could probably use the K8s user, or if not, make the user into another TPR? Something like k5s-users?

darkcrux commented 8 years ago

Still thinking about the users.

Should pipelines be linked to users? e.g. two users accessing the same repo can have separate pipelines? Which means a repo can have several hooks, each pertaining to a pipeline?

Our current implementation doesn't have a concept of a user. All the pipelines shown are created by logged-in users (who have admin access to the repo), but they are shared across all users (even those without repo access). Everyone can run the build.

hunter commented 8 years ago

> This would relate to the controllers too. I think it would be better for the API to have separate data structures for the pipelines, builds, and stages. The API just creates/updates them, then a separate controller watches for changes, runs the builds, and calls the API to update the status.

I'm wondering if builds are something that should live as a resource. A build is something that's generated as part of running a pipeline, but it's never edited after it's finished. Would it be better to keep it in its own data store (etcd or object store)?

hunter commented 8 years ago

> Should pipelines be linked to users? e.g. two users accessing the same repo can have separate pipelines? Which means a repo can have several hooks, each pertaining to a pipeline?

Yes, I was considering that too. As a dev, can I run the same pipeline as another user for doing my own testing? I would think yes.

darkcrux commented 8 years ago

> I'm wondering if builds are something that should live as a resource. A build is something that's generated as part of running a pipeline, but it's never edited after it's finished. Would it be better to keep it in its own data store (etcd or object store)?

The builds do get edited for a short time, when a stage or all stages finish, and the same goes for the stages when their job completes. I was thinking of it as similar to a Job: while it's running it spins up a pod, and for the moment it's running it's visible via kubectl get pods; afterwards it gets hidden unless we use the --show-all flag. A build is never edited after it's done, but it still needs to be queried afterwards, or removed if it's no longer needed.

darkcrux commented 8 years ago

The idea of having the builds as a resource is that the API can just edit the build's definition, marking a stage as READY for the controller to pick up and start running. It works either way though, whether as a TPR or just as entries in a different backend.
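A sketch of that flow from the API side, again assuming the etcd v2 client and the key layout proposed below. The function name, package name, and the "READY" status value are placeholders for illustration.

```go
package api

import (
	"encoding/json"
	"fmt"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

// markStageReady flips a stage's status so a watching controller picks it up.
// Key layout follows the proposal further down in this thread.
func markStageReady(kapi client.KeysAPI, pipelineUUID, buildUUID, stageUUID string) error {
	key := fmt.Sprintf("/kontinuous/stages/%s/%s/%s", pipelineUUID, buildUUID, stageUUID)

	// Read the current stage entry...
	resp, err := kapi.Get(context.Background(), key, nil)
	if err != nil {
		return err
	}
	var stage map[string]interface{}
	if err := json.Unmarshal([]byte(resp.Node.Value), &stage); err != nil {
		return err
	}

	// ...mark it READY and write it back; the controller's watch sees the update.
	stage["status"] = "READY"
	updated, err := json.Marshal(stage)
	if err != nil {
		return err
	}
	_, err = kapi.Set(context.Background(), key, string(updated), nil)
	return err
}
```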

hunter commented 8 years ago

Hadn't considered it that way, but it makes sense if there are edits going on.

darkcrux commented 8 years ago

A general idea of how the new spec will look, as well as updates to the way the pipeline resources are stored in etcd (making it closer to K8s TPR).

pipeline spec

kind: Pipeline
api: extension/v1
metadata:
  name: new-pipeline
  namespace: default
  uuid: {generated}
  labels: {}
spec:
  notifs:
    - slack:
        secret: slack-creds
    - email:
        secret: email-creds
  triggers:
    - github:
        events:
          - push
          - pr
    - webhook: {}
    - quay: {}
  sources:
    - name: github-src
      scm:
        repo: github:darkcrux/obvious
        secret: repo-creds
    ...
  vars:
    test_var: testing
  stages:
    - pull:
        from: github-src
        to: src/github/
    - command:
    ...
  output:
    publish:
      to: quay.io
      secrets: quay-creds
    deploy:

note: stages can override vars and notifs.

todo:

pipelines

Prefix: /kontinuous/pipelines/{pipeline-uuid}/{json-data}

JSON data:

| key | type | description |
| --- | --- | --- |
| uuid | string | unique id for the pipeline (needed by frontend) |
| name | string | the pipeline friendly name |
| created | int | unix nano timestamp for when the pipeline was created |
| spec | object | yaml representation of the spec (base64 encoded) |
| spec_src | string | if not empty, expects a file in a source to update the spec |
| current_build | int | id of the current build; -1 if no builds yet |

pipeline - uuid map

key: /kontinuous/pipeline-uuid/{pipeline-name}

value: the uuid for the pipeline name

note: the api should use the friendly name of the pipeline but internally we should use the uuid. This is used to map the pipeline name to its uuid.
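To illustrate the indirection, a small sketch (etcd v2 client assumed) of resolving a friendly name to its uuid and then loading the pipeline entry. Whether the JSON data sits in the key's value or in sub-keys is still to be decided, so the second Get just reads the whole prefix.

```go
package api

import (
	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

// getPipeline resolves a pipeline's friendly name to its uuid via the map key,
// then fetches whatever is stored under /kontinuous/pipelines/{uuid}.
func getPipeline(kapi client.KeysAPI, name string) (*client.Node, error) {
	uuidResp, err := kapi.Get(context.Background(),
		"/kontinuous/pipeline-uuid/"+name, nil)
	if err != nil {
		return nil, err
	}
	uuid := uuidResp.Node.Value

	dataResp, err := kapi.Get(context.Background(),
		"/kontinuous/pipelines/"+uuid,
		&client.GetOptions{Recursive: true})
	if err != nil {
		return nil, err
	}
	return dataResp.Node, nil
}
```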

builds

Prefix: /kontinuous/builds/{pipeline-uuid}/{build-uuid}/{json-data}

| key | type | description |
| --- | --- | --- |
| uuid | string | unique id for the build (needed by frontend) |
| pipeline-uuid | string | unique id of the parent pipeline |
| number | int | the build number |
| status | string | status of the build; can be success, fail, waiting, pending |
| created | int | unix nano timestamp for when the build was created |
| started | int | unix nano timestamp for when the build started |
| finished | int | unix nano timestamp for when the build completed |
| current_stage_uuid | string | current stage uuid |
| spec | object | the spec used for this build |
| sources.scm.type | string | the type of scm source (github, gitlab, bitbucket, generic) |
| sources.scm.clone_url | string | the clone url for the git source |
| sources.scm.branch | string | current branch to use for the build |
| sources.scm.commit | string | commit hash of the build |
| sources.scm.author | string | author of the build |
| sources.etc | ??? | other source metadata? |

build - uuid map

key: /kontinuous/build-uuid/{pipeline-uuid}.{build-num}

value: the build uuid for the given build number

note: the api should use the build number but internally we should use the uuid. This is used to map the build number to its uuid.

stages

Prefix: /kontinuous/stages/{pipeline-uuid}/{build-uuid}/{stage-uuid}/{json-data}

| key | type | description |
| --- | --- | --- |
| uuid | string | unique id for the stage |
| pipeline-uuid | string | unique id of the parent pipeline |
| build-uuid | string | unique id of the parent build |
| status | string | pending, success, fail, waiting, skip, skipped |
| created | int | unix nano timestamp for when the stage was created |
| started | int | unix nano timestamp for when the stage was started |
| finished | int | unix nano timestamp for when the stage was completed |
| resumed | int | unix nano timestamp for when the stage was resumed (from waiting status) |
| skipped | int | unix nano timestamp for when the stage was skipped |
| spec | object | the stage spec with the templates already processed |
| log_path | string | path in minio to find the logs |
| artifact_path | string | path in minio to find the artifacts |

stage - uuid map

key: /kontinuous/stage-uuid/{pipeline-uuid}.{build-uuid}.{stage-num}

value: the stage uuid for the given stage number
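For reference, the whole layout above reduces to a handful of key builders. A sketch with illustrative names, matching the prefixes and map keys described in this comment:

```go
package api

import "fmt"

// Helpers that build the etcd keys described above. The map keys let the API
// accept friendly names / numbers externally while using uuids internally.
func pipelineKey(pipelineUUID string) string {
	return fmt.Sprintf("/kontinuous/pipelines/%s", pipelineUUID)
}

func buildKey(pipelineUUID, buildUUID string) string {
	return fmt.Sprintf("/kontinuous/builds/%s/%s", pipelineUUID, buildUUID)
}

func stageKey(pipelineUUID, buildUUID, stageUUID string) string {
	return fmt.Sprintf("/kontinuous/stages/%s/%s/%s", pipelineUUID, buildUUID, stageUUID)
}

func pipelineUUIDKey(pipelineName string) string {
	return fmt.Sprintf("/kontinuous/pipeline-uuid/%s", pipelineName)
}

func buildUUIDKey(pipelineUUID string, buildNum int) string {
	return fmt.Sprintf("/kontinuous/build-uuid/%s.%d", pipelineUUID, buildNum)
}

func stageUUIDKey(pipelineUUID, buildUUID string, stageNum int) string {
	return fmt.Sprintf("/kontinuous/stage-uuid/%s.%s.%d", pipelineUUID, buildUUID, stageNum)
}
```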