AcalephStorage / kontinuous

The Kubernetes Continuous Integration & Delivery Platform (CI/CD) :arrows_counterclockwise:
Apache License 2.0

API refactor #37

Open hunter opened 8 years ago

hunter commented 8 years ago

Now that we have an initial prototype built, there are a few areas of the API that can be improved.

As mentioned in #15, ThirdPartyResources may be a better approach to integrating the API. We get the benefits of Kube API auth along with the use of etcd - https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/design/extending-api.md
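As a rough illustration only, registering a Pipeline TPR against a 1.2-era API server could look like the sketch below. The `kontinuous.io` group and the `kubectl proxy` setup are assumptions, not decisions.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// Hypothetical TPR definition; "kontinuous.io" is an assumed group name.
// Once registered, Pipeline objects would live under
// /apis/kontinuous.io/v1/namespaces/{namespace}/pipelines.
const pipelineTPR = `{
  "apiVersion": "extensions/v1beta1",
  "kind": "ThirdPartyResource",
  "metadata": {"name": "pipeline.kontinuous.io"},
  "description": "A kontinuous CI/CD pipeline",
  "versions": [{"name": "v1"}]
}`

func main() {
	// Assumes `kubectl proxy` is running locally so the API server handles auth.
	resp, err := http.Post(
		"http://127.0.0.1:8001/apis/extensions/v1beta1/thirdpartyresources",
		"application/json",
		bytes.NewBufferString(pipelineTPR),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```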

(Currently a stub... updates to come)

hunter commented 8 years ago

Updated with some info on ThirdPartyResources

darkcrux commented 8 years ago

This would probably tie in with the Controller Refactor. It also removes the need for us to start our own etcd instance.

hunter commented 8 years ago

My current understanding of ThirdPartyResources is that only the resource itself is stored in etcd; the rest of the associated data would have to be stored outside of that (so, I assume, our own etcd).

darkcrux commented 8 years ago

Yes. The thinking is that pipelines, stages, and build info would be stored as ThirdPartyResources, and it's up to the controllers to monitor the API for changes to start builds, send notifications, etc.

hunter commented 8 years ago

Ah yeah, I hadn't considered storing more than pipelines in there.

One thing we might need to check is whether TPR is enabled in GKE.

darkcrux commented 8 years ago

I don't think it's enabled in our latest installation.

darkcrux commented 8 years ago

A few things I think we need to improve in the API:

  1. Create separate models for the API and the controllers. Right now the API, the controller, and the datastore all share the same models, which makes things pretty complex; we need to simplify them.
  2. Break down the models from one big structure (a pipeline contains builds, which contain stages) into several entries in etcd. This would map well to TPR: the idea is that when TPR is available, we can just run kubectl get pipelines, kubectl get builds, kubectl get stages, etc. It also makes querying easier on the API side (see the sketch after this list).
  3. Remove references to the datastore and SCM from the API. This is tech debt; these belong on the controller side, and removing them also makes it easier to refactor the controller afterwards.
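A rough sketch of what the breakdown in (2) could look like on the model side. The types and field names just mirror the etcd layout proposed later in this thread and are illustrative, not final.

```go
// Package name is illustrative. Sketch only: separate top-level types instead
// of one nested pipeline -> builds -> stages structure, so each maps to its
// own etcd entry (or TPR object later).
package api

type Pipeline struct {
	UUID         string `json:"uuid"`
	Name         string `json:"name"`
	Created      int64  `json:"created"`
	CurrentBuild int    `json:"current_build"` // -1 if no builds yet
}

type Build struct {
	UUID         string `json:"uuid"`
	PipelineUUID string `json:"pipeline-uuid"`
	Number       int    `json:"number"`
	Status       string `json:"status"` // success, fail, waiting, pending
	Created      int64  `json:"created"`
	Started      int64  `json:"started"`
	Finished     int64  `json:"finished"`
}

type Stage struct {
	UUID         string `json:"uuid"`
	PipelineUUID string `json:"pipeline-uuid"`
	BuildUUID    string `json:"build-uuid"`
	Status       string `json:"status"` // pending, success, fail, waiting, skip, skipped
}
```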
hunter commented 8 years ago

How will users be managed in this API? Per-user pipelines (with different auth flows for different services)? Can we use the K8s user (which would probably fit the TPR model)?

hunter commented 8 years ago

Would it make sense to use gRPC, which seems to be the trend for newer K8s Go projects (Helm, etcd)?

darkcrux commented 8 years ago
> Break down the models from one big structure (a pipeline contains builds, which contain stages) into several entries in etcd. This would map well to TPR: the idea is that when TPR is available, we can just run kubectl get pipelines, kubectl get builds, kubectl get stages, etc. It also makes querying easier on the API side.

This would relate to the controllers too. I think it would be better for the API to have separate data structures for the pipelines, builds, and stages. The API just creates/updates them, then a separate controller watches for changes, runs the builds, and calls the API to update the status.

When TPR is available, it'd be easier to just say kubectl get {pipeline,builds,stages}, etc. Not sure if kubectl create -f ... would work too.
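Roughly, the controller side could be a watch loop over an etcd prefix. A minimal sketch using the etcd v2 client; the /kontinuous/builds prefix follows the layout proposed further down, and the handling here is illustrative only.

```go
package main

import (
	"fmt"
	"log"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

func main() {
	c, err := client.New(client.Config{
		Endpoints: []string{"http://127.0.0.1:2379"},
	})
	if err != nil {
		log.Fatal(err)
	}
	kapi := client.NewKeysAPI(c)

	// Watch every build entry under the (assumed) /kontinuous/builds prefix.
	// The controller reacts to create/update events, runs the builds, and
	// calls the API back to update the status.
	w := kapi.Watcher("/kontinuous/builds", &client.WatcherOptions{Recursive: true})
	for {
		resp, err := w.Next(context.Background())
		if err != nil {
			log.Println("watch error:", err)
			continue
		}
		fmt.Printf("%s %s -> %s\n", resp.Action, resp.Node.Key, resp.Node.Value)
		// ...decide whether to start a build, send notifications, etc.
	}
}
```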

> How will users be managed in this API? Per-user pipelines (with different auth flows for different services)? Can we use the K8s user (which would probably fit the TPR model)?

We could probably use the K8s user, or if not, make the user into another TPR? Something like k5s-users?

darkcrux commented 8 years ago

Still thinking about the users.

Should pipelines be linked to users? e.g. two users accessing the same repo can have separate pipelines? Which means a repo can have several hooks, each pertaining to a pipeline?

Our current implementation doesn't have a concept of a user. All the pipelines shown are created by logged-in users (who have admin access to the repo), but they are shared across all users (even those without repo access). Everyone can run the build.

hunter commented 8 years ago

> This would relate to the controllers too. I think it would be better for the API to have separate data structures for the pipelines, builds, and stages. The API just creates/updates them, then a separate controller watches for changes, runs the builds, and calls the API to update the status.

I'm wondering if builds are something that should live as a resource. A build is something that's generated as part of running a pipeline, but it's never edited after it's finished. Would it be better to keep it in its own data store (etcd or object store)?

hunter commented 8 years ago

> Should pipelines be linked to users? e.g. two users accessing the same repo can have separate pipelines? Which means a repo can have several hooks, each pertaining to a pipeline?

Yes, I was considering that too. As a dev, can I run the same pipeline as another user for doing my own testing? I would think yes.

darkcrux commented 8 years ago

> I'm wondering if builds are something that should live as a resource. A build is something that's generated as part of running a pipeline, but it's never edited after it's finished. Would it be better to keep it in its own data store (etcd or object store)?

The builds do get edited for a short time, when a stage or all stages finish, and the same goes for the stages when their job completes. I was thinking of it as similar to a Job: while it's running it spins up a pod, and for the moment it's running it's visible via kubectl get pods; afterwards it gets hidden unless we use the --show-all flag. A build is never edited after it's done, but it still needs to be queried afterwards, or removed if it's no longer needed.

darkcrux commented 8 years ago

The idea of having the builds as a resource is that the API can just edit the build's definition, marking a stage as READY for the controller to pick up and start running. It works either way though, whether as a TPR or just as entries in a different backend.
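A sketch of that flow from the API side, again assuming the etcd v2 client and the key layout proposed below. The function name, package name, and the "READY" status value are placeholders for illustration.

```go
package api

import (
	"encoding/json"
	"fmt"

	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

// markStageReady flips a stage's status so a watching controller picks it up.
// Key layout follows the proposal further down in this thread.
func markStageReady(kapi client.KeysAPI, pipelineUUID, buildUUID, stageUUID string) error {
	key := fmt.Sprintf("/kontinuous/stages/%s/%s/%s", pipelineUUID, buildUUID, stageUUID)

	// Read the current stage entry...
	resp, err := kapi.Get(context.Background(), key, nil)
	if err != nil {
		return err
	}
	var stage map[string]interface{}
	if err := json.Unmarshal([]byte(resp.Node.Value), &stage); err != nil {
		return err
	}

	// ...mark it READY and write it back; the controller's watch sees the update.
	stage["status"] = "READY"
	updated, err := json.Marshal(stage)
	if err != nil {
		return err
	}
	_, err = kapi.Set(context.Background(), key, string(updated), nil)
	return err
}
```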

hunter commented 8 years ago

Hadn't considered it that way, but it makes sense if there are edits going on.

darkcrux commented 8 years ago

A general idea of how the new spec will look, as well as updates to the way the pipeline resources are stored in etcd (making it closer to K8s TPR).

pipeline spec

kind: Pipeline
api: extension/v1
metadata:
  name: new-pipeline
  namespace: default
  uuid: {generated}
  labels: {}
spec:
  notifs:
    - slack:
        secret: slack-creds
    - email:
        secret: email-creds
  triggers:
    - github:
        events:
          - push
          - pr
    - webhook: {}
    - quay: {}
  sources:
    - name: github-src
      scm:
        repo: github:darkcrux/obvious
        secret: repo-creds
    ...
  vars:
    test_var: testing
  stages:
    - pull:
        from: github-src
        to: src/github/
    - command:
    ...
  output:
    publish:
      to: quay.io
      secrets: quay-creds
    deploy:

note: stages can override vars and notifs.

todo:

pipelines

Prefix: /kontinuous/pipelines/{pipeline-uuid}/{json-data}

JSON data:

| key | type | description |
| --- | --- | --- |
| uuid | string | unique id for the pipeline (needed by frontend) |
| name | string | the pipeline friendly name |
| created | int | unix nano timestamp for when the pipeline was created |
| spec | object | yaml representation of the spec (base64 encoded) |
| spec_src | string | if not empty, expects a file in a source to update the spec |
| current_build | int | id of the current build; -1 if no builds yet |

pipeline - uuid map

key: /kontinuous/pipeline-uuid/{pipeline-name}

value: the uuid for the pipeline name

note: the api should use the friendly name of the pipeline but internally we should use the uuid. This is used to map the pipeline name to its uuid.
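To illustrate the indirection, a small sketch (etcd v2 client assumed) of resolving a friendly name to its uuid and then loading the pipeline entry. Whether the JSON data sits in the key's value or in sub-keys is still to be decided, so the second Get just reads the whole prefix.

```go
package api

import (
	"github.com/coreos/etcd/client"
	"golang.org/x/net/context"
)

// getPipeline resolves a pipeline's friendly name to its uuid via the map key,
// then fetches whatever is stored under /kontinuous/pipelines/{uuid}.
func getPipeline(kapi client.KeysAPI, name string) (*client.Node, error) {
	uuidResp, err := kapi.Get(context.Background(),
		"/kontinuous/pipeline-uuid/"+name, nil)
	if err != nil {
		return nil, err
	}
	uuid := uuidResp.Node.Value

	dataResp, err := kapi.Get(context.Background(),
		"/kontinuous/pipelines/"+uuid,
		&client.GetOptions{Recursive: true})
	if err != nil {
		return nil, err
	}
	return dataResp.Node, nil
}
```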

builds

Prefix: /kontinuous/builds/{pipeline-uuid}/{build-uuid}/{json-data}

| key | type | description |
| --- | --- | --- |
| uuid | string | unique id for the build (needed by frontend) |
| pipeline-uuid | string | unique id of the parent pipeline |
| number | int | the build number |
| status | string | status of the build; can be success, fail, waiting, pending |
| created | int | unix nano timestamp for when the build was created |
| started | int | unix nano timestamp for when the build started |
| finished | int | unix nano timestamp for when the build completed |
| current_stage_uuid | string | current stage uuid |
| spec | object | the spec used for this build |
| sources.scm.type | string | the type of scm source (github, gitlab, bitbucket, generic) |
| sources.scm.clone_url | string | the clone url for the git source |
| sources.scm.branch | string | current branch to use for the build |
| sources.scm.commit | string | commit hash of the build |
| sources.scm.author | string | author of the build |
| sources.etc | ??? | other source metadata? |

build - uuid map

key: /kontinuous/build-uuid/{pipeline-uuid}.{build-num}

value: the build uuid for the given build number

note: the api should use the build number but internally we should use the uuid. This is used to map the build number to its uuid.

stages

Prefix: /kontinuous/stages/{pipeline-uuid}/{build-uuid}/{stage-uuid}/{json-data}

| key | type | description |
| --- | --- | --- |
| uuid | string | unique id for the stage |
| pipeline-uuid | string | unique id of the parent pipeline |
| build-uuid | string | unique id of the parent build |
| status | string | pending, success, fail, waiting, skip, skipped |
| created | int | unix nano timestamp for when the stage was created |
| started | int | unix nano timestamp for when the stage was started |
| finished | int | unix nano timestamp for when the stage was completed |
| resumed | int | unix nano timestamp for when the stage was resumed (from waiting status) |
| skipped | int | unix nano timestamp for when the stage was skipped |
| spec | object | the stage spec with the templates already processed |
| log_path | string | path in minio to find the logs |
| artifact_path | string | path in minio to find the artifacts |

stage - uuid map

key: /kontinuous/stage-uuid/{pipeline-uuid}.{build-uuid}.{stage-num}

value: the stage uuid for the given stage number
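For reference, the whole layout above reduces to a handful of key builders. A sketch with illustrative names, matching the prefixes and map keys described in this comment:

```go
package api

import "fmt"

// Helpers that build the etcd keys described above. The map keys let the API
// accept friendly names / numbers externally while using uuids internally.
func pipelineKey(pipelineUUID string) string {
	return fmt.Sprintf("/kontinuous/pipelines/%s", pipelineUUID)
}

func buildKey(pipelineUUID, buildUUID string) string {
	return fmt.Sprintf("/kontinuous/builds/%s/%s", pipelineUUID, buildUUID)
}

func stageKey(pipelineUUID, buildUUID, stageUUID string) string {
	return fmt.Sprintf("/kontinuous/stages/%s/%s/%s", pipelineUUID, buildUUID, stageUUID)
}

func pipelineUUIDKey(pipelineName string) string {
	return fmt.Sprintf("/kontinuous/pipeline-uuid/%s", pipelineName)
}

func buildUUIDKey(pipelineUUID string, buildNum int) string {
	return fmt.Sprintf("/kontinuous/build-uuid/%s.%d", pipelineUUID, buildNum)
}

func stageUUIDKey(pipelineUUID, buildUUID string, stageNum int) string {
	return fmt.Sprintf("/kontinuous/stage-uuid/%s.%s.%d", pipelineUUID, buildUUID, stageNum)
}
```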