AICoE / aicoe-sre

This repository serves as a central location for services being run by the AICoE SRE team.
GNU General Public License v3.0

AICoE-CI integration #27

Closed goern closed 3 years ago

goern commented 4 years ago

Hey @all, please think about jobs you want the AICoE CI to run on this repo, yamllinting? What else?

Cc: @durandom

tumido commented 4 years ago

1) Kustomization manifests are buildable
2) Resulting Kubernetes resources validate (`kubectl create --dry-run --validate` or something)
3) Kustomization files maintain the same standard
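The three checks above could be sketched as a single CI script. This is only an illustration: the `applications/*/overlays/*` glob is an assumption about the repo layout, and `kustomize`, `kubectl`, and `yamllint` are assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical CI job for the three checks; paths are assumptions.
set -e
for overlay in applications/*/overlays/*/; do
  # 1) the kustomization must build at all
  kustomize build "$overlay" > /tmp/manifests.yaml
  # 2) the rendered resources must pass client-side validation
  kubectl create --dry-run=client --validate=true -f /tmp/manifests.yaml
done
# 3) a shared style standard for the YAML files themselves
yamllint .
```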

goern commented 4 years ago

oh, btw, what about: https://github.com/thoth-station/thoth-application/blob/9cbd30aed172aff33ecf5f7e21e6f558493c8694/README.md#policy-based-control-of-resources

tumido commented 4 years ago

@goern I've never worked with OPA, so... does it use some generic test suite you're referring to in your README? Or do you have a Thoth-specific one that we could maybe take a look at? I couldn't find any...

I really like the possibility to actually test the manifests!

goern commented 4 years ago

Yes, there is https://github.com/thoth-station/thoth-application/tree/master/policy which contains the policies we want to enforce for the thoth-application. It is just me experimenting; I haven't given it much deep thought yet...

tumido commented 4 years ago

I like that. That implements a good portion of my comment above. :slightly_smiling_face: :+1:

anishasthana commented 4 years ago

I think @tumido hit a lot of the initial ones we'd want to be covering. ++ to what has been said so far.

durandom commented 4 years ago

@HumairAK you looked into https://github.com/app-sre/qontract-validator before we went with argo-cd. Is this something we could do to validate a PR?

HumairAK commented 4 years ago

@durandom -- It's been some time, but my guess is no, as it's probably coupled with their qontract-server and not generalized. From the description in their README:

This project contains the tools necessary to bundle data into the format used by qontract-server and to JSON validate its schema.

Schema Validation would actually be something useful for the aicoe-cd repository, and I think it's worth looking into.

HumairAK commented 4 years ago

+1 to ensuring Kustomizations build successfully on all overlays.

tumido commented 4 years ago

Another cool thing would be if the bots could diff the resources (after kustomize build) from before and after the PR, and check whether the PR adds new CRDs or cluster-wide resources. This way we know whether we need to ticket PSI before merging the PR.
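The kind-diff idea can be sketched in shell. Here `before.yaml` and `after.yaml` are made-up stand-ins for `kustomize build` output on the base branch and on the PR head; in a real CI job they would come from two checkouts of the overlay.

```shell
# stand-ins for `kustomize build` output before and after the PR
cat > before.yaml <<'EOF'
kind: Deployment
---
kind: Service
EOF
cat > after.yaml <<'EOF'
kind: Deployment
---
kind: Service
---
kind: CustomResourceDefinition
EOF

# collect the set of kinds on each side
grep '^kind: ' before.yaml | sort -u > before.kinds
grep '^kind: ' after.yaml  | sort -u > after.kinds

# kinds that only exist after the PR; flag the cluster-wide ones
NEW=$(comm -13 before.kinds after.kinds | grep -E 'CustomResourceDefinition|ClusterRole' || true)
if [ -n "$NEW" ]; then
  echo "cluster-wide resources added, PSI ticket needed:"
  echo "$NEW"
fi
```

`comm -13` prints only the lines unique to the second (post-PR) file, so pre-existing cluster-wide resources don't trigger the warning.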

HumairAK commented 4 years ago

+1 @tumido --- If this could somehow be adjusted to cover not only CRDs but also other apigroups/kinds that we can add to some sort of list, that would be even better.

tumido commented 4 years ago

And what if we take it one step further? If such cluster-wide resources are found and approved, can we automate opening a Service Now ticket to PSI?

goern commented 4 years ago

Yes we can :)

We just need some coding power to help us with that... First of all I'll turn this into a card...

tumido commented 4 years ago

Btw, I've started using the https://github.com/Agilicus/yaml_filter suggested in https://github.com/kubernetes-sigs/kustomize/issues/821#issuecomment-467089437 and it's so easy to populate the PSI ticket attachments now. :smile:

kustomize build applications/argo/overlays/dh-dev-argo | yaml_filter -i CustomResourceDefinition,ClusterRole > psi_ticket.yaml
kustomize build applications/argo-events/overlays/dh-dev-argo | yaml_filter -i CustomResourceDefinition,ClusterRole >> psi_ticket.yaml

The yaml_filter is a pretty short yet clever script. Can we integrate some variation of it (one that reads the included and excluded resources from a config file instead of args)?
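One possible shape for that variation, sketched in pure shell with awk instead of the Python yaml_filter: the `kinds.conf` file name, its one-kind-per-line format, and the sample manifest stream are all made up for illustration; in practice the stream would be piped in from `kustomize build`.

```shell
# kinds.conf: the "include" list, one Kubernetes kind per line (hypothetical format)
cat > kinds.conf <<'EOF'
CustomResourceDefinition
ClusterRole
EOF
# sample multi-document stream standing in for `kustomize build` output
cat > manifests.yaml <<'EOF'
kind: Deployment
---
kind: ClusterRole
---
kind: Service
EOF

# keep only documents whose kind appears in kinds.conf
awk '
  NR == FNR { keep[$1] = 1; next }                         # pass 1: load include list
  /^---/    { if (ok) printf "%s---\n", doc; doc = ""; ok = 0; next }
  { doc = doc $0 "\n"; if ($1 == "kind:" && keep[$2]) ok = 1 }
  END       { if (ok) printf "%s", doc }                   # flush last document
' kinds.conf manifests.yaml > psi_ticket.yaml
```

This buffers each `---`-separated document and emits it only when a matching `kind:` line was seen, which mirrors the `-i` include flag of yaml_filter; an exclude list could be a second file handled the same way with the test inverted.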

sesheta commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

sesheta commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta commented 3 years ago

@sesheta: Closing this issue.

In response to [this](https://github.com/AICoE/aicoe-sre/issues/27#issuecomment-968111127):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.