cloud-native-toolkit / planning

This is the planning repo to manage the cross-project Epics and Issues, Tasks and Bugs.

Carlos can see a dashboard that shows how a team is doing against key metrics so that he can understand where there might be areas of improvement #50

Open seansund opened 4 years ago

seansund commented 4 years ago

The Accelerate book and the State of DevOps report from which it was derived identify four key metrics that are indicative of high-performing teams.
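
As commonly summarized from Accelerate, the four metrics are deployment frequency, lead time for changes, mean time to restore (MTTR), and change failure rate. A minimal sketch of how a dashboard could derive them is shown here. The `Deployment` record and its field names are assumptions made for illustration; they do not come from any existing tool in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape; the field names are illustrative assumptions.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this deployment cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def key_metrics(deployments: List[Deployment], window_days: int) -> dict:
    """Compute the four metrics over a reporting window of `window_days` days."""
    if not deployments:
        raise ValueError("no deployments in window")
    failures = [d for d in deployments if d.failed]
    restored = [d for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "lead_time_hours": mean(_hours(d.committed_at, d.deployed_at) for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        # MTTR is undefined when nothing failed (or nothing was restored) in the window.
        "mttr_hours": mean(_hours(d.deployed_at, d.restored_at) for d in restored)
        if restored else None,
    }


if __name__ == "__main__":
    t0 = datetime(2020, 3, 1)
    sample = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_at=t0 + timedelta(hours=5)),
    ]
    print(key_metrics(sample, window_days=2))
```

Here MTTR is measured from the failing deployment to restoration, which is one possible convention; a team may prefer to measure from incident detection instead.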

seansund commented 4 years ago

Hygieia looks like a good tool to deliver a lot of this value (and more).

seansund commented 4 years ago

Start with http://hygieia.github.io/Hygieia/builddocker.html to prototype locally. Hygieia is customized to an environment, so we may end up baking this into Terraform, where we collect the particulars of the environment (Git host, ticket system such as Jira or Trello, etc.).

lsteck commented 4 years ago

Docker Compose doesn't work: https://github.com/Hygieia/Hygieia/issues/3216

Switched to use Starter Kit: https://github.com/Hygieia/hygieia-starter-kit This is a single docker image that contains:

lsteck commented 4 years ago

Got the Starter Kit to work locally, but there is an issue with the Sonar Collector: it doesn't support SonarQube version 8.2, though it works fine with version 6.7.

Error with 8.2:

2020-03-18 17:49:19,967 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - Running Collector: Sonar
2020-03-18 17:49:19,979 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - -----------------------------------
2020-03-18 17:49:19,980 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - http://192.168.0.32:9000
2020-03-18 17:49:19,981 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - -----------------------------------
2020-03-18 17:49:20,634 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - Fetched projects 1 0s
2020-03-18 17:49:20,638 [taskScheduler-1] ERROR o.s.s.s.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task.
java.lang.NullPointerException: null
    at com.capitalone.dashboard.model.SonarProject.equals(SonarProject.java:38) ~[sonar-codequality-collector.jar!/:3.1.0]

lsteck commented 4 years ago

I'm going to work on deploying the components individually as services.

Some people are making progress on this:

https://github.com/Hygieia/Hygieia/issues/3224

lsteck commented 4 years ago

FYI, here is the current list of collectors: https://hygieia.github.io/Hygieia/collectors.html

There is no support for Tekton or ArgoCD.

lsteck commented 4 years ago

Notes on where I left off before getting pulled to look at other tasks.

While the Starter Kit is fine for running locally, it would not work in a Kubernetes environment.

Docker Compose does not work.
https://github.com/Hygieia/Hygieia/issues/3216

Likewise, the commands for building the Docker images don't work. They use an old Maven plugin from Spotify that isn't supported anymore. https://github.com/spotify/docker-maven-plugin

All projects have a Dockerfile that uses a /docker/properties-builder.sh script to convert environment variables into the properties files and pass them to the Spring jar, so using the Dockerfiles works fine.
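
The idea behind properties-builder.sh can be sketched as follows. This is not the actual script (which is a shell script); it is a minimal Python illustration of turning environment variables into a properties file. The environment variable names, property keys, and defaults below are assumptions chosen for the example.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical mapping: environment variable -> (property key, default value).
# A default of None means the property is omitted when the variable is unset.
ENV_TO_PROPERTY: Dict[str, Tuple[str, Optional[str]]] = {
    "MONGO_HOST": ("dbhost", "localhost"),
    "MONGO_PORT": ("dbport", "27017"),
    "MONGO_DATABASE": ("dbname", "dashboarddb"),
    "MONGO_USERNAME": ("dbusername", "dashboarduser"),
    "MONGO_PASSWORD": ("dbpassword", None),
}


def build_properties(env: Mapping[str, str]) -> str:
    """Render a Java-style properties file from the given environment."""
    lines = []
    for var, (key, default) in ENV_TO_PROPERTY.items():
        value = env.get(var, default)
        if value is not None:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the properties derived from the real process environment.
    print(build_properties(os.environ), end="")
```

In a container entrypoint, output like this would be written to a file and handed to the Spring Boot jar, for example through Spring Boot's `--spring.config.location` option.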

MongoDB: I just pulled the latest version of Mongo from Docker Hub and it works. FYI, I created a startup script that creates the dashboarduser in both the admin and dashboarddb databases, because having it only in the admin database didn't work. https://github.com/Hygieia/Hygieia/issues/2877

Sonar: The collector/dashboard does not work with SonarQube 8.2; see the comment above. https://github.com/Hygieia/hygieia-starter-kit/issues/8#issuecomment-601244202

FYI, there are issues on how to get it working on Kubernetes: https://github.com/Hygieia/Hygieia/issues/3224

seansund commented 4 years ago

We decided that Hygieia is more work to configure than the value we will get from it. The code does not seem to be well supported, and the installation model is pre-cloud-native.

mjperrins commented 4 years ago

I agree. Collecting the Accelerate metrics is still valuable. I was going to resurrect the original code that displays the base MTTR and build-time UX we had in React, link it into the pipeline, and add that UX to the Dashboard, as the Dashboard moves from a tool-launching function to something that can give a team metrics. The design would use the work @seansund started for collecting the data in a more structured manner: the pipeline sends data to an in-cluster Mongo (with storage), and the metrics are then presented in the UX below. It would also allow detail navigation to the Git repo, artifact, image, code coverage, and health. This could then be added to the existing Tekton Tasks, or new Tasks could be added.

image

This is a logical next step to build on top of the base pipeline delivery.

csantanapr commented 4 years ago

Take a look at Tekton Events. Tekton already generates some events that can be collected, and these can be paired with knative-eventing to convert them into CNCF CloudEvents. Multiple trigger subscribers can then attach to the broker and populate a DB like Mongo, send a Slack message, etc. https://github.com/tektoncd/pipeline/blob/master/docs/events.md

The idea is to use a generic way to generate "DevOps" events with some sane schema, so that a subscriber action can convert those events for a specific system. For example, if a user is running Tekton on IBM Cloud and uses DevOps Insights, then instead of hard-coding the CLI call ibmcloud doi publishtestrecord directly in their Tekton Task, the Task can emit a generic "DevOps" CloudEvent and a subscriber can handle the event.

https://cloud.ibm.com/docs/ContinuousDelivery?topic=ContinuousDelivery-publishing-test-data
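
A rough sketch of that pattern, using only the Python standard library: a generic "DevOps" event is wrapped in a CloudEvents v1.0 envelope, and an in-memory broker fans it out to subscribers by event type. The `Broker` class is a toy stand-in for a real eventing broker, and the event type, source, and payload fields are hypothetical names chosen for illustration, not an agreed schema.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


def make_devops_event(event_type: str, source: str, data: Dict[str, Any]) -> Event:
    """Wrap `data` in a CloudEvents v1.0 envelope (structured JSON form)."""
    return {
        "specversion": "1.0",             # required by the CloudEvents spec
        "id": str(uuid.uuid4()),          # required, unique per source
        "source": source,                 # required, identifies the producer
        "type": event_type,               # required, used here for routing
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


class Broker:
    """Toy in-memory broker: each trigger filters on the event type."""

    def __init__(self) -> None:
        self._triggers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._triggers.setdefault(event_type, []).append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event to matching subscribers; return how many received it."""
        handlers = self._triggers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    TEST_DONE = "dev.example.devops.test.completed"  # hypothetical event type
    stored: List[Event] = []                         # stand-in for a Mongo collection

    broker = Broker()
    broker.subscribe(TEST_DONE, stored.append)
    # A platform-specific subscriber would call its own system here,
    # e.g. run `ibmcloud doi publishtestrecord` for DevOps Insights.
    broker.subscribe(TEST_DONE, lambda e: print("forwarding", e["id"]))

    event = make_devops_event(TEST_DONE, "/pipelines/my-app", {"passed": 42, "failed": 1})
    broker.publish(event)
    print(json.dumps(stored[0], indent=2))
```

The Tekton Task only needs to know the generic event shape; which systems receive the data is decided by whoever registers subscribers.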

seansund commented 4 years ago

@csantanapr I agree. I would like to see this done in a more extensible way to emit and collect events. Right now most solutions are very platform/tool specific.

csantanapr commented 4 years ago

I mean, you can even have a CloudEvent trigger another Tekton Task to handle the call to the DevOps system, like the tasks here: https://github.com/open-toolchain/tekton-catalog/blob/tkn_v1beta1/devops-insights/README.md

csantanapr commented 4 years ago

The concept of using a Tekton TaskRun as a Serverless Function :-)

lsteck commented 4 years ago

Need time to investigate whether Hygieia is a good solution.

lsteck commented 4 years ago

I'm wondering if we need to look for a different solution, like some sort of eventing solution with a dashboard.

mjperrins commented 4 years ago

I would park this story. The new AI Ops cartridge from IBM Hybrid Cloud is going to major on this functionality: it will plug into the common SDLC tools and aggregate the key metrics. This would enable IBMers to demonstrate a story. There is also a lot of activity in this space from commercial offerings.