A monitor app - Githubissues

ZincWhiskers commented 5 years ago

Name of the component: Prometheus

What would you like to be added: A pre-configured dashboard for basic cluster health that can be installed as an app.

Why is this needed: If ASI's goal is to reduce the number of pure infrastructure staff, we need to proactively give developers or department leads (who may not be fully qualified DevOps experts) a way to deploy a monitoring app that operations or development staff can use as a quick health status board for a cluster. A URL could then be configured to display on a monitor in an operations center or project team room, showing a default set of key indicators by cluster.

So if a company has 1000 physical servers, or x budget with a public cloud provider, they can see how much capacity is specifically controlled by ASI. If they break this pool of capacity/budget into subgroups, regions, or environments as underlay clusters (Platform Stacks), they can see how much capacity each one has, and workload placement of overlay clusters can be managed. As a lead developer who must initiate a project (install the app), I can see which dev environment has capacity for my project and change it if necessary on each release. If there are default limits like AWS has, the page would show how many units remain, so if the cluster I am allowed to use is almost out, a service request can be sent ahead of time to the correct group to increase the capacity. Ideally this would be an automated service request and fulfillment workflow, but on premises, or in a cloud where a budget increase is required, it may be an asynchronous process with a time delay.

The point is that before deploying a new app, or a new release of an app that needs more of some limited resource, the authorized user would have enough data to judge whether the request is likely to succeed. I may not know the exact needs of the change, but at least I know if I am getting close to a limit.

Example: in Dev, the release manager noticed that the app consumed a certain list of resources. Using the URL to the cluster health app, the release manager could get some idea of whether the app would be successfully deployable. Detailed example: the app needs x CPU units over y worker nodes, with z ingress IPs and w storage of type v. If the test cluster did not meet these values, a request to increase or free up capacity could be made before wasting time deploying and then debugging a failed deployment, only to trace it back to a lack of some resource that Prometheus or an API call could have made known to the requester.
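The check described above can be sketched roughly as follows. This is a minimal illustration, not part of any existing ASI or Prometheus tooling: the function name `find_shortfalls`, the resource keys, and all the numbers are hypothetical, and in practice the available values would come from Prometheus or a cluster API rather than a hard-coded dict.

```python
def find_shortfalls(needed, available):
    """Return {resource: missing amount} for each resource the cluster cannot cover.

    Both arguments map a resource name to a number. A resource that is
    missing from `available` is treated as having zero free capacity.
    """
    shortfalls = {}
    for resource, amount in needed.items():
        free = available.get(resource, 0)
        if amount > free:
            shortfalls[resource] = amount - free
    return shortfalls


# Hypothetical figures standing in for x, y, z, w and v from the example.
needed = {"cpu_units": 8, "worker_nodes": 3, "ingress_ips": 2, "storage_gb_ssd": 100}
available = {"cpu_units": 12, "worker_nodes": 3, "ingress_ips": 1, "storage_gb_ssd": 40}

shortfalls = find_shortfalls(needed, available)
if shortfalls:
    # Here the release manager would request more capacity before deploying.
    for resource, missing in shortfalls.items():
        print(f"short on {resource}: need {missing} more")
else:
    print("cluster has enough free capacity for this release")
```

An empty result means the deployment is not blocked by any of the listed resources; a non-empty result names what to ask the infrastructure group for ahead of time.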

As a demo feature, it would help reduce stress and make it simpler to get infrastructure groups to agree, as they would not be blamed for 100% of failed deployments. (It is like a gas gauge in a car: you can't blame your dad for the car breaking down when you ran out of gas, because the gauge gave you a rough estimate of how far you could go.)

This could be a website that calls things to display, or it could be a pre-canned configuration of Grafana or Prometheus that can be deployed as an app, picks up stack metrics, and displays them as a view. The ASI owners could allow the git repo to be cloned so each stack owner can customize it for their needs.
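For the "website that calls things to display" variant, one possible starting point is the Prometheus HTTP API's instant query endpoint, `/api/v1/query`. The sketch below only builds the request URL and parses a response body, so it runs without a live server. The helper names, the host `prometheus.example`, and the sample payload are illustrative assumptions; `up` is used as the query only because it is a metric Prometheus generates for every scrape target.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Turn an instant-query JSON response into (labels, value) pairs."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    rows = []
    for sample in body["data"]["result"]:
        # Each sample carries a [unix_timestamp, "value-as-string"] pair.
        _timestamp, value = sample["value"]
        rows.append((sample["metric"], float(value)))
    return rows


# Hand-written payload in the shape of an instant vector response.
sample_payload = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node"}, "value": [1566300000, "1"]},
            {"metric": {"__name__": "up", "job": "api"}, "value": [1566300000, "0"]},
        ],
    },
})

url = build_query_url("http://prometheus.example:9090/", "up")
rows = parse_instant_vector(sample_payload)
for labels, value in rows:
    print(labels.get("job"), "is", "up" if value == 1.0 else "down")
```

A status page would fetch `url`, pass the response text to `parse_instant_vector`, and render the rows. The pre-canned Grafana route avoids writing this code at all, at the cost of shipping and maintaining dashboard definitions in the repo.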

arkadijs commented 5 years ago

Kubernetes Dashboard would fit this role quite ok?

akranga commented 5 years ago

I think this is an interesting idea, as it allows the user to store Prometheus dashboards (or exporters, alerts, etc.) in the form of code. It also helps us ship Prometheus as a relatively generic component.

Which metrics should we bake in by default (and give the user as an example to customize)?

Any examples available?

ZincWhiskers commented 5 years ago

Let me see what some prospects think and get back to everyone.

On 8/20/2019 2:07 PM, arkadijs wrote:

Kubernetes Dashboard would fit this role quite ok?


agilestacks / components

A monitor app #265