Closed tosazuwa closed 4 years ago
Yeah, sure @tosazuwa, let's use this ticket until we come up with detailed tasks. @stewartshea Just wondering, is there an API endpoint available from Sysdig/Grafana/Prometheus to get the RAM and CPU usage?
@ShellyXueHan Have a look at what I was releasing as an early access feature here first...
The idea is that teams would get to create their own dashboards right inside Sysdig. I've created a separate card for me to create a simplified "platform wide" view that they can also see via Sysdig.
I think this data needs to live inside Sysdig and should not be pulled and represented on any other platform or tool.
@mitovskaol, the team needs more context on this ticket.
Note: removing this task list now. Refer to the updated list in the description.
@mitovskaol @stewartshea Please review and update the following action items that came out of the meeting.
General:
- [x] Shea: give Olena and Shelly access to the corresponding Sysdig team
Platform wide dashboard:
- [x] Olena: confirm what metrics to include, and what access model to use
- [x] Shea: test out what's available in Sysdig, and whether to create a public dashboard or export data to the status page (or something else)
- [ ] Shea: follow up on the reported bug (BCDevOps/platform-services#654)
Application specific dashboard:
- [x] Shea: create dashboard for SSO and RC (SSO is a non-license-plate namespace; is it blocked as well?). Not blocked, this is just done manually. The operator is for general teams, but isn't a required component for all namespaces.
- [ ] Shelly: explore the dashboards and get familiar with them. Figure out the correct references for OpenShift objects.
- [ ] Shelly: push out to the community once ready to leave early access mode, and provide support
@caggles since you are also poking around in Sysdig, once you feel comfortable with it, take a look at the tasks above and help yourself!
I've got a mock-up of this in Sysdig that I'm looking at, but I'm trying to figure out how to reduce the scope of access for logged-in users. While I don't think the information is sensitive, or we could remove enough of it that it could possibly be public, I'm seeing if there is an easy way to put this behind a logged-in session.
Initial mock up looks like this:
@NickCorcoran We can port this dashboard into our own hosted Grafana instance and pair it with SSO to govern access. I tested a mock-up tonight in the lab and it seems pretty straightforward. Not sure if I want to pair it with the existing Grafana page (and have a login area), or host a dedicated one.
@stewartshea Can you please add a brief description to each of the 3 sections explaining the metrics shown?
@mitovskaol yes, this was just a test to see if I could pull in dashboards from the Sysdig UI. I will be working more on this over the weekend to make it production ready.
This has been rolled out to prod: 1 new dashboard on the Grafana status page, which is now behind SSO auth (GitHub for now), and 1 Sysdig-based capacity dashboard.
@stewartshea, please add updates to the ticket.
We have additionally built a PV monitoring feature based on team feedback, though there are early indications that the Kubernetes API may deprecate this feature in the future and we may need a workaround. For now, we will close this ticket and allow teams to have access. We will use another issue to track the rollout of the Sysdig Teams operator into the OCP4 clusters as well.
Updates:
Platform-level resource usage monitoring tasks have already been moved to the OCP4 ZenHub board. This ticket is now application specific.
Tasks:
Status:
Currently blocked by a Sysdig bug: https://github.com/BCDevOps/platform-services/issues/654
Original message:
I need to know if an application-specific resource usage monitoring dashboard can be developed by a Product team itself, and if yes, whether we can put together some instructions to guide the teams through this. As for the platform-wide resource usage monitoring, Shea mentioned that this can be done fairly easily by getting the RAM and CPU usage for the cluster out of Sysdig (maybe using Grafana and Prometheus) and putting it into a simple chart, available via a link that we can share with the broader Platform community, including the executive leadership. He indicated that Cailey and Shelly should be able to build the dashboard themselves. So can we have a card created in ZenHub for this work? I would assign it the second-highest priority, after the documentation cleanup.
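For anyone exploring the "RAM and CPU usage out of Prometheus" idea above, here is a rough sketch of what that could look like against the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). It is not what was deployed for this ticket. The base URL is a placeholder, the helper names are made up for illustration, and it assumes the standard cAdvisor metrics (`container_cpu_usage_seconds_total`, `container_memory_working_set_bytes`) are being scraped, which may not match the actual cluster setup.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (assumption); swap in a real Prometheus-compatible endpoint.
PROMETHEUS_URL = "https://prometheus.example.invalid"

# PromQL for cluster-wide totals, assuming standard cAdvisor metric names.
CPU_QUERY = "sum(rate(container_cpu_usage_seconds_total[5m]))"
MEMORY_QUERY = "sum(container_memory_working_set_bytes)"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_single_value(body: str) -> float:
    """Extract the value from an instant-query response holding one series."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected one series, got {len(result)}")
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])
```

To use it, fetch `build_query_url(PROMETHEUS_URL, CPU_QUERY)` with any HTTP client (adding whatever auth the endpoint requires) and pass the response body to `parse_single_value`. The resulting numbers could then feed a simple chart.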