BCDevOps / platform-services

Collection of platform related tools and configurations
Apache License 2.0

Investigate PV monitoring #710

Closed stewartshea closed 4 years ago

stewartshea commented 4 years ago

As a platform user, I would like to be able to monitor the free space of my persistent volumes in OpenShift. Is this possible with Sysdig?

Definition of Done: Investigate the Prometheus metrics exposed to Sysdig (or other host metrics) to determine whether PV usage can be reported on. Document the findings, whether or not this turns out to be possible.

Additional context: It's clear that filesystem usage can be reported on via the Sysdig agents, but it isn't clear whether PVs are reported on.

stewartshea commented 4 years ago

We are able to get the kubelet Prometheus metrics into the dashboard, but I suspect they're coming in as host metrics, which may make it hard to apply a scope to them.

image
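For anyone wanting to sanity-check the numbers outside of Sysdig, here is a minimal sketch of how free space per PVC could be derived from the kubelet volume metrics. It assumes the standard `kubelet_volume_stats_capacity_bytes` and `kubelet_volume_stats_available_bytes` series with `namespace` and `persistentvolumeclaim` labels; the sample values and the `pvc_free_percent` helper are made up for illustration and are not part of the platform tooling.

```python
import re
from collections import defaultdict

# Made-up sample of kubelet output in Prometheus text format.
SAMPLE = """\
# HELP kubelet_volume_stats_capacity_bytes Capacity in bytes of the volume
kubelet_volume_stats_capacity_bytes{namespace="demo",persistentvolumeclaim="data"} 1000
kubelet_volume_stats_available_bytes{namespace="demo",persistentvolumeclaim="data"} 250
"""

# Matches labelled kubelet volume samples: name{labels} value
LINE = re.compile(r'^(kubelet_volume_stats_\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def pvc_free_percent(text):
    """Return {(namespace, pvc): percent_free} for each PVC found in text."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and unrelated metrics
        name, labels, value = match.groups()
        parsed = dict(LABEL.findall(labels))
        key = (parsed.get("namespace"), parsed.get("persistentvolumeclaim"))
        stats[key][name] = float(value)

    result = {}
    for key, series in stats.items():
        capacity = series.get("kubelet_volume_stats_capacity_bytes")
        available = series.get("kubelet_volume_stats_available_bytes")
        # Skip PVCs with missing or zero capacity to avoid dividing by zero.
        if capacity and available is not None:
            result[key] = 100.0 * available / capacity
    return result


if __name__ == "__main__":
    for (namespace, pvc), free in pvc_free_percent(SAMPLE).items():
        print(f"{namespace}/{pvc}: {free:.1f}% free")
```

Because the result is keyed on the `namespace` label, this also hints at how scoping might work if those labels survive ingestion, which is the open question above.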

stewartshea commented 4 years ago

In addition to the above concern, we also need to determine how to better filter out specific Prometheus metrics so that we don't saturate or hit the configured limit.
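To illustrate the kind of filtering meant here, this is a rough sketch of an allowlist applied to Prometheus text output before it is forwarded anywhere. The `ALLOW` patterns and the `filter_metrics` helper are hypothetical; this is not Sysdig agent configuration, just the general idea of keeping only the series we care about.

```python
import fnmatch

# Hypothetical allowlist of metric name patterns (shell-style wildcards).
ALLOW = ["kubelet_volume_stats_*"]


def filter_metrics(text, allow=ALLOW):
    """Return only the sample lines whose metric name matches the allowlist."""
    kept = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in allow):
            kept.append(line)
    return kept
```

Everything not matched is dropped, so the number of forwarded series stays bounded by what the allowlist selects rather than by whatever the kubelet happens to expose.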

stewartshea commented 4 years ago

I've reached out to Sysdig about how best to handle scoping these resources down to individual teams. The thinking right now is that each app team requires 2 Sysdig teams, one scoped to their Kubernetes pods/deployments/etc. and one to their PVs. It's not ideal, but it should work.

dleard commented 4 years ago

Hi @stewartshea, do you have a timeline estimate for when we will be able to monitor PVs with Sysdig? I have come up with a stop-gap solution for my team using an Airflow DAG, and we're trying to decide whether it's worth implementing in the meantime.

stewartshea commented 4 years ago

@dleard it's likely a couple weeks before I can get this feature in place, barring any unforeseen blockers

dleard commented 4 years ago

Thanks! I'll pass that info along to my team.

stewartshea commented 4 years ago

This feature build is in progress. I'm just working out some new processes to use the new sdc-cli tool rather than the API, since the API for new dashboards has been deprecated by Sysdig.

stewartshea commented 4 years ago

Here is a mock up of the storage dashboard; I'm still working on the operator changes to build this upon team creation, followed by an update of the docs.

image

dleard commented 4 years ago

looks great, thanks for the updates & your work on this @stewartshea!

stewartshea commented 4 years ago

This feature has been released in https://github.com/BCDevOps/platform-services/pull/719

@dleard have a look and let me know what questions come up (in rocketchat) and we can fix up the docs to make sure they support new users of this feature.

stewartshea commented 4 years ago

With this feature released, I am going to close this issue. I will open another issue specific to the netapp-file-standard reporting issues.