Closed StevenBarre closed 2 years ago
Start drafting doc here: https://github.com/bcgov/platform-developer-docs/pull/145
Checking "Get metrics in Sysdig".
Re: "Get metrics in Sysdig": I will summarize the "Sending StatsD Metrics" doc, which has been used for the statsd_MCS_XXXX custom metrics in Sysdig. Those metrics are collected by MCS's Nagios monitoring and sent to Sysdig.
@ShellyXueHan can you help with getting sysdig to scrape custom prometheus endpoints?
Thanks to Steven, it looks like we just need to add the annotation below.
prometheus.io/scrape=true
... and the Sysdig agent already has the Prometheus settings in its configmap:
From CCM's template cm-sysdig-agent.yaml.j2
<...>
### Prometheus
# ensure that scraped metrics are mapped to the application container instead of the sysdig agent container
promscrape_fastproto: true
prometheus:
  enabled: true
  prom_service_discovery: true
  interval: 30
  log_errors: true
  # max_metrics: 3000 (default set to 8000)
  histograms: false
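For anyone who wants to add the annotation without editing YAML by hand, here is a rough sketch that builds a JSON merge patch for a Deployment's pod template. The helper name `scrape_annotation_patch` is made up for illustration, and the optional `prometheus.io/port` annotation is the common convention rather than something confirmed in this thread.

```python
import json


def scrape_annotation_patch(port=None):
    """Build a JSON merge patch that adds scrape annotations to a pod template.

    Hypothetical helper for illustration; it is not part of any Sysdig or oc tooling.
    """
    # Kubernetes annotation values must be strings, hence "true" and not True.
    annotations = {"prometheus.io/scrape": "true"}
    if port is not None:
        # Assumption: the port annotation follows the usual prometheus.io convention.
        annotations["prometheus.io/port"] = str(port)
    # Annotating the pod template (not the Deployment itself) puts the
    # annotation on every pod the Deployment creates.
    return {"spec": {"template": {"metadata": {"annotations": annotations}}}}


if __name__ == "__main__":
    print(json.dumps(scrape_annotation_patch(port=8000)))
```

The printed JSON could then be passed to something like `oc patch deployment <name> --type merge -p '<json>'`, where `<name>` is whatever Deployment owns the pod.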
@tmorik were you able to get the metrics into Sysdig? Is there anything I can still help with?
@ShellyXueHan, yes, I got the metrics in Sysdig, as shown below.
Using our openshift-bcgov-perfmon namespace, I added the prometheus.io/scrape: true annotation to the pod. Sysdig then started scraping the metrics that the pod exposes.
KLAB/openshift-bcgov-perfmon ~ $ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
perfmon-5576c95b44-6nd5j 1/1 Running 0 97m 10.97.13.65 mcs-klab-app-03.dmz <none> <none>
KLAB/openshift-bcgov-perfmon ~ $ oc rsh perfmon-5576c95b44-6nd5j
(app-root) sh-4.4$ curl http://localhost:8000/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 66.0
<...>
response_size_bytes{metric="REQUEST_SIZE",url="https://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/test.txt"} 270.0
response_size_bytes{metric="SIZE_DOWNLOAD_T",url="https://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/test.txt"} 5.24288e+06
# HELP response_count_total Response by code
# TYPE response_count_total counter
response_count_total{code="200",url="http://nginx-openshift-bcgov-nagios.apps.klab.devops.gov.bc.ca/"} 192.0
response_count_total{code="200",url="http://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/"} 192.0
response_count_total{code="200",url="https://nginx-openshift-bcgov-nagios.apps.klab.devops.gov.bc.ca/"} 192.0
response_count_total{code="200",url="https://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/"} 192.0
response_count_total{code="200",url="https://status.developer.gov.bc.ca/"} 192.0
response_count_total{code="200",url="http://nginx-openshift-bcgov-nagios.apps.klab.devops.gov.bc.ca/test.txt"} 192.0
response_count_total{code="200",url="http://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/test.txt"} 192.0
response_count_total{code="200",url="https://nginx-openshift-bcgov-nagios.apps.klab.devops.gov.bc.ca/test.txt"} 192.0
response_count_total{code="200",url="https://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/test.txt"} 192.0
# HELP response_count_created Response by code
# TYPE response_count_created gauge
response_count_created{code="200",url="http://nginx-openshift-bcgov-nagios.apps.klab.devops.gov.bc.ca/"} 1.6678475583501863e+09
response_count_created{code="200",url="http://nginx-openshift-bcgov-nagios.apps.clab.devops.gov.bc.ca/"} 1.6678475583546753e+09
<...>
On the Sysdig web console:
So, I think that annotation is working as described in our clusters!
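To sanity-check output like the `curl` result above before looking for it in Sysdig, a small parser for the Prometheus text format can help. This is only a sketch using the standard library: `parse_metrics` is a made-up name, it skips comment lines and anything that is not a plain sample (including samples with a trailing timestamp), and it does not unescape label values.

```python
import re

# A sample line: metric name, optional {labels}, then a single value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces, e.g. code="200".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line, e.g. an elided "<...>"
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # value is not numeric, so skip the line
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples


if __name__ == "__main__":
    sample = (
        "# TYPE response_count_total counter\n"
        'response_count_total{code="200",url="https://status.developer.gov.bc.ca/"} 192.0\n'
    )
    for name, labels, value in parse_metrics(sample):
        print(name, labels, value)
```

Feeding it the text returned by `/metrics` gives a list that can be filtered by metric name or label, for example to confirm that every monitored URL reports a `code="200"` sample.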
I would like to know whether Sysdig can send alerts based on these metrics, for example sending a warning alert to hoge@blah.com if response_count_created crosses X.
Totally doable! You'll need to create a dashboard with the metrics, then you can set up an alert for it. Since this is for our team, I'd recommend using the Platform Experience Sysdig team to create the dashboard.
Here are more details on how-to:
Great! Thank you! I will try those and add some notes about that.
Sysdig notification has been set up and is being tested at https://app.sysdigcloud.com/#/alerts/rules?alertId=12883565&direction=asc&sortBy=name
Next I will look into "Granting users permission to monitor user-defined projects".
In OCP 4.10, alert routing for user-defined projects is still a Technology Preview (TP).
In OCP 4.11, it is no longer a TP.
It is possible to set up Alertmanager rules for user-defined projects so that users granted the monitoring-rules-edit role can create, modify, and delete PrometheusRule custom resources for their project. They can then see alerts in the OpenShift web console, as we (cluster-admins) do.
However, it's still a TP in OCP 4.10, and users can already easily set up Sysdig alerts for their pods, so this is probably not necessary.
Agreed, not necessary while in TP. I think it would be good to have options once we get to 4.11, but we can revisit documenting alert routing in Feb.
PRed doc (https://github.com/bcgov/platform-developer-docs/pull/145#pullrequestreview-1181226609) has been merged. I will close this ticket.
Describe the issue: Convert my slide deck from the recent Community Meetup into a docs page in beta-docs
Additional context: See my slide deck
How does this benefit the users of our platform? Demonstrate how to add instrumentation to apps for custom metrics
Definition of done: Page published on https://beta-docs.developer.gov.bc.ca/