Closed WalterMoar closed 2 years ago
@WalterMoar I'm currently blocked by another platform service's task and a Sysdig resource utilization issue in the production cluster. I might not be able to jump on this yet, but I'll definitely update here once I get a chance!
@ShellyXueHan Do you have an ETA for this task? We also want to monitor some custom metrics with Sysdig. Thanks
@pwei1018 this task is currently postponed; unfortunately I have to take care of some other platform services with higher priority at the moment. I can't provide an estimate yet, as it depends on how my other tasks turn out, but I will try my best to get to this when possible. I know a few teams have asked for this feature, and we will definitely look into it!
I asked this question via the Customer Success message system:
On OpenShift, we have a default network security policy set up by the cluster admins that prevents any access. To scrape Prometheus metrics (e.g. for RabbitMQ), do we need to explicitly allow Sysdig? We have the correct annotations set up, and we have tested the endpoint and the data comes back. The Sysdig integration is reporting no metrics. I assume the cause is the network security policy.
I got a reply today:
We have identified a critical issue in Sysdig agent version 12.0.3 (and only in that release) which prevents Sysdig Secure Runtime policies from triggering. If you are running agent version 12.0.3 and use Sysdig Secure Runtime policies then we advise upgrading to Sysdig agent version 12.0.4 which contains the necessary fix. For further assistance please contact your Technical Account Engineer or email support@sysdig.com Thanks Sysdig Customer Success
We would also like to scrape metrics from RabbitMQ and .NET Core processes using Prometheus metrics endpoints.
According to Enable Prometheus Native Service Discovery, prom_service_discovery is enabled by default.
In agent version 11.2 and above, the prom_service_discovery parameter is enabled by default, which in turn enables Promscrape V2 as well by default.
The Sysdig Agent Health & Status dashboard shows that we are running agent version 11.3.0.
We've tested this in klab with ArgoCD and are getting the application metrics back successfully. Currently waiting for CCM to push the Sysdig changes to the production clusters next week. Will test in production once ready!
Sysdig service discovery is enabled in silver now! Here are some docs to get you started: https://developer.gov.bc.ca/OpenShift-User-Guide-to-Creating-and-Using-a-Sysdig-Team-for-Monitoring#leveraging-service-discovery-to-import-application-metrics-endpoint
Let us know on Rocket.Chat whether this works for your apps.
Got some feedback from teams that the metrics are not available under Sysdig team scopes. Looking into this now!
Okay, updates: since the service discovery metrics come from the Sysdig agent pods (not from the app pods directly), app teams won't be able to get them from their Sysdig team scope. This also raises another issue: even if we set up teams to have access to the Sysdig namespace, we won't be able to tell which app pod the metrics are coming from.
For example, if there are 2 ArgoCD apps that both expose metrics endpoints, Sysdig will return just one value for argocd_app_info, because it relabels them as Sysdig metrics instead of exposing them for PromQL queries.
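To make the collision concrete, here is a toy model in Python. It is only a sketch of the general effect of dropping distinguishing labels, not how the Sysdig agent is actually implemented; the `ingest` function, the `pod` and `name` label names, and the sample values are all made up for illustration.

```python
def ingest(samples, keep_labels):
    """Store samples keyed by metric name plus only the labels in keep_labels.

    Toy model: this is NOT Sysdig's real ingestion logic.
    """
    store = {}
    for name, labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k in keep_labels))
        # A later sample overwrites an earlier one that has the same key.
        store[(name, kept)] = value
    return store


# Two hypothetical ArgoCD apps exposing the same metric name.
samples = [
    ("argocd_app_info", {"pod": "argocd-a", "name": "app-one"}, 1.0),
    ("argocd_app_info", {"pod": "argocd-b", "name": "app-two"}, 1.0),
]

collapsed = ingest(samples, keep_labels=set())    # no distinguishing labels kept
separate = ingest(samples, keep_labels={"pod"})   # pod label preserved

print(len(collapsed), len(separate))
```

When no pod-level label survives, both samples land on the same key and only one value remains, which matches the symptom described above.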
Still waiting for some feedback to see if there are any ways around this.
Tried out several settings and found one that works in klab with ArgoCD testing. Waiting for the changes to be pushed to silver, then will have an app team help test it out!
This is working now. Had an app team take a look and things are as expected!
Describe the issue We are running EnterpriseDB and the pods provide a Prometheus-style /metrics endpoint on port 9187. An example metric is cnp_rdba_long_running_queries, which keeps track of when queries run longer than a specified time. We want to use Sysdig to monitor this metric, and many others. We want to avoid running/learning/supporting/upgrading our own Prometheus and Grafana installations.
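As a rough way to confirm what a scraper would see on that endpoint, the sketch below parses Prometheus text-format output and picks out one metric. It is a simplified parser written for illustration: it assumes no trailing timestamps and no commas or spaces inside label values. The `datname` label, the sample values, and the URL passed to `fetch_metrics` are hypothetical.

```python
import urllib.request


def fetch_metrics(url):
    """Fetch raw metrics text, e.g. from a pod's /metrics endpoint on port 9187."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, label_part = series.partition("{")
        labels = {}
        for pair in label_part.rstrip("}").split(","):
            if pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical sample output; real label names and values will differ.
SAMPLE = """\
# TYPE cnp_rdba_long_running_queries gauge
cnp_rdba_long_running_queries{datname="app"} 2
cnp_rdba_long_running_queries{datname="postgres"} 0
"""

long_running = [
    s for s in parse_metrics(SAMPLE) if s[0] == "cnp_rdba_long_running_queries"
]
print(long_running)
```

Against a live pod, the text would come from something like `fetch_metrics("http://<pod-ip>:9187/metrics")` instead of `SAMPLE`.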
Additional context This ability in Sysdig will be useful for anyone running EnterpriseDB, or anyone who wants to provide their own metrics for their applications.
On 2021-08-04 @ShellyXueHan mentioned in #devops-sysdig that "To scrape app metrics directly we’ll need to enable Prometheus service discovery https://docs.sysdig.com/en/working-with-prometheus-metrics.html. At the moment it’s not enabled yet, we’ll need to do some metrics limit adjustment in silver first."
Definition of done We are able to scrape custom metrics from our EDB pods and observe them in Sysdig.