dotnet / dotnet-monitor

This repository contains the source code for .NET Monitor, a tool that allows you to gather diagnostic data from running applications using HTTP endpoints.

Runtime metrics not generated #1241

Open · WeihanLi opened this issue 2 years ago

WeihanLi commented 2 years ago

Description

When I try to use dotnet-monitor as a sidecar container, the runtime metrics are not generated.

I could only get the following metrics:

(screenshot of the /metrics output omitted)

It seems there is only the microsoftaspnetcorehosting provider. Am I missing something in my config?

Configuration

Deployment YAML: https://github.com/WeihanLi/SparkTodo/blob/82ba4ac7493afcb476d096cf11f482e9a297003a/sparktodo-api-k8s-deploy.yaml#L20

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparktodo-api
  labels:
    app: sparktodo-api
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: sparktodo-api
  minReadySeconds: 0
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "52323"
      labels:
        app: sparktodo-api

    spec:
      containers:
        - name: sparktodo-api
          image: weihanli/sparktodo-api:latest
          imagePullPolicy: Always
          resources:
            requests:
              memory: "64Mi"
              cpu: "20m"
            limits:
              memory: "128Mi"
              cpu: "50m"
          env:
          - name: ASPNETCORE_URLS
            value: http://+:80
          - name: DOTNET_DiagnosticPorts
            value: /diag/port
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 60
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 60
            periodSeconds: 30
          volumeMounts:
          - mountPath: /diag
            name: diagvol
          - mountPath: /dumps
            name: dumpsvol
        - name: monitor
          image: mcr.microsoft.com/dotnet/monitor:6.0.0
          # DO NOT use the --no-auth argument for deployments in production
          args: [ "--no-auth" ]
          imagePullPolicy: Always
          ports:
            - containerPort: 52323
          env:
          - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
            value: Listen
          - name: DOTNETMONITOR_DiagnosticPort__EndpointName
            value: /diag/port
          - name: DOTNETMONITOR_Storage__DumpTempFolder
            value: /dumps
          - name: DOTNETMONITOR_Urls
            value: "http://*:52323"
          volumeMounts:
          - mountPath: /diag
            name: diagvol
          - mountPath: /dumps
            name: dumpsvol
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 256Mi
          securityContext:
            capabilities:
              add: ["SYS_PTRACE"]
      volumes:
      - name: diagvol
        emptyDir: {}
      - name: dumpsvol
        emptyDir: {}
WeihanLi commented 2 years ago

I tried downloading the trace data, and the runtime counter data is present in the trace. (screenshot omitted)

johan-bjerling commented 2 years ago

I'm experiencing the same issue when I run dotnet-monitor as a sidecar in AKS.

I'm able to add other providers, e.g. System.Net.Http and Microsoft-AspNetCore-Server-Kestrel, and I get metrics for them. I even tried adding System.Runtime manually, both with IncludeDefaultProviders set to true and to false, but to no avail.
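For reference, explicitly adding System.Runtime on top of the defaults looks like this with the env-var convention used later in this thread (a minimal sketch; the 0 index assumes no other providers are configured):

- name: DOTNETMONITOR_Metrics__IncludeDefaultProviders
  value: "true"
- name: DOTNETMONITOR_Metrics__Providers__0__ProviderName
  value: "System.Runtime"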

jander-msft commented 2 years ago

Hey folks, make sure that you are using the latest dotnet-monitor version. This is easy if you use a floating tag (e.g. 6 or 6-alpine), although you may need to configure your deployment to always pull the image in order to pick up newer versions behind those floating tags. We released an update in August that may contain a fix for this issue.
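For example, a minimal container spec along those lines (a sketch, not a full deployment):

- name: monitor
  image: mcr.microsoft.com/dotnet/monitor:6   # floating tag: resolves to the latest 6.x release
  imagePullPolicy: Always                     # re-pull on pod start so the floating tag is re-resolved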

johan-bjerling commented 2 years ago

> Hey folks, make sure that you are using the latest dotnet-monitor version. This is easy if you use a floating tag (e.g. 6 or 6-alpine), although you may need to configure your deployment to always pull the image in order to pick up newer versions behind those floating tags. We released an update in August that may contain a fix for this issue.

@jander-msft Just tried this again with the latest version of dotnet-monitor running as a sidecar in AKS with the default providers, but I can't get any metrics other than these, I'm afraid:

/app # dotnet-monitor --version
6.2.2+8abeb94c15ee4175d7078a3f768ed1ef15032bd8
/app # curl http://localhost:52323/metrics
# HELP microsoftaspnetcorehosting_requests_per_second Request Rate
# TYPE microsoftaspnetcorehosting_requests_per_second gauge
microsoftaspnetcorehosting_requests_per_second 6 1663747451248
microsoftaspnetcorehosting_requests_per_second 8 1663747456248
microsoftaspnetcorehosting_requests_per_second 7 1663747461248
# HELP microsoftaspnetcorehosting_total_requests Total Requests
# TYPE microsoftaspnetcorehosting_total_requests gauge
microsoftaspnetcorehosting_total_requests 307 1663747451248
microsoftaspnetcorehosting_total_requests 315 1663747456248
microsoftaspnetcorehosting_total_requests 322 1663747461248
# HELP microsoftaspnetcorehosting_current_requests Current Requests
# TYPE microsoftaspnetcorehosting_current_requests gauge
microsoftaspnetcorehosting_current_requests 0 1663747451248
microsoftaspnetcorehosting_current_requests 0 1663747456248
microsoftaspnetcorehosting_current_requests 0 1663747461248
# HELP microsoftaspnetcorehosting_failed_requests Failed Requests
# TYPE microsoftaspnetcorehosting_failed_requests gauge
microsoftaspnetcorehosting_failed_requests 1 1663747451248
microsoftaspnetcorehosting_failed_requests 1 1663747456248
microsoftaspnetcorehosting_failed_requests 1 1663747461248
xsoheilalizadeh commented 2 years ago

@johan-bjerling I'm using the following configuration and it works for me.

- name: monitor
  securityContext:
    runAsUser: 1000
  image: mcr.microsoft.com/dotnet/monitor:6-alpine
  args: [ "--no-auth" ]
  imagePullPolicy: IfNotPresent
  ports:
    - name: metrics
      containerPort: 52323
  env:
    - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
      value: "Listen"
    - name: DOTNETMONITOR_DiagnosticPort__EndpointName
      value: "/diag/port.sock"
    - name: DOTNETMONITOR_Storage__DumpTempFolder
      value: "/diag/dumps"
    - name: DOTNETMONITOR_Urls
      value: http://+:52323
    - name: DefaultProcess__Filters__0__Key 
      value: "ProcessName"
    - name: DefaultProcess__Filters__0__Value 
      value: "dotnet"
    - name: DotnetMonitor_Metrics__Providers__0__ProviderName
      value: "Microsoft-AspNetCore-Server-Kestrel"     
    - name: DotnetMonitor_Metrics__Providers__1__ProviderName
      value: "Microsoft.AspNetCore.Http.Connections"    
    - name: DotnetMonitor_Metrics__Providers__2__ProviderName
      value: "System.Net.Http"    
    - name: DotnetMonitor_Metrics__Providers__3__ProviderName
      value: "System.Net.NameResolution"    
    - name: DotnetMonitor_Metrics__Providers__4__ProviderName
      value: "System.Net.Security"    
    - name: DotnetMonitor_Metrics__Providers__5__ProviderName
      value: "System.Net.Sockets"    
  volumeMounts:
    - mountPath: /diag
      name: diagvol
  resources:
    requests:
      memory: "32Mi"
      cpu: "50m"
    limits:
      memory: "256Mi"
      cpu: "250m"    
johan-bjerling commented 2 years ago

@xsoheilalizadeh Yep, I can get all of those metrics to work for me as well. Unfortunately, neither adding "System.Runtime" explicitly nor relying on it implicitly being added via the default providers seems to work.

jander-msft commented 2 years ago

> @xsoheilalizadeh Yep, I can get all of those metrics to work for me as well. Unfortunately, neither adding "System.Runtime" explicitly nor relying on it implicitly being added via the default providers seems to work.

Could you provide the full configuration of your dotnet-monitor instance? If you exec into the dotnet-monitor container and run dotnet-monitor config show, it will print the configuration as understood by the running instance.
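For example, from outside the pod (assuming the sidecar container is named monitor, as in the manifests in this thread):

kubectl exec -it <pod-name> -c monitor -- dotnet-monitor config show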

#2730 may be related.

johan-bjerling commented 2 years ago

@jander-msft sure, here's the full config:

{
  "urls": "http://localhost:52323",
  "Kestrel": ":NOT PRESENT:",
  "Templates": ":NOT PRESENT:",
  "CollectionRuleDefaults": ":NOT PRESENT:",
  "GlobalCounter": {
    "IntervalSeconds": "5"
  },
  "CollectionRules": ":NOT PRESENT:",
  "CorsConfiguration": ":NOT PRESENT:",
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "DeleteEndpointOnStartup": "true",
    "EndpointName": "/diag/port.sock"
  },
  "Metrics": {
    "Enabled": "True",
    "Endpoints": "http://localhost:52325",
    "IncludeDefaultProviders": "True",
    "MetricCount": "3",
    "Providers": [
      {
        "ProviderName": "Microsoft-AspNetCore-Server-Kestrel"
      },
      {
        "ProviderName": "Microsoft.AspNetCore.Http.Connections"
      },
      {
        "ProviderName": "System.Net.Http"
      },
      {
        "ProviderName": "System.Net.NameResolution"
      },
      {
        "ProviderName": "System.Net.Security"
      },
      {
        "ProviderName": "System.Net.Sockets"
      },
      {
        "ProviderName": "System.Runtime"
      }
    ]
  },
  "Storage": {
    "DumpTempFolder": "/diag/dumps"
  },
  "DefaultProcess": {
    "Filters": [
      {
        "Key": "ProcessId",
        "ProcessName": "dotnet",
        "Value": "1"
      }
    ]
  },
  "Logging": {
    "Console": {
      "FormatterName": "json",
      "FormatterOptions": {
        "IncludeScopes": "True",
        "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffffffZ",
        "UseUtcTimestamp": "true"
      }
    },
    "EventLog": {
      "LogLevel": {
        "Default": "Information",
        "Microsoft": "Warning",
        "Microsoft.Diagnostics": "Information",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Diagnostics": "Information",
      "Microsoft.Diagnostics.Monitoring.WebApi.OutputStreamResult": "Warning",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "Authentication": ":NOT PRESENT:",
  "Egress": {
    "Properties": {
      "MonitorBlobAccountKey": ":REDACTED:"
    },
    "AzureBlobStorage": {
      "MonitorBlob": {
        "AccountUri": "redacted",
        "BlobPrefix": "dotnetmetrics",
        "ContainerName": "storeapi",
        "CopyBufferSize": ":NOT PRESENT:",
        "QueueName": ":NOT PRESENT:",
        "QueueAccountUri": ":NOT PRESENT:",
        "SharedAccessSignature": ":NOT PRESENT:",
        "AccountKey": ":NOT PRESENT:",
        "SharedAccessSignatureName": ":NOT PRESENT:",
        "AccountKeyName": "MonitorBlobAccountKey",
        "ManagedIdentityClientId": ":NOT PRESENT:"
      }
    },
    "FileSystem": ":NOT PRESENT:"
  }
}
jander-msft commented 1 year ago

Couple of thoughts here:

johan-bjerling commented 1 year ago

> Couple of thoughts here:

My app was not trimmed. It was, however, using OpenTelemetry. But as we couldn't get the dotnet-monitor sidecar to work quite how we wanted, that's now been removed, so I don't have an easy way of testing this again without OpenTelemetry.

I suspect I won't have the time in the near future to try this out, but I will report back if I do! Thanks for the ideas though @jander-msft, hopefully they'll be of help to others!

Krishnadas-KP commented 1 year ago

I have the same problem: I'm unable to get System.Runtime metrics from the /metrics endpoint. This is my config:

{
  "urls": "http://localhost:52323",
  "Kestrel": ":NOT PRESENT:",
  "Templates": ":NOT PRESENT:",
  "CollectionRuleDefaults": ":NOT PRESENT:",
  "GlobalCounter": {
    "IntervalSeconds": "5"
  },
  "CollectionRules": ":NOT PRESENT:",
  "CorsConfiguration": ":NOT PRESENT:",
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "DeleteEndpointOnStartup": "true"
  },
  "InProcessFeatures": ":NOT PRESENT:",
  "Metrics": {
    "Enabled": "True",
    "Endpoints": "http://\u002B:52325",
    "IncludeDefaultProviders": "True",
    "MetricCount": "3"
  },
  "Storage": {
    "DefaultSharedPath": "/diag"
  },
  "DefaultProcess": {
    "Filters": [
      {
        "Key": "ProcessId",
        "Value": "1"
      }
    ]
  },
  "Logging": {
    "Console": {
      "FormatterName": "json",
      "FormatterOptions": {
        "IncludeScopes": "True",
        "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffffffZ",
        "UseUtcTimestamp": "true"
      }
    },
    "EventLog": {
      "LogLevel": {
        "Default": "Information",
        "Microsoft": "Warning",
        "Microsoft.Diagnostics": "Information",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Diagnostics": "Information",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "Authentication": ":NOT PRESENT:",
  "Egress": ":NOT PRESENT:"

I am using version 7.0.2 as a sidecar in a pod in AKS.

sheng-jie commented 1 year ago

I have the same issue with version 7.2. The deployment is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: akstest
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: akstest
  template:
    metadata:
      labels:
        app: akstest
    spec:
      volumes:
        - name: diagvol
          emptyDir: {}
        - name: dumpsvol
          emptyDir: {}
      containers:
        - name: client-api
          image: mcr.microsoft.com/dotnet/samples:aspnetapp
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          env:
            - name: DOTNET_DiagnosticPorts
              value: /diag/port.sock
          resources:
            limits:
              cpu: 250m
              memory: 512Mi
          volumeMounts:
            - name: diagvol
              mountPath: /diag
            - name: dumpsvol
              mountPath: /dumps
        - name: monitor
          image: mcr.microsoft.com/dotnet/monitor:7.2
          args:
            - collect
            - '--urls'
            - http://+:52323
            - '--no-auth'
          ports:
            - containerPort: 52323
              protocol: TCP
          env:
            - name: DOTNETMONITOR_DiagnosticPort__ConnectionMode
              value: Listen
            - name: DOTNETMONITOR_DiagnosticPort__EndpointName
              value: /diag/port.sock
            - name: DOTNETMONITOR_Storage__DumpTempFolder
              value: /dumps
            - name: DOTNETMONITOR_Urls
              value: http://+:52323
          resources:
            limits:
              cpu: 250m
              memory: 256Mi
            requests:
              cpu: 50m
              memory: 32Mi
          volumeMounts:
            - name: diagvol
              mountPath: /diag
            - name: dumpsvol
              mountPath: /dumps

And the dotnet-monitor config is:

{
  "urls": "http://\u002B:52323",
  "Kestrel": ":NOT PRESENT:",
  "Templates": ":NOT PRESENT:",
  "CollectionRuleDefaults": ":NOT PRESENT:",
  "GlobalCounter": {
    "IntervalSeconds": "5"
  },
  "CollectionRules": ":NOT PRESENT:",
  "CorsConfiguration": ":NOT PRESENT:",
  "DiagnosticPort": {
    "ConnectionMode": "Listen",
    "DeleteEndpointOnStartup": "true",
    "EndpointName": "/diag/port.sock"
  },
  "InProcessFeatures": ":NOT PRESENT:",
  "Metrics": {
    "Enabled": "True",
    "Endpoints": "http://localhost:52325",
    "IncludeDefaultProviders": "True",
    "MetricCount": "3"
  },
  "Storage": {
    "DumpTempFolder": "/dumps"
  },
  "DefaultProcess": {
    "Filters": [
      {
        "Key": "ProcessId",
        "Value": "1"
      }
    ]
  },
  "Logging": {
    "Console": {
      "FormatterName": "json",
      "FormatterOptions": {
        "IncludeScopes": "True",
        "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffffffZ",
        "UseUtcTimestamp": "true"
      }
    },
    "EventLog": {
      "LogLevel": {
        "Default": "Information",
        "Microsoft": "Warning",
        "Microsoft.Diagnostics": "Information",
        "Microsoft.Hosting.Lifetime": "Information"
      }
    },
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Diagnostics": "Information",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "Authentication": ":NOT PRESENT:",
  "Egress": ":NOT PRESENT:"
}
jander-msft commented 1 year ago

Please check the two questions that I've asked before: https://github.com/dotnet/dotnet-monitor/issues/1241#issuecomment-1351793386

Also, if you use any other tools that collect EventCounters and they are not collecting at the same interval as .NET Monitor (the default is 5 seconds), that will cause .NET Monitor to fail to collect them.
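For example, if another collector samples EventCounters every 10 seconds, you could align .NET Monitor with it through the GlobalCounter interval shown in the config dumps above (a sketch, using the same env-var convention as the manifests in this thread):

- name: DOTNETMONITOR_GlobalCounter__IntervalSeconds
  value: "10"   # match the sampling interval of the other EventCounter consumer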

kchilka-msft commented 1 year ago

Hey @jander-msft - we are using OpenTelemetry and I am not seeing metrics in Prometheus.

When I curl http://localhost:52325/metrics I do see metrics being generated for microsoftaspnetcorehosting, systemruntime, and microsoftaspnetcoreserverkestrel, but I don't see them in Prometheus.

I am assuming it has to do with an interval mismatch between OpenTelemetry and .NET Monitor? Is the fix to make sure the interval values match for .NET Monitor and OpenTelemetry? Is there any other change that needs to be made?

Thanks in advance!

jander-msft commented 1 year ago

> When I curl http://localhost:52325/metrics I do see metrics being generated for microsoftaspnetcorehosting, systemruntime, and microsoftaspnetcoreserverkestrel, but I don't see them in Prometheus.

If you see systemruntime from .NET Monitor, then it is unlikely that OpenTelemetry is interfering with the collection of metrics by .NET Monitor.

> I am assuming it has to do with an interval mismatch between OpenTelemetry and .NET Monitor? Is the fix to make sure the interval values match for .NET Monitor and OpenTelemetry? Is there any other change that needs to be made?

.NET Monitor doesn't automatically push metrics to a Prometheus server; you have to configure that server to scrape the /metrics route.
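For annotation-based discovery like the first deployment in this thread uses, that could look like the following on the pod template (a sketch; whether these annotations are honored at all depends on how your Prometheus scrape config is set up, and prometheus.io/path is an assumed annotation not shown earlier in this thread):

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "52323"      # the dotnet-monitor HTTP port from the manifests above
  prometheus.io/path: "/metrics"   # assumed annotation; support depends on your scrape config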

I'm not certain that the interval mismatch affects the consumption of the Prometheus metrics. @wiktork, any ideas?