GoogleCloudPlatform / ops-agent

Apache License 2.0

JMX collected on one VM but not on another #738

Open black-snow opened 2 years ago

black-snow commented 2 years ago

Describe the bug

I configured my app and the ops-agent to collect JVM metrics, and it worked just fine on the first VM. The exact same configuration won't work on another VM, however. The only difference is that the first VM runs Java 8 (openjdk version "1.8.0_292"), which is not supported according to the docs, while the second runs Java 11 (openjdk version "11.0.11" 2021-04-20), which is supported (...)

Both run Debian 10.
Both output virtually the same merged config:

2022/07/18 17:23:52 Merged config
logging:
  ...
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
    jvm:
      type: jvm
      collection_interval: ""
      endpoint: localhost:9999
      username: ""
      password: ""
      additional_jars: []
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern: []
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]
      jvm:
        receivers: [jvm]
        processors: []

Both have the same JVM flags as well:

-Dcom.sun.management.jmxremote.port=9999 
-Dcom.sun.management.jmxremote.rmi.port=9999 
-Djava.rmi.server.hostname=127.0.0.1 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false

In addition, I can tunnel to the second VM and attach my own jconsole to it, and that works just fine. I have already restarted the ops-agent a couple of times.
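
The jconsole check goes through a tunnel, whereas the agent connects from the VM itself to `localhost:9999`. A minimal sketch of the same check run locally on the VM, using only the standard `javax.management` API, could look like the following. The class name `JmxProbe` and the helper `serviceUrl` are hypothetical names chosen for illustration, and the `/jmxrmi` path is assumed to be the default one exposed by the `jmxremote` flags above; this is not how the ops-agent itself connects.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxProbe {
    // Builds the RMI-based JMX service URL for a host:port endpoint
    // (assumes the default "jmxrmi" registry name).
    static String serviceUrl(String endpoint) {
        return "service:jmx:rmi:///jndi/rmi://" + endpoint + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the endpoint from the merged config; pass another one as the first argument.
        String endpoint = args.length > 0 ? args[0] : "localhost:9999";
        JMXServiceURL url = new JMXServiceURL(serviceUrl(endpoint));

        // No credentials are passed because authentication is disabled in the JVM flags.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);

            System.out.println("Connected to " + url);
            System.out.println("Registered MBeans: " + conn.getMBeanCount());
            System.out.println("Heap used (bytes): " + memory.getHeapMemoryUsage().getUsed());
        }
    }
}
```

Running it once with `localhost:9999` and once with `127.0.0.1:9999` on each VM would show whether a plain local JMX client can reach the endpoint under both names; an exception on one VM only would narrow the problem down to connectivity rather than to the agent's pipeline.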

To Reproduce

Steps to reproduce the behavior:

I don't know.

Expected behavior

Monitoring picks up JMX metrics.

Environment (please complete the following information):

Additional context

Is there any way for me to investigate further?

black-snow commented 2 years ago

I have another VM running Java 17 Temurin. It has the same IAM permissions (Logs & Metric writer), the same ops-agent config, and the same JMX args, but no metrics get written:

Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]: metrics:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:   receivers:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:     hostmetrics:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       type: hostmetrics
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       collection_interval: 60s
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:     jvm:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       type: jvm
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       collection_interval: ""
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       endpoint: localhost:9999
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       username: ""
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       password: ""
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       additional_jars: []
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:   processors:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:     metrics_filter:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       type: exclude_metrics
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       metrics_pattern: []
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:   service:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:     pipelines:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       default_pipeline:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:         receivers: [hostmetrics]
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:         processors: [metrics_filter]
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:       jvm:
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:         receivers: [jvm]
Sep 19 18:23:10 prod-fuv-service-vm google_cloud_ops_agent_engine[22697]:         processors: []
black-snow commented 2 years ago

I went through https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/troubleshooting but to no avail. The JVM version remains the only thing that differs.
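
To back up the claim that only the JVM differs, the version and the JMX-related flags can be read from each running JVM over the same endpoint and compared across the VMs. The following is a sketch under the same assumptions as the flags above (unauthenticated JMX on port 9999, default `/jmxrmi` path); `JvmFingerprint` and `jmxFlags` are hypothetical names used only for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;
import java.util.stream.Collectors;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JvmFingerprint {
    // Keeps only the arguments that configure remote JMX or the RMI hostname.
    static List<String> jmxFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
            .filter(arg -> arg.startsWith("-Dcom.sun.management.jmxremote")
                        || arg.startsWith("-Djava.rmi.server.hostname"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        String endpoint = args.length > 0 ? args[0] : "localhost:9999";
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + endpoint + "/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Proxy for the remote JVM's runtime bean, not the probe's own JVM.
            RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);

            System.out.println("Vendor:  " + runtime.getVmVendor());
            System.out.println("VM:      " + runtime.getVmName());
            System.out.println("Version: " + runtime.getVmVersion());
            // Flags as the target JVM actually received them.
            jmxFlags(runtime.getInputArguments()).forEach(System.out::println);
        }
    }
}
```

Diffing the output from the working Java 8 VM against the Java 11 and Java 17 VMs would confirm that the flags really reached each JVM unchanged, leaving the JVM version as the remaining variable.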