DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

jmxfetch is using public IP for jmx connection despite config pointing to 127.0.0.1:7199 #9093

Open cosimo opened 2 years ago

cosimo commented 2 years ago

Output of the info page (if this is a bug)

Agent (v6.23.1)
===============

  Status date: 2021-09-07 10:04:58.980651 UTC
  Agent start: 2021-09-07 09:40:53.588442 UTC
  Pid: 14186
  Go Version: go1.14.7
  Python Version: 2.7.18
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 5
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: 2.151ms
    System UTC time: 2021-09-07 10:04:58.980651 UTC

  Host Info
  =========
    bootTime: 2021-06-03 03:15:43.000000 UTC
    kernelArch: x86_64
    kernelVersion: 4.15.0-122-generic
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 18.04
    procs: 275
    uptime: 2310h25m17s
    virtualizationRole: host
    virtualizationSystem: kvm

  Hostnames
  =========
    hostname: ca5-3
    socket-fqdn: ca5-3.domain.net
    socket-hostname: ca5-3
    host tags:
      role:<omitted>,
      provider:<omitted>,
      cluster:ca5,
      env:production
    hostname provider: os
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

  Metadata
  ========
    hostname_source: os

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 7, Total: 666
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-09-07 10:04:49.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:49.000000 UTC

    disk (3.0.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 96, Total: 9,216
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 55ms
      Last Execution Date : 2021-09-07 10:04:56.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:56.000000 UTC

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 5, Total: 480
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-09-07 10:04:48.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:48.000000 UTC

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 130, Total: 12,390
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-09-07 10:04:55.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:55.000000 UTC

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 6, Total: 576
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-09-07 10:04:47.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:47.000000 UTC

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 18, Total: 1,728
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-09-07 10:04:54.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:54.000000 UTC

    network (1.18.1)
    ----------------
      Instance ID: network:5c571333f400457d [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 31, Total: 2,976
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 3ms
      Last Execution Date : 2021-09-07 10:04:46.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:46.000000 UTC

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 1, Total: 2
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 2
      Average Execution Time : 927ms
      Last Execution Date : 2021-09-07 09:55:58.000000 UTC
      Last Successful Execution Date : 2021-09-07 09:55:58.000000 UTC

    openmetrics (1.10.0)
    --------------------
      Instance ID: openmetrics:fluentd:ccf7c5ddb6c8a6c0 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/openmetrics.d/conf.yaml
      Total Runs: 97
      Metric Samples: Last Run: 14, Total: 1,358
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 97
      Average Execution Time : 11ms
      Last Execution Date : 2021-09-07 10:04:57.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:57.000000 UTC

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 96
      Metric Samples: Last Run: 1, Total: 96
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-09-07 10:04:53.000000 UTC
      Last Successful Execution Date : 2021-09-07 10:04:53.000000 UTC

========
JMXFetch
========

  Initialized checks
  ==================
    jmx
      instance_name : jmx-127.0.0.1-7199
      message : Unable to instantiate or initialize instance 127.0.0.1:7199. Is the target JMX Server or JVM running? Connection refused to host: 51.222.47.175; nested exception is: 
        java.net.ConnectException: Connection refused (Connection refused)
      metric_count : 0
      service_check_count : 0
      status : ERROR
      instance_name : jmx-127.0.0.1-7199
      message : Unable to instantiate or initialize instance 127.0.0.1:7199. Is the target JMX Server or JVM running? Connection refused to host: 51.222.47.175; nested exception is: 
        java.net.ConnectException: Connection refused (Connection refused)
      metric_count : 0
      service_check_count : 0
      status : ERROR
  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 96
    Connections: 0
    Containers: 0
    Deployments: 0
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 10
    Metadata: 0
    Nodes: 0
    Pods: 0
    Processes: 0
    RTContainers: 0
    RTProcesses: 0
    ReplicaSets: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    Services: 0
    SketchSeries: 0
    Success: 202
    TimeseriesV1: 96

  API Keys status
  ===============
    API key ending with 50fdd: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 50fdd

==========
Logs Agent
==========

  Logs Agent is not running

=========
APM Agent
=========
  Status: Running
  Pid: 14187
  Uptime: 1445 seconds
  Mem alloc: 24,247,016 bytes
  Hostname: ca5-3
  Receiver: localhost:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 31,222
  Dogstatsd Metric Sample: 10,015
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 96
  Series Flushed: 30,585
  Service Check: 1,158
  Service Checks Flushed: 1,249

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 10,014
  Metric Parse Errors: 0
  Service Check Packets: 192
  Service Check Parse Errors: 0
  Udp Bytes: 791,809
  Udp Packet Reading Errors: 0
  Udp Packets: 1,175
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0

Describe what happened:

jmxfetch fails to connect to the jmx remote connection available on 127.0.0.1:7199. jmxfetch seems to attempt connection to our public IPv4 address instead of the local loopback interface.

...
2021-09-07 09:41:02 UTC | CORE | DEBUG | (pkg/jmxfetch/jmxfetch.go:327 in Start) | Args: [-Xmx200m -Xms50m -classpath /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch.jar org.datadog.jmxfetch.App --ipc_host localhost --ipc_port 33519 --check_period 15000 --thread_pool_size 3 --collection_timeout 60 --reconnection_timeout 60 --reconnection_thread_pool_size 3 --log_level DEBUG --reporter console list_matching_attributes]
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | JMX Fetch 0.39.2 has started
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Found 0 config files
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | DEBUG | HttpClient | attempting to connect to: https://localhost:33519/agent/jmx/configs?timestamp=0
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | DEBUG | HttpClient | with body: 
2021-09-07 09:41:02 UTC | CORE | DEBUG | (cmd/agent/api/agent/agent_jmx.go:43 in getJMXConfigs) | Getting latest JMX Configs as of: 0
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | update is in order - updating timestamp: 1631007662
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | DEBUG | App | received config for check 'jmx_d38ba58458018944'
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Cleaning up instances...
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Dealing with YAML config instances...
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Dealing with Auto-Config instances collected...
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Instantiating instance for: jmx
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | WARN  | Instance | Cannot find a "conf" section in jmx-127.0.0.1-7199
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Started instance initialization...
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | Instance | Trying to connect to JMX Server at 127.0.0.1:7199
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | Instance | Connection closed or does not exist. Attempting to create a new connection...
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | ConnectionFactory | Connecting using JMX Remote
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | Connection | Connecting to: service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Completed instance initialization...
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | Could not initialize instance: jmx-127.0.0.1-7199: java.util.concurrent.ExecutionException: java.rmi.ConnectException: Connection refused to host: 1*.***.***.*** <public ip omitted>; nested exception is: 
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) |       java.net.ConnectException: Connection refused (Connection refused)
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | WARN  | App | Unable to instantiate or initialize instance 127.0.0.1:7199. Is the target JMX Server or JVM running? Connection refused to host: 1*.***.***.*** <public ip omitted>; nested exception is: 
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) |       java.net.ConnectException: Connection refused (Connection refused)
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | ConsoleReporter | jmx.can_connect[jmx_server:127.0.0.1,instance:jmx-127.0.0.1-7199] - 1631007662 = ERROR
2021-09-07 09:41:02 UTC | CORE | INFO | (pkg/jmxfetch/jmxfetch.go:304 in func1) | 2021-09-07 09:41:02 UTC | JMX | INFO  | App | JMXFetch is closing
JMXFetch exited successfully. If nothing was displayed please check your configuration and flags, or re-run the command with a more verbose log level (current log level: 'debug').

The configured jmx url in /etc/datadog-agent/conf.d/jmx.d/conf.yaml is service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi. Despite that, jmxfetch.jar seems to connect to our public IP instead.

Describe what you expected:

I expected jmxfetch to be able to connect to the configured url, service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi (= 127.0.0.1:7199). A simple telnet 127.0.0.1 7199 confirms the connection can be established.

Steps to reproduce the issue:

Configure jmx in datadog agent with the following:

init_config:
  is_jmx: true
  collect_default_metrics: true
  new_gc_metrics: true
instances:
  -
    host: 127.0.0.1
    port: 7199
    tools_jar_path: /usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar
    # jmx_url: "service:jmx:rmi:///jndi/rmi://<HOSTNAME>.host:9999/<PATH>"
    name: tomcat

(I tried a few variations of this config, including specifying the jmx url directly and attaching via process name. None of them worked.)

Configure the jmx connection with the following java options (in my case this is for tomcat):

...

# Enable JMX remote connections for monitoring via datadog agent
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.host=127.0.0.1 -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

Note that removing -Dcom.sun.management.jmxremote.host=127.0.0.1 from $JAVA_OPTS makes everything work, but we'd rather limit the jmx connections to 127.0.0.1. That's why we are here.

Additional environment details (Operating System, Cloud provider, etc):

None.

RicardoDMAraujo commented 1 year ago

I solved this same issue by adding -Djava.rmi.server.hostname=127.0.0.1 to my JMX options, so that the connection resolves to 127.0.0.1 instead of the public IP of the instance.
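
For reference, the workaround above could be applied to the Tomcat options from the original report roughly as follows. This is a sketch, not a verified configuration: it reuses the flags quoted in the issue and adds only -Djava.rmi.server.hostname=127.0.0.1. The likely reason it helps is that the RMI stub handed out by the registry on port 7199 carries the host address the client should connect to next, and without this property the JVM advertises the address its own hostname resolves to (here, apparently the public IP).

```shell
# Enable JMX remote connections for monitoring via the Datadog Agent,
# restricted to the loopback interface.
# Only java.rmi.server.hostname is new; it is the addition suggested in the
# comment above. All other flags are copied from the original report.
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.host=127.0.0.1"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.port=7199"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.ssl=false"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.authenticate=false"
# Address embedded in the RMI stubs returned to clients such as jmxfetch.
JAVA_OPTS="${JAVA_OPTS} -Djava.rmi.server.hostname=127.0.0.1"
```

After restarting Tomcat with these options, the JMXFetch section of the agent status output should indicate whether the jmx-127.0.0.1-7199 instance now initializes.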