GoogleCloudPlatform / ops-agent

Reconfiguring built-in Prometheus listeners (TCP ports 20201, 20202) #898

Open jeremyvisser opened 2 years ago

jeremyvisser commented 2 years ago

Describe the bug

Prometheus exporter TCP ports (20201, 20202) are enabled by default on Ops Agent, which causes problems for users who want to bind to those ports for other purposes or to reduce network exposure.

While the Prometheus listeners are fairly minimal (a simple handler for /metrics), the daemons run as root, so users running Ops Agent in a security-sensitive environment will want to eliminate inbound requests.

Additionally, users wanting to run their own service binding to TCP ports 20201 or 20202 will run into conflicts.
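
To illustrate the conflict, here is a minimal Python sketch (not part of Ops Agent). The helper names `can_bind` and `demo_conflict` are hypothetical, and the demo uses an OS-chosen port as a stand-in for 20201/20202 so it is safe to run on any machine. It assumes a plain TCP socket without SO_REUSEADDR, which is how a typical service would try to bind.

```python
import errno
import socket


def can_bind(host, port):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # any other error is unexpected; surface it
        return True


def demo_conflict():
    """Hold a wildcard listener, then try to bind localhost on the same port."""
    # Stand-in for an existing listener such as otelopscol or fluent-bit.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        holder.bind(("0.0.0.0", 0))  # port 0: let the OS pick a free port
        holder.listen()
        port = holder.getsockname()[1]
        # A second service binding to the same port, even on localhost only.
        return can_bind("127.0.0.1", port)
    finally:
        holder.close()


if __name__ == "__main__":
    print("second bind succeeded:", demo_conflict())
```

On a host running Ops Agent, calling `can_bind("127.0.0.1", 20201)` would be the equivalent check for the real ports.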

There's no obvious way to reconfigure these ports, whether changing the binding address or port numbers. At the very least, it should be possible to bind these ports to localhost instead (::1 or 127.0.0.1).

I recognise that these ports are used by Ops Agent for self-monitoring, so avoiding listening on the ports entirely is likely infeasible.

To Reproduce

Steps to reproduce the behavior:

  1. Environment: RHEL 7, Ops Agent google-cloud-ops-agent-2.22.0-1.el7.x86_64
  2. Use default config
  3. Run netstat -anp | grep :2020 and check the output:
    tcp6       0      0 :::20201                :::*                    LISTEN      15072/otelopscol
    tcp        0      0 0.0.0.0:20202           0.0.0.0:*               LISTEN      15123/fluent-bit
    tcp        0      0 127.0.0.1:20202         127.0.0.1:45340         ESTABLISHED 15123/fluent-bit
    tcp        0      0 127.0.0.1:45340         127.0.0.1:20202         ESTABLISHED 15072/otelopscol
    tcp        0      0 127.0.0.1:51388         127.0.0.1:20201         ESTABLISHED 15072/otelopscol
    tcp6       0      0 127.0.0.1:20201         127.0.0.1:51388         ESTABLISHED 15072/otelopscol

    Observe that ports 20201 and 20202 are bound to the wildcard addresses (:: and 0.0.0.0) rather than to localhost.

  4. Observe this config in /run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml:
     telemetry:
       metrics:
         address: 0.0.0.0:20201
  5. Observe this config in /run/google-cloud-ops-agent-fluent-bit/fluent_bit_main.conf:
    [OUTPUT]
       Match *
       Name  prometheus_exporter
       host  0.0.0.0
       port  20202
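
The check in step 3 can also be scripted. The following is a rough Python sketch, not an official tool: the function name `wildcard_listeners` is hypothetical, and it assumes the Linux `netstat -anp` column layout shown above (protocol, two queue counters, local address, foreign address, state, PID/program). The sample input is copied from the output in this report.

```python
# Sample lines copied from the netstat output in this report.
SAMPLE = """\
tcp6       0      0 :::20201                :::*                    LISTEN      15072/otelopscol
tcp        0      0 0.0.0.0:20202           0.0.0.0:*               LISTEN      15123/fluent-bit
tcp        0      0 127.0.0.1:20202         127.0.0.1:45340         ESTABLISHED 15123/fluent-bit
"""

# Addresses that mean "all interfaces" in netstat output.
WILDCARD_HOSTS = {"0.0.0.0", "::", "*"}


def wildcard_listeners(netstat_output):
    """Return (port, program) pairs for LISTEN sockets bound to a wildcard address."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip anything that is not a listening socket with a program column.
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6 forms such as ":::20201" work too.
        host, _, port = fields[3].rpartition(":")
        if host in WILDCARD_HOSTS:
            found.append((int(port), fields[6]))
    return found


if __name__ == "__main__":
    for port, program in wildcard_listeners(SAMPLE):
        print(f"port {port} is exposed on all interfaces by {program}")
```

A listener bound to 127.0.0.1 or ::1 is not reported, so an empty result would indicate that both exporters are reachable from localhost only.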

Expected behavior

I would expect to be able to reconfigure the bind address to force the ports to bind to localhost only (::1 and 127.0.0.1), as well as change the port numbers.

While iptables rules are additionally useful as a defense-in-depth method, avoiding binding unnecessarily in the first place may be preferable.

Additional context

https://issuetracker.google.com/251023934

aryehb commented 3 months ago

I'm having the same issue.

@jeremyvisser When I try that link to the Google Issue Tracker, I get Access Denied. Was there any information there beyond what you wrote here?

Did you try changing the host in those config files to 127.0.0.1?

jeremyvisser commented 3 months ago

The link is just context about how the issue affected internal usage of the tool, but otherwise uninteresting.

I didn’t try altering those /run/google-cloud-ops-agent-*/* files, as they’re dynamically generated every time Ops Agent launches. Even if it “worked”, it wouldn’t be the correct fix.