elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

agent.logging.metrics options for stand-alone agent not passed through to beats components #3011

Open alstolten opened 1 year ago

alstolten commented 1 year ago

Version: 8.7.1

Operating System: Linux/Windows

Description: Using agent.logging.metrics.enabled or agent.logging.metrics.period with a stand-alone Elastic Agent, as described in the reference YAML, does not work. The options are not forwarded to the underlying Beats components, i.e. the components are started without the necessary -E logging.metrics.enabled=false command line argument.

root       70333  0.0  0.3 1736536 102576 ?      Sl   12:21   0:00 /opt/Elastic/Agent/data/elastic-agent-10dc6a/components/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E management.restart_on_output_change=true -E logging.level=info -E logging.to_stderr=true -E gc_percent=${FILEBEAT_GOGC:100} -E filebeat.config.modules.enabled=false -E http.enabled=true -E http.host=unix:///opt/Elastic/Agent/data/tmp/log-default.sock -E path.data=/opt/Elastic/Agent/data/elastic-agent-10dc6a/run/log-default

Working example from 7.17.11:

root       92035  0.2  0.3 1639512 103628 ?      Sl   12:51   0:00 /opt/Elastic/Agent/data/elastic-agent-17416f/install/filebeat-7.17.11-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOGC:100} -E logging.level=info -E http.enabled=true -E http.host=unix:///opt/Elastic/Agent/data/tmp/default/filebeat/filebeat.sock -E logging.json=true -E logging.ecs=true -E logging.files.path=/opt/Elastic/Agent/data/elastic-agent-17416f/logs/default -E logging.files.name=filebeat-json.log -E logging.files.keepfiles=7 -E logging.files.permission=0640 -E logging.files.interval=1h -E logging.metrics.enabled=false -E path.data=/opt/Elastic/Agent/data/elastic-agent-17416f/run/default/filebeat--7.17.11
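
To compare the two process listings without reading them by eye, the `-E` settings can be pulled out of each command line. The following is a minimal sketch only: `beatSettings` is a hypothetical helper, not part of Elastic Agent or Beats, and the command lines in `main` are shortened stand-ins for the `ps` output above.

```go
package main

import (
	"fmt"
	"strings"
)

// beatSettings is a hypothetical helper: it collects the key=value pairs
// that follow each "-E" on a Beat command line. If a key is repeated,
// the last occurrence is the one kept in the map.
func beatSettings(cmdline string) map[string]string {
	settings := make(map[string]string)
	fields := strings.Fields(cmdline)
	for i := 0; i+1 < len(fields); i++ {
		if fields[i] != "-E" {
			continue
		}
		if key, value, ok := strings.Cut(fields[i+1], "="); ok {
			settings[key] = value
		}
		i++ // skip the argument that was just consumed
	}
	return settings
}

func main() {
	// Shortened stand-ins for the two process listings in this issue.
	cases := []struct{ version, cmdline string }{
		{"8.7.1", "filebeat -E logging.level=info -E logging.to_stderr=true"},
		{"7.17.11", "filebeat -E logging.level=info -E logging.metrics.enabled=false"},
	}
	for _, c := range cases {
		value, ok := beatSettings(c.cmdline)["logging.metrics.enabled"]
		fmt.Printf("%s: passed=%t value=%q\n", c.version, ok, value)
	}
}
```

On the 8.7.1 listing the key is absent, so the Beat presumably falls back to its own default and keeps logging metrics.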

Steps to Reproduce:

  1. Download and extract elastic-agent-8.7.1-linux-x86_64
  2. Create a simple custom log policy and download the .yml
  3. Add agent.logging.metrics.enabled: false to the elastic-agent.yml and install/run the agent
  4. Still observe the following messages in the logs:
    {
    "log.level": "info",
    "@timestamp": "2023-07-06T13:02:37.828Z",
    "message": "Non-zero metrics in the last 30s",
    "component": {
        "binary": "filebeat",
        "dataset": "elastic_agent.filebeat",
        "id": "log-default",
        "type": "log"
    },
    "log": {
        "source": "log-default"
    },
    "service.name": "filebeat",
    "monitoring": {
        "ecs.version": "1.6.0",
        "metrics": {
            "beat": {
                "cgroup": {
                    "cpu": {
                        "id": "elastic-agent.service"
                    },
                    "memory": {
                        "id": "elastic-agent.service",
                        "mem": {
                            "usage": {
                                "bytes": 51675136
                            }
                        }
                    }
                },
                "cpu": {
                    "system": {
                        "ticks": 30,
                        "time": {
                            "ms": 30
                        }
                    },
                    "total": {
                        "ticks": 160,
                        "time": {
                            "ms": 160
                        },
                        "value": 160
                    },
                    "user": {
                        "ticks": 130,
                        "time": {
                            "ms": 130
                        }
                    }
                },
                "handles": {
                    "limit": {
                        "hard": 524288,
                        "soft": 524288
                    },
                    "open": 14
                },
                "info": {
                    "ephemeral_id": "91cd1cc2-23d6-4878-802d-855b4a1ee800",
                    "name": "filebeat",
                    "uptime": {
                        "ms": 33075
                    },
                    "version": "8.7.1"
                },
                "memstats": {
                    "gc_next": 20059216,
                    "memory_alloc": 14882872,
                    "memory_sys": 34452744,
                    "memory_total": 59764912,
                    "rss": 111521792
                },
                "runtime": {
                    "goroutines": 53
                }
            },
            "filebeat": {
                "events": {
                    "active": 0,
                    "added": 76,
                    "done": 76
                },
                "harvester": {
                    "open_files": 1,
                    "running": 1,
                    "started": 1
                }
            },
            "libbeat": {
                "config": {
                    "module": {
                        "running": 1,
                        "starts": 1
                    }
                },
                "output": {
                    "events": {
                        "acked": 75,
                        "active": 0,
                        "batches": 1,
                        "total": 75
                    },
                    "read": {
                        "bytes": 3588
                    },
                    "type": "elasticsearch",
                    "write": {
                        "bytes": 99154
                    }
                },
                "pipeline": {
                    "clients": 1,
                    "events": {
                        "active": 0,
                        "filtered": 1,
                        "published": 75,
                        "retry": 75,
                        "total": 76
                    },
                    "queue": {
                        "acked": 75,
                        "max_events": 4096
                    }
                }
            },
            "registrar": {
                "states": {
                    "current": 1,
                    "update": 76
                },
                "writes": {
                    "success": 2,
                    "total": 2
                }
            },
            "system": {
                "cpu": {
                    "cores": 8
                },
                "load": {
                    "1": 0.67,
                    "15": 0.31,
                    "5": 0.44,
                    "norm": {
                        "1": 0.0838,
                        "15": 0.0388,
                        "5": 0.055
                    }
                }
            }
        }
    },
    "log.logger": "monitoring",
    "log.origin": {
        "file.line": 187,
        "file.name": "log/log.go"
    },
    "ecs.version": "1.6.0"
    }
elasticmachine commented 1 year ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

cmacknz commented 1 year ago

This likely stopped working in the 8.6 release. What is the use case for turning these off?

These metrics can be the only way for us to diagnose performance issues, so having them always present in the logs helps root cause analysis significantly.

cmacknz commented 1 year ago

There is still a path in the code that leads to the flag here being disabled:

https://github.com/elastic/elastic-agent/blob/ca545e2b1f26a24369a5e6262b8afc8378b108d5/internal/pkg/agent/application/monitoring/v1_monitor.go#L193-L197

It looks like we set the YAML configuration to "-" in the struct tags, which means it isn't actually configurable:

https://github.com/elastic/elastic-agent/blob/ca545e2b1f26a24369a5e6262b8afc8378b108d5/internal/pkg/core/monitoring/config/config.go#L17

This was originally added in https://github.com/elastic/elastic-agent/commit/393b2f018d3e347a750b3dceab55867967ecf338, so it looks like we accidentally disabled the path in the code that allows turning this off based on configuration.
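
For readers unfamiliar with the "-" struct tag, here is a minimal sketch of its effect. It uses the standard library's `encoding/json`, where "-" has the same "skip this field" meaning, as an analogue; the agent uses its own configuration library with `yaml`/`config` tags, and `metricsConfig`, `defaultConfig`, and `load` below are hypothetical names, not the agent's real code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metricsConfig is a hypothetical stand-in for a monitoring config struct.
type metricsConfig struct {
	Enabled bool `json:"enabled"`
	// The "-" tag tells the decoder to skip this field entirely,
	// so no key in the user's input can ever reach it.
	LogMetrics bool `json:"-"`
}

// defaultConfig returns the values used before any user input is applied.
func defaultConfig() metricsConfig {
	return metricsConfig{Enabled: true, LogMetrics: true}
}

// load overlays user-supplied settings on top of the defaults.
func load(raw string) (metricsConfig, error) {
	cfg := defaultConfig()
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	// The user tries to turn both settings off.
	cfg, err := load(`{"enabled": false, "LogMetrics": false}`)
	if err != nil {
		panic(err)
	}
	// Enabled follows the input; LogMetrics silently keeps its default.
	fmt.Printf("%+v\n", cfg) // prints {Enabled:false LogMetrics:true}
}
```

The decoder reports no error for the ignored key, which fits the symptom in this issue: the setting is accepted in the YAML but has no effect.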

alstolten commented 1 year ago

Hey @cmacknz, the user wants to disable those because many messages from many agents are reaching their cluster. They reported having used the options in the documentation and were wondering why this does not work.