elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

Event log file gets flooded with `copy_fields` processor error #5299

Open belimawr opened 1 month ago

belimawr commented 1 month ago


The filestream-monitoring input deployed by the Elastic Agent to collect its own logs is flooding the event log with a `copy_fields` processor error:

{
  "log.level": "debug",
  "@timestamp": "2024-08-14T13:40:27.920Z",
  "message": "Failed to copy fields in copy_fields processor: target field data_stream.dataset already exists, drop or rename this field first",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-monitoring",
    "type": "filestream"
  },
  "log": {
    "source": "filestream-monitoring"
  },
  "log.type": "event",
  "ecs.version": "1.6.0",
  "log.logger": "copy_fields",
  "log.origin": {
    "file.line": 81,
    "file.name": "actions/copy_fields.go",
    "function": "github.com/elastic/beats/v7/libbeat/processors/actions.(*copyFields).Run"
  },
  "service.name": "filebeat"
}

In my test, this error accounted for about 38% of the entries in the event log.

This seems to be coming from the following chain of processors (from components/filestream-monitoring/beat-rendered-config.yml):

        - copy_fields:
            fields:
                - from: data_stream.dataset
                  to: data_stream.dataset_original
        - drop_fields:
            fields:
                - data_stream.dataset
        - copy_fields:
            fail_on_error: false
            fields:
                - from: component.dataset
                  to: data_stream.dataset
            ignore_missing: true
        - copy_fields:
            fail_on_error: false
            fields:
                - from: data_stream.dataset_original
                  to: data_stream.dataset

The third and fourth processors copy different fields to the same destination, so whenever the third one runs successfully, the fourth one will always fail and generate the log message above.

While this may be the intended behaviour (trying to populate `data_stream.dataset` from multiple sources), it is flooding our logs.

We can add conditions to these processors so that `copy_fields` does not run when the target field is already present, thus avoiding the flood of debug logs.
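For example, a sketch (not the final config) of what guarding the last two processors with a `has_fields` condition could look like, so each one only runs while `data_stream.dataset` is still unset:

```yaml
- copy_fields:
    fail_on_error: false
    fields:
      - from: component.dataset
        to: data_stream.dataset
    ignore_missing: true
    when:
      not:
        has_fields: ['data_stream.dataset']
- copy_fields:
    fail_on_error: false
    fields:
      - from: data_stream.dataset_original
        to: data_stream.dataset
    when:
      not:
        has_fields: ['data_stream.dataset']
```

With the condition in place the processor is skipped instead of failing, so no error is produced in the first place.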


elasticmachine commented 1 month ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

belimawr commented 1 month ago

Another option is to move the log entry inside the fail_on_error check, so we only log when the processor actually fails and returns an error.

Here is where we're logging it: https://github.com/elastic/beats/blob/f6b8701e8c8034836becb9ccaf3f4b2449fc589f/libbeat/processors/actions/copy_fields.go#L81C4-L87
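A minimal sketch of that change (names and signatures are simplified stand-ins, not the actual Beats code in copy_fields.go): the log call moves inside the fail_on_error branch, so a failed copy is only logged when the error is actually surfaced to the caller.

```go
package main

import (
	"fmt"
	"log"
)

// copyField is a simplified, hypothetical stand-in for the copy_fields
// processor's Run method. The proposed change: only emit the log line
// when failOnError is true, i.e. when the failure is returned to the
// caller, instead of logging on every event.
func copyField(event map[string]any, from, to string, failOnError bool) (map[string]any, error) {
	if _, exists := event[to]; exists {
		err := fmt.Errorf("target field %s already exists, drop or rename this field first", to)
		if failOnError {
			log.Printf("Failed to copy fields in copy_fields processor: %v", err)
			return nil, err
		}
		// fail_on_error: false -> skip silently, no log spam.
		return event, nil
	}
	if v, ok := event[from]; ok {
		event[to] = v
	}
	return event, nil
}

func main() {
	ev := map[string]any{"data_stream.dataset": "elastic_agent.filebeat"}
	_, err := copyField(ev, "data_stream.dataset_original", "data_stream.dataset", false)
	fmt.Println("err:", err)
}
```

With `fail_on_error: false` the conflicting copy is silently skipped; with `fail_on_error: true` the error is both logged and returned, so real misconfigurations remain visible.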

blakerouse commented 1 month ago

@belimawr I believe the change in beats is a better change, seems weird that a fail_on_error: true would flood the logs.

belimawr commented 1 month ago

> @belimawr I believe the change in beats is a better change, seems weird that a fail_on_error: true would flood the logs.

Did you mean fail_on_error: false? I'd expect the processor to be verbose if it fails so users can easily know something is not working as expected.

Even with fail_on_error: false it fails on every event, hence it floods the logs. Regardless of changing when/how it logs, I believe the Elastic Agent should not rely on processors silently failing when trying to populate some fields; we have mechanisms to avoid this, and I believe they should be used.