elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

System integration >= 1.47.0 breaks Elastic Agent 7.17.x #9460

Open rdner opened 3 months ago

rdner commented 3 months ago

Versions of the system integration >= 1.47.0 are incompatible with Elastic Agent 7.17.x but are allowed to be installed.

Steps to reproduce:

Start an 8.13.0 stack:

elastic-package stack up -d -v --version "8.13.0"

Go to Management -> Fleet, create a policy with the system integration, and click Add agent

Install and enroll a 7.17.19 agent:

sudo ./elastic-agent install --insecure --url=https://fleet-server:8220 --enrollment-token=<enrollment token>

Run the status command after a few minutes and you'll see that Filebeat is stuck in "Updating configuration"

Looking at the logs (Linux path) you'll see:

sudo cat /opt/Elastic/Agent/data/elastic-agent-*/logs/default/filebeat-json.log* | grep error | jq .
{
  "log.level": "error",
  "@timestamp": "2024-03-27T16:31:00.737Z",
  "log.logger": "centralmgmt",
  "log.origin": {
    "file.name": "cfgfile/list.go",
    "file.line": 108
  },
  "message": "Error creating runner from config: the processor action syslog does not exist. Valid actions: add_fields, drop_fields, urldecode, add_kubernetes_metadata, drop_event, rename, add_cloud_metadata, add_docker_metadata, dns, extract_array, fingerprint, decode_base64_field, add_cloudfoundry_metadata, registered_domain, decode_cef, decompress_gzip_field, decode_json_fields, convert, copy_fields, include_fields, replace, add_process_metadata, community_id, decode_csv_fields, timestamp, add_nomad_metadata, add_labels, add_network_direction, add_tags, truncate_fields, add_host_metadata, add_observer_metadata, dissect, rate_limit, add_locale, script, detect_mime_type, decode_xml, decode_xml_wineventlog, add_id",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
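
The error message above names both the missing processor and every processor this Filebeat build knows about. A minimal sketch of pulling those two pieces out of a raw log line follows; it assumes one JSON object per line (the output above is pretty-printed by jq), and the helper name `parse_missing_processor` is hypothetical, not part of any Elastic tool:

```python
import json
import re

# Pattern matching the message format shown in the log entry above.
_MISSING = re.compile(
    r"the processor action (\S+) does not exist\. Valid actions: (.*)$"
)


def parse_missing_processor(log_line):
    """Return (missing_name, valid_names) for a matching line, else None.

    Assumes log_line is a single JSON object, as written to
    filebeat-json.log before any jq formatting.
    """
    record = json.loads(log_line)
    match = _MISSING.search(record.get("message", ""))
    if match is None:
        return None
    valid = {name.strip() for name in match.group(2).split(",")}
    return match.group(1), valid
```

For the entry above this would report syslog as the missing processor, with a valid set that does not contain it.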

The error is caused by this config block:

sudo cat /opt/Elastic/Agent/data/elastic-agent-*/state.yml | grep -B 30 -A 5 'syslog:'
      id: logfile-system-63215a40-e114-470d-82cd-585cceac27a4
      meta:
        package:
          name: system
          version: 1.54.0
      name: system-1
      package_policy_id: 63215a40-e114-470d-82cd-585cceac27a4
      revision: 1
      streams:
      - data_stream:
          dataset: system.auth
          type: logs
        exclude_files:
        - \.gz$
        id: logfile-system.auth-63215a40-e114-470d-82cd-585cceac27a4
        ignore_older: 72h
        multiline:
          match: after
          pattern: ^\s
        paths:
        - /var/log/auth.log*
        - /var/log/secure*
        processors:
        - add_locale: null
        - rename:
            fail_on_error: false
            fields:
            - from: message
              to: event.original
            ignore_missing: true
        - syslog:
            field: event.original
            ignore_failure: true
            ignore_missing: true
        tags:
        - system-auth
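
To make the mismatch explicit, the processors listed in a policy input can be compared against the set the agent's Filebeat supports. This is only a sketch: it assumes the input from state.yml has already been loaded into a Python dict by a YAML parser, and both `unsupported_processors` and the `supported` set (for example, the "Valid actions" list from the error message) are hypothetical names introduced for illustration:

```python
def unsupported_processors(policy_input, supported):
    """List (stream_id, processor_name) pairs the agent cannot run.

    policy_input: dict shaped like one input from state.yml, with a
                  "streams" list whose items may carry "processors".
    supported:    set of processor names the Beat accepts.
    """
    found = []
    for stream in policy_input.get("streams", []):
        # Each processor entry is a single-key mapping, e.g. {"syslog": {...}}.
        for processor in stream.get("processors") or []:
            for name in processor:
                if name not in supported:
                    found.append((stream.get("id"), name))
    return found
```

Applied to the block above with the 7.17.x processor list, the only pair returned would be the system.auth stream and syslog.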

This change comes from PR https://github.com/elastic/integrations/pull/8103

I suppose the main problem here is that the system integration has no condition on the agent version:

[Screenshot 2024-03-27 at 17 57 23]

so nothing prevents incompatible versions of the system integration from being installed on 7.17.x
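
As an illustration of the kind of guard that is missing, the incompatibility reported here can be written as a simple version check. This is a sketch under the assumptions of this issue only (agent 7.17.x, system integration >= 1.47.0); the function names are hypothetical and it does not reflect how Fleet or the package spec actually express conditions:

```python
def parse_version(version):
    """Turn "7.17.19" or "8.13.0-SNAPSHOT" into a comparable int tuple."""
    core = version.split("-")[0]  # drop any pre-release suffix
    return tuple(int(part) for part in core.split(".")[:3])


def is_known_incompatible(agent_version, system_version):
    """True for the pairing reported in this issue.

    Encodes only what the issue states: 7.17.x agents break with
    system integration versions 1.47.0 and later.
    """
    agent = parse_version(agent_version)
    return agent[:2] == (7, 17) and parse_version(system_version) >= (1, 47, 0)
```

With the versions from the reproduction steps, `is_known_incompatible("7.17.19", "1.54.0")` flags the pairing, while an 8.13.0 agent with the same integration version does not.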

elasticmachine commented 3 months ago

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

cmacknz commented 3 months ago

The relevant PR introducing v1.47.0: https://github.com/elastic/integrations/pull/8103

rdner commented 3 months ago

The breaking functionality is enabled by default for any system integration. It's enough to add the >=1.47.0 system integration with default settings to break a 7.17.x agent.

[Screenshot 2024-03-27 at 19 46 32]

elasticmachine commented 3 weeks ago

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)