elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.
Other
123 stars 132 forks source link

[Elastic Agent][Standalone] Add configuration validation command #77

Open BenB196 opened 2 years ago

BenB196 commented 2 years ago

Describe the enhancement:

Currently, there is no good way of validating a configuration for a standalone Elastic Agent.

The only way to do this currently, is to load up an agent with a configuration and wait to see if it (or its subprocesses (beats)) run into any issues.

(Running the help command against the Elastic Agent binary shows no "configuration"/"validation" command)

elastic-agent help  
Usage:
  elastic-agent [subcommand] [flags]
  elastic-agent [command]

Available Commands:
  diagnostics Gather diagnostics information from the elastic-agent and running processes.
  enroll      Enroll the Agent into Fleet
  help        Help about any command
  inspect     Shows configuration of the agent
  install     Install Elastic Agent permanently on this system
  restart     Restart the currently running Elastic Agent daemon
  run         Start the elastic-agent.
  status      Status returns the current status of the running Elastic Agent daemon.
  uninstall   Uninstall permanent Elastic Agent from this system
  upgrade     Upgrade the currently running Elastic Agent to the specified version
  version     Display the version of the elastic-agent.
  watch       Watch watches Elastic Agent for failures and initiates rollback.

Flags:
  -c, --c string                     Configuration file, relative to path.config (default "elastic-agent.yml")
  -d, --d string                     Enable certain debug selectors
  -e, --e                            Log to stderr and disable syslog/file output
      --environment environmentVar   set environment being ran in (default default)
  -h, --help                         help for elastic-agent
      --path.config string           Config path is the directory Agent looks for its config file (default "/usr/share/elastic-agent")
      --path.downloads string        Downloads path contains binaries Agent downloads
      --path.home string             Agent root path (default "/usr/share/elastic-agent")
      --path.install string          Install path contains binaries Agent extracts
      --path.logs string             Logs path contains Agent log output (default "/usr/share/elastic-agent")
  -v, --v                            Log at INFO level

Use "elastic-agent [command] --help" for more information about a command.

Describe a specific use case for the enhancement or feature:

Generalized Use Case 1

If I write a standalone configuration, I would like to know that the configuration I have written is valid, without having to actually run an Elastic Agent process.

Generalized Use Case 2

If I am upgrading a standalone Elastic Agent, I would like to be able to test my configuration against the newer version before I perform the upgrade to ensure there are no breaking changes (or even warnings/deprecations).

Specific Use Case

I am trying to write some automation around CI/CD for users to build abstracted Elastic Agent configurations then have Terraform take these abstracted configurations and turn them into real Elastic Agent standalone configurations and deploy them.

The issue I'm running into, is that there isn't a good way to actually validate that a configuration generated is valid in a test phase.

I would like to simply just have a step which runs a command like:

elastic-agent validate --path.config <path_to_config_needing_validation>

And use the response code to determine the needed action:

Exit 0 - success Exit 1 - error Exit ? - Warning/deprecated

elasticmachine commented 2 years ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

nimarezainia commented 2 years ago

@BenB196 are you writting the configuration from a blank slate? or gathering it from the fleet UI? the validated version of the config is in the UI

BenB196 commented 2 years ago

Hi @nimarezainia that configuration is effectively a blank slate. It is built via a Terraform template that looks something like:

Snippet of one of the templates (part of a larger Terraform deployment flow):

inputs:
%{ for INPUT_KEY, INPUT_VALUE in INPUTS }
  - id: ${INPUT_VALUE.ID}
    name: ${INPUT_VALUE.NAME}
    revision: 1
    type: synthetics/tcp
    use_output: ${ try(INPUT_VALUE.OUTPUT, "default") }
    meta:
      package:
        name: synthetics
        version: ${SYNTHETICS_VERSION}
    data_stream:
      namespace: ${INPUT_VALUE.DATA_STREAM.NAMESPACE}
    streams:
%{ for STREAM_KEY, STREAM_VALUE in INPUT_VALUE.STREAMS }
      - id: ${STREAM_VALUE.ID}
        name: ${STREAM_VALUE.NAME}
        type: tcp
        data_stream:
          dataset: tcp
          type: synthetics
        schedule: ${ yamlencode(try(STREAM_VALUE.SCHEDULE, "@every 20s")) }
%{ if try(STREAM_VALUE.SSL.DISABLED != true, true) }
        ssl.verification_mode: ${ try(STREAM_VALUE.SSL.VERIFICATION_MODE, "full") }
        ssl.supported_protocols: ${ try(STREAM_VALUE.SSL.SUPPORTED_PROTOCOLS, "['TLSv1.2','TLSv1.3']") }
%{ endif }
        hosts:
%{ for HOST in STREAM_VALUE.HOSTS}
          - ${HOST}
%{ endfor ~}
        processors:
          - add_observer_metadata:
              geo:
                city_name: ${ try(OBSERVER.GEO.CITY_NAME, "null") }
                continent_code: ${ try(OBSERVER.GEO.CONTIENT_CODE, "null") }
                continent_name: ${ try(OBSERVER.GEO.CONTINENT_NAME, "null") }
                country_iso_code: ${ try(OBSERVER.GEO.COUNTRY_ISO_CODE, "null") }
                country_name: ${ try(OBSERVER.GEO.COUNTRY_NAME, "null") }
                location: ${ try(OBSERVER.GEO.LOCATION, "null") }
                name: ${ try(OBSERVER.GEO.NAME, "null") }
                postal_code: ${ try(OBSERVER.GEO.POSTAL_CODE, "null") }
                region_iso_code: ${ try(OBSERVER.GEO.REGION_ISO_CODE, "null") }
                region_name: ${ try(OBSERVER.GEO.REGION_NAME, "null") }
                timezone: ${ try(OBSERVER.GEO.TIMEZONE, "null") }
%{ if can(STREAM_VALUE.SERVICE) || can(STREAM_VALUE.SERVICE) }
          - add_fields:
              target: service
              fields:
                environment: ${ try(STREAM_VALUE.SERVICE.ENVIRONMENT, "null") }
                type: ${ try(STREAM_VALUE.SERVICE.TYPE, "null") }
%{ endif }
%{ if can(STREAM_VALUE.HOST.TYPE) || can(STREAM_VALUE.HOST.ARCHITECTURE) }
          - add_fields:
              target: host
              fields:
                type: ${ try(STREAM_VALUE.HOST.TYPE, "null") }
                architecture: ${ try(STREAM_VALUE.HOST.ARCHITECTURE, "null") }
%{ endif }
          - add_fields:
              target: host.geo
              fields:
                city_name: ${ try(STREAM_VALUE.HOST.GEO.CITY_NAME, "null") }
                continent_code: ${ try(STREAM_VALUE.HOST.GEO.CONTIENT_CODE, "null") }
                continent_name: ${ try(STREAM_VALUE.HOST.GEO.CONTINENT_NAME, "null") }
                country_iso_code: ${ try(STREAM_VALUE.HOST.GEO.COUNTRY_ISO_CODE, "null") }
                country_name: ${ try(STREAM_VALUE.HOST.GEO.COUNTRY_NAME, "null") }
                location: ${ try(STREAM_VALUE.HOST.GEO.LOCATION, "null") }
                name: ${ try(STREAM_VALUE.HOST.GEO.NAME, "null") }
                postal_code: ${ try(STREAM_VALUE.HOST.GEO.POSTAL_CODE, "null") }
                region_iso_code: ${ try(STREAM_VALUE.HOST.GEO.REGION_ISO_CODE, "null") }
                region_name: ${ try(STREAM_VALUE.HOST.GEO.REGION_NAME, "null") }
                timezone: ${ try(STREAM_VALUE.HOST.GEO.TIMEZONE, "null") }
%{ if can(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS) }
          - dns:
              action: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.ACTION, "append") }
              failure_cache:
                capacity.initial: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.FAILURE_CACHE.CAPACITY.INITIAL, "null") }
                capacity.max: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.FAILURE_CACHE.CAPACITY.MAX, "null") }
              fields:
%{ for FIELD in STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.FIELDS }
                ${FIELD}
%{ endfor ~}
              nameservers:
%{ for NAMESERVER in STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.NAMESERVERS }
                - ${NAMESERVER}
%{ endfor ~}
              success_cache:
                capacity.initial: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.SUCCESS_CACHE.CAPACITY.INITIAL, "null") }
                capacity.max: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.SUCCESS_CACHE.CAPACITY.MAX, "null") }
              tag_on_failure:
%{ for FAILURE_TAG in STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.TAG_ON_FAILURE }
                - ${FAILURE_TAG}
%{ endfor ~}
              timeout: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.TIMEOUT, "null") }
              transport: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.TRANSPORT, "udp") }
              type: ${ try(STREAM_VALUE.ADDITIONAL_PROCESSORS.DNS.TRANSPORT, "reverse") }
%{ endif }
          - copy_fields:
              when:
                has_fields:
                  - 'resolve.ip'
              fields:
                - from: url.domain
                  to: host.hostname
                - from: resolve.ip
                  to: host.ip
              fail_on_error: false
              ignore_missing: true
          - copy_fields:
              when:
                and:
                  - not:
                      has_fields:
                        - 'resolve.ip'
                  - network:
                      url.domain: [loopback, unicast, multicast, interface_local_multicast, link_local_unicast, link_local_multicast, private, public, unspecified]
              fields:
                - from: url.domain
                  to: host.ip
              fail_on_error: false
              ignore_missing: true
          - dissect:
              tokenizer: "%%{host.name}.%%{host.domain}"
              field: "host.hostname"
              target_prefix: ''
              overwrite_keys: true
              trim_values: 'all'
        timeout: ${ try(STREAM_VALUE.TIMEOUT, "5s") }
        tags:
%{ for TAG in STREAM_VALUE.TAGS}
          - ${TAG}
%{ endfor ~}
%{ endfor ~}
%{ endfor ~}

As you can see, the template while somewhat structured, can potentially introduce some issues depending on how people pass through different variables to the template.

But: To build the base "template", I did create an integration in the Kibana UI, as you can see however, it no longer looks very similar to something that Kibana would generate.