Default configuration in Beat CR

There are few ways a config can be provided in Beats. The most basic one is through inputs. Some of the Beats (Filebeat, Metricbeat, Heartbeat, Auditbeat) also offer an alternative - autodiscover. There are two ways a config can be provided there - via autodiscover.providers[*].hints.default_config and/or autodiscover.providers[*].templates. This means that there are at least three ways users can write their config. This raises a question about our defaults - which way do our default settings use and how they get overriden/replaced by user configs.

I think we should go with autodiscover config as a default, due to:

it has better experience on k8s (you can use hints on pods)
it offers templates that allow to conditionally choose a config to use (for namespace/label x use config y), inputs don’t have such capability
it has kubernetes metadata processor configured correctly already

There is an additional difficulty - autodiscover.providers and autodiscover.providers[*].templates are lists which means that merging them is different (lists items are added together to form a longer list instead of being merged position by position).

I think our goals for good defaults should be to be easily understandable, easy to overwrite and good starting experience.

We have a number of paths we can take:

No default - we provide no default whatsoever and user is responsible for providing the config. We only add output part based on elasticsearchRef.
We provide a default, but it's replaced entirely if user provides any config. We only add output part based on elasticsearchRef. This is the current implementation.
We merge with append the default and user config in the same way we merge all the other configs today.
We implement a "smarter" merging mechanism. This could be one or combination of:
- replace position by position in lists
- for top level items provided by the user, replace them entirely in default config
- allow user to choose the behavior

I think we should cross out:

1 - because it's a bad starting experience
3 - because it's just doing wrong things (our default config can't be overriden, so if someone wants to disable hints, they won't be able to)

So being left with 2 and 4:

2 - easy to understand and implement, but it does mean that even slight change requires user to provide the entire config and it's different behavior than our current configs. I'd be in favor of starting with this.
4 - might be more powerful, but will get tricky to understand and use. Arguably, our default configs won't be providing enough value to justify the work.

To make it easier to make a call, some examples below. As above, ignoring option 3. as won't work for some of the use cases at all.

There are at least two scenarios where this would be used:

overriding default config
overriding config provided via configRef

Get logs only from Pods in `my-app-ns-1` namespace

Option 1. (no default) and 2. (default or users config, no merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    filebeat.autodiscover.providers:
    - node: ${NODE_NAME}
      type: kubernetes
      templates: # this has to be specified as hints.default_config doesn't allow for conditions
      - condition.equals.kubernetes.namespace: my-app-ns-1
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
    processors:
    - add_cloud_metadata: null
    - add_host_metadata: null

Option 4. (smart merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    filebeat.autodiscover.providers:
    - hints.enabled: false # user would need to know to disable this, otherwise default config will still be there
      templates: # this has to be specified as hints.default_config doesn't allow for conditions
      - condition.equals.kubernetes.namespace: my-app-ns-1
        config: 
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container

Default to logging disabled and to be enabled by hint in the annotation on the pod

Option 1 (no default) and 2. (default or users config, no merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    filebeat.autodiscover.providers:
    - hints:
        default_config:
          enabled: "false"
          paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
          type: container
        enabled: "true"
      node: ${NODE_NAME}
      type: kubernetes
processors:
- add_cloud_metadata: null
- add_host_metadata: null

Option 4. (smart merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: filebeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    filebeat.autodiscover.providers:
    - hints.default_config.enabled: "false"

Metricbeat, add custom agent name

Option 1 (no default) and 2. (default or users config, no merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    agent: my-custom-name
    metricbeat:
      autodiscover:
        providers:
        - hints:
            default_config: null
            enabled: "true"
          node: ${NODE_NAME}
          type: kubernetes
      modules:
      - module: system
        period: 10s
        metricsets:
        - cpu
        - load
        - memory
        - network
        - process
        - process_summary
        process:
          include_top_n:
            by_cpu: 5
            by_memory: 5
        processes:
        - .*
      - module: system
        period: 1m
        metricsets:
        - filesystem
        - fsstat
        processors:
        - drop_event:
            when:
              regexp:
                system:
                  filesystem:
                    mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib)($|/)
      - module: kubernetes
        period: 10s
        host: ${NODE_NAME}
        hosts:
        - https://${HOSTNAME}:10250
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        ssl:
          verification_mode: none
        metricsets:
        - node
        - system
        - pod
        - container
        - volume
      - module: kubernetes
        period: 10s
        host: ${NODE_NAME}
        hosts:
        - https://${HOSTNAME}:10249
        metricsets:
        - proxy
    processors:
    - add_cloud_metadata: null

Option 4. (smart merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    agent: my-custom-name

Disable volume and proxy metricsets in default metricbeat config

Option 1 (no default) and 2. (default or users config, no merging):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    metricbeat:
      autodiscover:
        providers:
        - hints:
            default_config: null
            enabled: "true"
          node: ${NODE_NAME}
          type: kubernetes
      modules:
      - module: system
        period: 10s
        metricsets:
        - cpu
        - load
        - memory
        - network
        - process
        - process_summary
        process:
          include_top_n:
            by_cpu: 5
            by_memory: 5
        processes:
        - .*
      - module: system
        period: 1m
        metricsets:
        - filesystem
        - fsstat
        processors:
        - drop_event:
            when:
              regexp:
                system:
                  filesystem:
                    mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib)($|/)
      - module: kubernetes
        period: 10s
        host: ${NODE_NAME}
        hosts:
        - https://${HOSTNAME}:10250
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        ssl:
          verification_mode: none
        metricsets:
        - node
        - system
        - pod
        - container
      - module: kubernetes
        period: 10s
        host: ${NODE_NAME}
        hosts:
        - https://${HOSTNAME}:10249
        metricsets:
        - proxy
    processors:
    - add_cloud_metadata: null

Option 4. (smart merging):

I don't know if there is a notation that allows to indicate whether we want to merge, replace or remove a key or an item on the list. I'm also not sure if we should invent such notation.

If we merge lists position by position, we could use some special values (null? empty strings? known strings?) to idicate whether default should be preserved or removed, e.g.:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  config:
    metricbeat:
      modules:
      - ~ # meaning: preserve the default on first position
      - ~ # meaning: preserve the default on second position
      - metricsets: 
        - ~ # meaning: preserve the default on this position
        - ~
        - ~
        - apiserver # meaning: replace the item on this position with apiserver
        - "" # meaning: remove metricset on this position
      - "" # meaning: remove module on this position

While this makes the config shorter, I wouldn't say it's more readable nor easy to understand. You need to know what is the default to be able to use this properly.

Other possibility is to allow specifying rfc6902 patches. This way it would become:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: 
spec:
  type: metricbeat
  version: 7.6.2
  elasticsearchRef:
    name: elasticsearch-sample
  configOpts:
    patches:
    - op: remove
      path: /metricbeat/modules/2/metricsets/4
    - op: remove
      path: /metricbeat/modules/3

This still requires knowledge of the original, but seems cleaner and it's already standarized.

Maybe we could use configOpts (or however we call it) to allow users to specify whether they would like to use replace, merge or patch strategy for their config:

  configOpts:
    strategy: replace # use user provided config, don't merge with default config
    #strategy: appendMerge # append user provided config to default config
    #strategy: smartMerge # merge lists position by position
    patches: []

Having said all of the above, I think we should be fine with the approach we have right now (either default or user needs to provide entire config) for the initial release.

Having said all of the above, I think we should be fine with the approach we have right now (either default or user needs to provide entire config) for the initial release

I'm a bit concerned about any future breaking change here (you upgrade ECK and suddenly your config isn't the one you expect anymore). As you said it should not prevent us from moving on with the current approach in the short term.

Also concerned with the lack of consistency in the way we deal with config across all CRDs. Elasticsearch, Kibana, APMServer and Enterprise Search CRDs all default to a config merge (with array items appended, what you call appendMerge I think). Having Beat be the only exception to that rule feels a bit inconsistent. It would be nice to solve the config merging issue for all resources.

Are lists/arrays the only data structure that causes merging issues?

Re. smart merging I'm wondering if we could expose options from https://github.com/elastic/go-ucfg/pull/152 to the user:

 # replace the 'modules' section with our own list of modules
  config:
    metricbeat:
      modules:
      - module: system
        period: 10s
        metricsets:
        - cpu
        - load
  configOpts:
      mergeStrategy:
        metricbeat.modules: replace

It looks like we can go quite far in what can be expressed (eg. metricbeat.modules.1.period or **.period) - not sure if useful for us in practice. We would default to "append" (similar to other CRDs) but allow users to replace some sections of the config entirely.

Edit: thinking about this again I feel like it's very complicated to reason about for the end user. (vs. just editing an existing config file)

My feeling after playing with https://github.com/elastic/cloud-on-k8s/pull/3041:

The 1st thing I tried to do is open the Kibana dashboard to see my Pods logs. Then I realized I need to enable setup.dashboards.enabled: true in the config for that, since it is false by default.

Luckily I knew some implementation details so I grabbed the content of the filebeat-sample-beat-filebeat-config secret, copy-pasted it into the config and added the setup.dashboards.enabled: true setting. It feels a bit unconvenient and hard to understand, I think.

I think that the complexity of Beats configs will prevent us from finding a perfect, clean solution.

Picking a strategy per config path or providing json patches is really flexible, but requires very good knowledge of the default and desired config. This means that the user either went through the code or already viewed the config Secret.
Given how small most of the default configs will most likely be, I think we might be overrating the value of any smart merging. Also, it might be preferable for readability/maintainability reasons to keep the entire config in a single place anyway.
Feedback loop with ECK is really short, so even if the user needs to rebuild our default config, the UX of this process will be pleasant enough. (This was at least my experience when I was exploring some config options.)

I think it will be less confusing and error-prone if we continue with the current approach (either default or user config), with one, maybe two additions:

offer configRef field in the CRD to reference a secret with the entire config (note that this will probably be required as an escape hatch anyway, due to https://github.com/elastic/cloud-on-k8s/issues/3177)
(maybe) introduce a second Secret with a config that doesn't contain output part. This would allow users to copy it, edit and provide in the configRef. This will separate "user editable" part from the entire config and simplify the process a bit.

I'd start with only 1. though.

I wonder if it would make sense to have the ucfg merging options as an escape hatch anyway maybe even across the board in all ?

For the configRef idea, would it make sense to have the operator create and pre-populate the secret with the current (default) config if the user specifies as secret that does not exist? That would give them a starting point for their edits?

We are going to ship 1.2.0 without defaults and without any merging mechanism. The latter is a results of the former: merging brings no benefit if there is only a single source of config - the user. We've decided to avoid providing defaults because:

it's not trivial to pick "the right" defaults as they can vary wildly
often, the most sane defaults in Kubernetes context would need to contain settings that require API access (e.g. autodiscover, Kubernetes module metricsets in Metricbeat), so ECK would need to start managing RBAC (roles, service accounts and bindings) for the sole purpose of Beat defaults. Otherwise, no defaults would work out of the box which defies the purpose
providing defaults gives an implicit promise to support them in all currently supported Beat versions and in any future versions which seems like a substantial amount of work
merging defaults with user config in a simple, predictable way is arguably not possible (see the discussion above) and difficult to provide in a backwards compatible way
given the merging issue above, it's difficult to estimate the value add from the defaults. An argument could be made that a default would be only used for the very first, initial deployment and full config would be provided from that point on
very similar ux can be achieved with much simpler to implement and maintain configuration library

Given the above, closing this issue.

elastic / cloud-on-k8s