Cannot use Kubernetes labels in variable substitutions unless the labels are consistent across the set of monitored pods

cmacknz commented 1 month ago

Relates https://github.com/elastic/elastic-agent/issues/2261

If the agent is monitoring a set of Kubernetes pods, ${kubernetes.labels.*} variable substitutions can only be used in the agent policy when the complete set of labels used as variables exists together on at least one pod in the cluster. Any other combination currently leads to the input being silently dropped from the configuration because of https://github.com/elastic/elastic-agent/issues/2261.

This is limiting when the set of used Kubernetes labels is not actually consistent in a given cluster. A user would have to know to duplicate the inputs for each unique set of labels.

For example, imagine a user wants to use labels in processor definitions, and the labels kubernetes.labels.service, kubernetes.labels.app, and kubernetes.labels.k8s-app exist in the cluster.

In this cluster, the labels kubernetes.labels.service and kubernetes.labels.app exist together for some pods so the following configuration works:

          - if:
              has_fields: ['kubernetes.labels.service']
            then:
              - add_fields:
                  target: service
                  fields:
                    type: name
                    name: "${kubernetes.labels.service}"
            else:
              - add_fields:
                  target: service
                  fields:
                    type: name
                    name: "${kubernetes.labels.app}"

The labels kubernetes.labels.service and kubernetes.labels.k8s-app both appear in the next configuration, but kubernetes.labels.app and kubernetes.labels.k8s-app are mutually exclusive, so the configuration causes the input to be silently dropped from the policy because of https://github.com/elastic/elastic-agent/issues/2261. Fixing https://github.com/elastic/elastic-agent/issues/2261 will likely put the input in an error state instead, which is more obvious but still not easy to work with.

          - if:
              has_fields: ['kubernetes.labels.service']
            then:
              - add_fields:
                  target: service
                  fields:
                    type: name
                    name: "${kubernetes.labels.service}"
            else:
              - add_fields:
                  target: service
                  fields:
                    type: name
                    name: "${kubernetes.labels.k8s-app}"

We should come up with a better way to handle this situation than requiring the user to manually determine which combinations of labels exist together. The agent knows the full set of variables available for each pod, so it can figure this out on behalf of the user.

A core problem is that, in this example, the agent does not look inside the input configurations; it can only add or remove whole inputs. It could not, for example, remove just the processor definitions that reference variables that do not exist, given that this is a Beat input configuration.

A representative set of diagnostics is attached showing inputs being dropped from the policy in computed-config.yml along with the complete set of labels and pods available in variables.yml.

elastic-agent-diagnostics-2024-05-28T09-09-12Z-00.zip

elasticmachine commented 1 month ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

blakerouse commented 2 weeks ago

Variable substitution is done this way on purpose. This is easily solved by providing a fallback value.

For example, ${kubernetes.labels.k8s-app|''}: note the added |'', which provides a fallback value. When no fallback is provided and the variable has no value, the input is removed. When a fallback is provided and the variable has no value, the fallback is used and the input is not removed.

https://www.elastic.co/guide/en/fleet/current/dynamic-input-configuration.html#_alternative_variables_and_constants
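
As a rough sketch, this is how the fallback syntax might be applied to the second processor example from the issue description. It assumes the same processor fragment as above and that the `|''` fallback is accepted inside a quoted string; it has not been verified against the attached diagnostics.

```yaml
          - if:
              has_fields: ['kubernetes.labels.service']
            then:
              - add_fields:
                  target: service
                  fields:
                    type: name
                    # Empty-string fallback so a pod without this label
                    # does not cause the input to be removed.
                    name: "${kubernetes.labels.service|''}"
            else:
              - add_fields:
                  target: service
                  fields:
                    type: name
                    # Same fallback for the label that does not coexist
                    # with the others.
                    name: "${kubernetes.labels.k8s-app|''}"
```

With an empty-string fallback, a pod that lacks the label would presumably end up with an empty service.name rather than the field being omitted, so a more meaningful constant may be preferable.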

elastic/elastic-agent #4823