elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

osquery won't install when deployed via Elastic Agent integrations on k8s #1540

Closed ty-elastic closed 3 months ago

ty-elastic commented 2 years ago

Hi, I deployed Endpoint and Agent 8.4.x into the daemonset on my self-managed k8s cluster per this yaml, and then deployed osquery via Fleet Integrations.

Upon deployment, I see this error in the agent logs:

06:25:26.900
elastic_agent.osquerybeat
[elastic_agent.osquerybeat][error] Failed to run osquery:W1005 11:24:26.929394  7382 extensions.cpp:426] Will not autoload extension with unsafe directory permissions: /usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/osquerybeat-8.4.2-linux-x86_64/osquery-extension.ext
E1005 11:24:26.953727  7382 shutdown.cpp:79] Cannot activate osq_config config plugin: Unknown registry plugin: osq_config: exit status 78
06:25:26.900
elastic_agent.osquerybeat
[elastic_agent.osquerybeat][info] osquerybeat context cancelled, exiting

This suggests that osquery extensions must not be writable by non-privileged accounts.

Yet after install, if I ssh into the daemonset I see this:

# pwd
/usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/osquerybeat-8.4.2-linux-x86_64
# ls -l
total 431832
-rw-r--r-- 1 elastic-agent elastic-agent     13675 Sep 13 21:23 LICENSE.txt
-rw-r--r-- 1 elastic-agent elastic-agent   2571228 Sep 13 21:23 NOTICE.txt
-rw-r--r-- 1 elastic-agent elastic-agent       828 Sep 13 21:58 README.md
drwxr-xr-x 2 elastic-agent elastic-agent        23 Sep 14 22:38 certs
-rw-r--r-- 1 root          root             389399 Sep 13 21:50 fields.yml
drwxr-x--- 3 root          root                 88 Oct  5 19:30 osquery
-rwxr-xr-x 1 elastic-agent elastic-agent   6173182 Sep 13 21:58 osquery-extension.ext
-rwxr-xr-x 1 elastic-agent elastic-agent 219834144 Sep 13 21:57 osquerybeat
-rw-r--r-- 1 root          root              43600 Sep 13 21:50 osquerybeat.reference.yml
-rw-r--r-- 1 root          root               6504 Sep 13 21:50 osquerybeat.yml
-rwxr-x--- 1 elastic-agent elastic-agent 213141464 Sep 13 21:50 osqueryd

The elastic-agent user owns osquery-extension.ext and has write permission on it, which is what triggers that error.

If I chown root:root osquery-extension.ext in the elastic agent container in the daemonset, osquery works as expected.

It seems osquery-extension.ext needs to be owned by root when osquery is installed into a k8s daemonset by Agent via Integrations?
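To spot files like this, one option is to list everything under the install directory whose owner differs from the user the agent runs as. This is a minimal sketch, assuming a POSIX `find` is available in the container. The function name `list_foreign_owned` is hypothetical, and the check covers ownership only, not every rule osquery may apply to extension files.

```shell
#!/bin/sh
# list_foreign_owned DIR [UID]   (hypothetical helper, not part of Elastic Agent)
# Prints every file or directory under DIR whose owner is not UID.
# UID defaults to the effective UID of the caller (0 when running as root).
list_foreign_owned() {
    dir=$1
    uid=${2:-$(id -u)}
    # A numeric argument to -user that is not a login name is treated as a UID.
    find "$dir" ! -user "$uid" -print
}
```

For example, `list_foreign_owned /usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/osquerybeat-8.4.2-linux-x86_64 0` would print the entries in that directory that are not owned by root.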

cmacknz commented 2 years ago

From @michalpristas initial look at this problem:

uid and gid should be configurable in the agent spec, but assuming nobody is using this, it gets the effective UID and GID from the agent. When the agent is unpacked, ownership of everything is set to 0,0, so everything should be owned by root. It may be that when the agent runs as the elastic-agent user, the permissions of the unpacked (Beats) files get messed up, because the owner of a file and the user the agent runs as are not aligned. Changing the owner of unpacked installations to 0,0 seems like the proper thing to do.

botelastic[bot] commented 2 years ago

This issue doesn't have a Team:<team> label.

cmacknz commented 2 years ago

We have another report of something similar happening with the files used for Beat lightweight modules when run on k8s, producing log messages like:

{"log.level":"error","@timestamp":"2022-10-14T17:03:00.903Z","log.logger":"registry.lightmodules","log.origin":{"file.name":"mb/lightmodules.go","file.line":147},"message":"Failed to list light metricsets for module uwsgi: getting metricsets for module 'uwsgi': loading light module 'uwsgi' definition: loading module configuration from '/usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/metricbeat-8.4.2-linux-x86_64/module/uwsgi/module.yml': config file (\"/usr/share/elastic-agent/data/elastic-agent-d3eb3e/install/metricbeat-8.4.2-linux-x86_64/module/uwsgi/module.yml\") must be owned by the user identifier (uid=0) or root","service.name":"metricbeat","ecs.version":"1.6.0"}

cmacknz commented 2 years ago

Bumping this from 8.7 to 8.6

jmbass commented 2 years ago

I worked around this by adding a postStart hook to my daemonset that chowns the osquerybeat directory.

gizas commented 2 years ago

@jmbass can you please provide the postStart sample you used? We might need to provide this workaround to a customer of ours.

jmbass commented 2 years ago

@gizas On the daemonset/deployment template spec:

...
containers:
- name: elastic-agent
  image: docker.elastic.co/beats/elastic-agent:8.4.3
  lifecycle:
    postStart:
      exec:
        command: ["/bin/bash", "-c", "chown -R root:root /usr/share/elastic-agent/data/*/install/osquerybeat-8.4.3-linux-x86_64/"]
etc...

Be careful to use the osquerybeat version that matches your Elastic Agent in the command.

ty-elastic commented 1 year ago

For the postStart hook, you could wildcard the path (presumably there would only be one version in that directory?). Notably, this does require a restart of the elastic-agent container after osquerybeat is installed. I guess if you wanted to be really hacky, you could try to have this run periodically regardless of whether elastic-agent is installed.
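A wildcarded variant could look like the sketch below. The function name `chown_osquerybeat` and its arguments are hypothetical, and the glob assumes the directory layout shown earlier in this thread. It still only helps once osquerybeat has been unpacked.

```shell
#!/bin/sh
# chown_osquerybeat BASE [OWNER]   (hypothetical helper, not part of Elastic Agent)
# Recursively sets OWNER (default root:root) on every
# BASE/data/*/install/osquerybeat-* directory, whatever the version.
chown_osquerybeat() {
    base=$1
    owner=${2:-root:root}
    for d in "$base"/data/*/install/osquerybeat-*/; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$d" ] || continue
        chown -R "$owner" "$d"
    done
}
```

Inside the agent container this would be called as `chown_osquerybeat /usr/share/elastic-agent`, which removes the need to hard-code the osquerybeat version in the hook.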

ThorbenJ commented 4 months ago

Just doing a bit of our own DIY sleuthing:

Using postStart gives no timing guarantees: the agent will already have started, and osquery may also have started and died, before the postStart hook is scheduled to run.

A more robust workaround would be to add a new startup/entrypoint script via a ConfigMap, to correct the file ownership before the agent has the chance to start:

ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-k8s-scripts
data:
  pre-entrypoint.sh: |
    #!/bin/sh
    chown -R root:root /usr/share/elastic-agent
    exec /usr/local/bin/docker-entrypoint "$@"

Now edit the agent's mounts to include the ConfigMap:

          ...
          volumeMounts:
           ...
            - name: extra-scripts
              mountPath: /var/local/pre-entrypoint.sh
              subPath: pre-entrypoint.sh
              readOnly: true
      volumes:
       ...
        - name: extra-scripts
          configMap:
            name: elastic-agent-k8s-scripts
            defaultMode: 0754
       ...

Then change the container start command:

     ...
     containers:
        - name: elastic-agent-k8s
          image: docker.elastic.co/beats/elastic-agent:8.12.2
          command: ['/var/local/pre-entrypoint.sh']
      ...

The long-term fix, I think, might be for the container image to be built with UID 0 as the file owner, to match how elastic-agent will run.

aleksmaus commented 4 months ago

Just to document some of the DM conversations:

It looks like there is some miscommunication between the teams. Some are assuming that the agent always runs as an unprivileged user in k8s. Others, namely Security, require the agent to run as root. The current agent files have a mix of owners: elastic-agent and root. The *.yml files are specifically set to be owned by root. https://github.com/elastic/elastic-agent/blob/main/dev-tools/packaging/templates/docker/Dockerfile.elastic-agent.tmpl#L148

It looks like we need a consistent approach and a clear story for our users. If we are to support both scenarios, this should be documented, since as of now the instructions in Kibana lead to a broken osquery integration.

cmacknz commented 4 months ago

This will be resolved by https://github.com/elastic/elastic-agent/pull/4925