[DESIGN][Agent] Minimizing Elastic-Agent privileges

andrewvc commented 3 years ago

Action plan after meeting today with @blakerouse @fntlnz and @justinkambic

There are three use cases for elastic-agent, each with different security requirements, so we can define three different behaviors.

For docker containers specifically, we need a clear path to running as non-root for two reasons:

Running as root will be flagged as insecure by many orgs.
Some software (synthetics) cannot run as root, so we need consistent guidance; today we have to advise people to run as different users for different use cases.

New Behavior by Use Case

Install command on local machine

Keep running as root
Individual beats can downgrade privileges / setuid as needed (see https://github.com/elastic/beats/pull/27878, which does this for heartbeat only, as an example)

Run in docker with `docker run`

No need to run as root because we don't run elastic endpoint security; we should recommend running as the elastic-agent user
We will need to use setcap to add privileges to the elastic-agent binary
Individual beats should downgrade privileges via setcap as needed
If you want to run endpoint then you'll need to run a separate container with

docker run --network agent elastic-agent
docker run --network agent --privileged elastic-endpoint

Run in kubernetes

Run a pod for the agent that contains an unprivileged container for elastic-agent and a privileged container for elastic-endpoint

Tasks:

[ ] Elastic-agent docs updated to recommend running as regular user
[ ] Use setcap in the elastic-agent Docker container to add all required capabilities as inheritable so subprocesses can use those privileges
[ ] Modify individual beats to setuid / setcap/ downgrade for the local machine use case
[ ] Use setcap in subprocesses in the container to drop unneeded privileges

elasticmachine commented 3 years ago

Pinging @elastic/agent (Team:Agent)

andrewvc commented 3 years ago

I believe that in a k8s environment hostPath volumes still present a problem. See https://github.com/elastic/beats/issues/19600 . @jsoriano can you add your thoughts here?

jsoriano commented 2 years ago

No need to run as root because we don't run elastic endpoint security

Is this issue focused on Uptime?

In any case, I think this is a risky assumption. A user of Elastic Agent, for any use case, may later decide to install a different integration that needs further privileges. If they do, they will probably hit confusing failures and end up having to replace their Elastic Agent installation or run several of them, which may undermine the user experience intended with Agent/Fleet.

The default experience should assume that Agent can run any integration. As a process supervisor, it is understandable that it runs fully privileged by default. There can be options to run with fewer privileges, and we should document them, but we have to think of this as a unified user experience, considering what happens if a user associates a policy with an agent that doesn't have the privileges to run it.

If you want to run endpoint then you'll need to run a separate container with
docker run --network agent elastic-agent
docker run --network agent --privileged elastic-endpoint

This would also undermine the experience intended with Agent/Fleet. What is the benefit of this new experience if you still need to run agents individually?

Individual beats should downgrade privileges via setcap as needed

Modify individual beats to setuid / setcap/ downgrade for the local machine use case

I consider this good practice for any application, but I think it'd be better not to rely on it to enforce the principle of least privilege. I would propose a security model where Elastic Agent controls the privileges of the processes it executes. The main reasons for that:

Elastic Agent may execute processes of different kinds, each of which would need its own implementation of capability management, which is error-prone. Agent already runs Beats and Endpoint, and may run other collectors in the future. Running all of them with full privileges and trusting that they will then do the right thing is a risk.
This is common practice: Docker and other container runtimes run containers with a reduced set of capabilities by default, and execution can be tuned to increase privileges; systemd and other service supervisors have features to control the capabilities of the services they run.
In the future, this would allow deciding the capabilities required per enabled integration; for example, Metricbeat with the system module enabled would be executed with more capabilities than Metricbeat monitoring only a remote Apache server.
Besides controlling privileges, Agent could also run collectors as different users, solving the problem with synthetics mentioned above.
When running with reduced privileges, Elastic Agent may inform Fleet of its capabilities so Fleet can tell the user which options are available for running more privileged integrations. Alternatively, it can refuse to run a policy it lacks the privileges for, giving the user meaningful guidance at the moment they try to associate the policy (instead of blindly running it until something fails and then having to investigate through logs and so on).
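To report its capabilities to Fleet, Agent would first need to know which ones it holds. A minimal sketch, assuming Linux: the kernel exposes the effective capability set as a hex mask on the `CapEff:` line of `/proc/self/status`, and capability number N is bit N of that mask (13 is `CAP_NET_RAW`). The helpers `capEff` and `hasCap` are hypothetical, not part of Elastic Agent.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capEff extracts the effective capability mask from the contents of
// /proc/<pid>/status (the hex-encoded "CapEff:" line).
func capEff(status string) (uint64, error) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "CapEff:") {
			v := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(v, 16, 64)
		}
	}
	return 0, fmt.Errorf("no CapEff line found")
}

// hasCap reports whether capability number n is set in mask.
func hasCap(mask uint64, n uint) bool {
	return mask&(1<<n) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	mask, err := capEff(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	const capNetRaw = 13 // CAP_NET_RAW in <linux/capability.h>
	fmt.Printf("CapEff=%#x cap_net_raw=%v\n", mask, hasCap(mask, capNetRaw))
}
```

A report built this way could be sent alongside the agent's status, although how Fleet would consume it is left open in this thread.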

This model would be based on:

Elastic Agent runs any collector by default with a reduced set of capabilities.
Any collector (or integration in the future?) may override these defaults with configuration in its spec.
As a good practice, collectors may still further downgrade their privileges if they want to, but this is not required.
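The model above can be sketched as a merge of an agent-wide default set with per-collector overrides, plus an up-front check that lets Agent reject a policy it cannot run. This is a minimal sketch under stated assumptions: `CollectorSpec`, its fields, and the capability names in `main` are hypothetical and do not reflect Elastic Agent's actual spec format.

```go
package main

import (
	"fmt"
	"sort"
)

// CollectorSpec is a hypothetical, simplified collector spec.
type CollectorSpec struct {
	Name         string
	Capabilities []string // capabilities requested on top of the defaults
}

// effectiveCaps returns the sorted union of the agent-wide defaults and the
// capabilities the collector's spec asks for.
func effectiveCaps(defaults []string, spec CollectorSpec) []string {
	set := make(map[string]bool)
	for _, c := range defaults {
		set[c] = true
	}
	for _, c := range spec.Capabilities {
		set[c] = true
	}
	out := make([]string, 0, len(set))
	for c := range set {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}

// missingCaps returns the required capabilities that the agent itself does
// not hold, so a policy can be rejected before anything is started.
func missingCaps(required, available []string) []string {
	have := make(map[string]bool)
	for _, c := range available {
		have[c] = true
	}
	var missing []string
	for _, c := range required {
		if !have[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	defaults := []string{"cap_net_bind_service"} // illustrative default set
	spec := CollectorSpec{Name: "example-collector", Capabilities: []string{"cap_dac_read_search", "cap_sys_ptrace"}}
	agentHas := []string{"cap_net_bind_service", "cap_dac_read_search"}
	if m := missingCaps(effectiveCaps(defaults, spec), agentHas); len(m) > 0 {
		// prints: rejecting policy for example-collector: missing [cap_sys_ptrace]
		fmt.Printf("rejecting policy for %s: missing %v\n", spec.Name, m)
	}
}
```

Keeping the decision in one place like this is what would let Agent give feedback at policy-assignment time instead of each collector failing on its own.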

I believe that in a k8s environment hostPath volumes still present a problem. See elastic/beats#19600 . @jsoriano can you add your thoughts here?

In some restricted k8s environments, hostPath cannot be used. This is a problem for use cases where you want to persist state between executions or across upgrades. This is especially important for Filebeat, probably less so for Heartbeat. Solutions for this are not straightforward; they depend on the volume providers available in the environment.

andrewvc commented 2 years ago

All good points, @jsoriano. However, one concern @joshbressers has raised is that users may be reluctant or unable to run the Docker container as root, especially in large environments with strict security policies. I'd argue that elastic-agent is less akin to systemd or another "process supervisor" in that context, it's simply the user app to be run.

WRT how the processes are invoked, I agree it'd be nice to have elastic-agent do it instead of the processes themselves. Another model could be just using the setcap command to set capabilities on the filesystem for the respective binaries, we could do that at build time if https://github.com/elastic/beats/issues/27651 were implemented.

jsoriano commented 2 years ago

I'd argue that elastic-agent is less akin to systemd or another "process supervisor" in that context, it's simply the user app to be run.

Yes, you are right: Agent being a process supervisor is an implementation detail, not something a user would see as a reason to grant it more privileges. Still, I think we have to account for users configuring integrations that require more privileges than the ones given to the Agents.

Another model could be just using the setcap command to set capabilities on the filesystem for the respective binaries, we could do that at build time if elastic/beats#27651 were implemented.

Yes, this could be a good idea in any case.

andrewvc commented 2 years ago

I think for now, given the valid concerns @jsoriano has raised, let's proceed with merging https://github.com/elastic/beats/pull/27878 and postpone further work. That solves the use cases our team needs, and we probably don't have the bandwidth for a larger-scale fix at this point.

marclop commented 2 years ago

I'm taking a look at having the apm-server not run as root when the elastic-agent is run as root and what our options are. We seem to have decided to not manage the user/group for binaries that are run by the elastic-agent and have the beats themselves change their user/group and set capabilities.

I would like us to revisit that decision, ideally allowing beats to specify which user:group they would like to be run as, instead of requiring each individual beat to implement the logic that heartbeat currently has to change its user:group and optionally set specific capabilities.

Ideally, the elastic-agent should allow beats to specify the user:group they should be run as, as well as any additional capabilities the beat requires in order to run successfully:

name: APM-Server
cmd: apm-server
artifact: apm-server
...
user: elastic-agent
group: elastic-agent
# APM server doesn't require any additional capabilities, but they could be specified as:
# linux_capabilities: 'cap_net_raw+ep'
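The `linux_capabilities` value in this proposal uses the same text form that `setcap` accepts, such as `cap_net_raw+ep`, so Agent would need to validate it before use. The sketch below parses only a single clause with the `+` operator; the full libcap syntax also has `=` and `-` operators, `all`, and multiple clauses. `parseCapClause` and `CapClause` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// CapClause is one "names+flags" clause, e.g. "cap_net_raw+ep".
type CapClause struct {
	Names []string // lower-cased capability names
	Flags string   // any of e (effective), i (inheritable), p (permitted)
}

// parseCapClause parses a single clause that uses the "+" operator. This is a
// small subset of libcap's text format: "=", "-", "all" and multiple
// space-separated clauses are not handled.
func parseCapClause(s string) (CapClause, error) {
	names, flags, ok := strings.Cut(strings.TrimSpace(s), "+")
	if !ok || names == "" || flags == "" {
		return CapClause{}, fmt.Errorf("expected <names>+<flags>, got %q", s)
	}
	for _, f := range flags {
		if !strings.ContainsRune("eip", f) {
			return CapClause{}, fmt.Errorf("unknown flag %q in %q", f, s)
		}
	}
	var list []string
	for _, n := range strings.Split(names, ",") {
		n = strings.ToLower(strings.TrimSpace(n))
		if !strings.HasPrefix(n, "cap_") {
			return CapClause{}, fmt.Errorf("capability %q must start with cap_", n)
		}
		list = append(list, n)
	}
	return CapClause{Names: list, Flags: flags}, nil
}

func main() {
	c, err := parseCapClause("cap_net_raw+ep")
	fmt.Println(c.Names, c.Flags, err) // [cap_net_raw] ep <nil>
}
```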

Another option would be to recommend running the elastic-agent as an unprivileged user. I see the issue has a bullet point to update the documentation to recommend this; are there any blockers to updating the docs / references to recommend using a regular user?

elasticmachine commented 2 years ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

jsoriano commented 2 years ago

We seem to have decided to not manage the user/group for binaries that are run by the elastic-agent and have the beats themselves change their user/group and set capabilities.

I am not sure if there has been an active decision on this after this issue was opened. This is only the way it currently works.

I would like us to revisit that decision, ideally allowing beats to specify which user:group they would like to be run as, instead of requiring each individual beat to implement the logic that heartbeat currently has to change its user:group and optionally set specific capabilities.

+1 to this; it would be in line with my proposal in elastic/elastic-agent#147, where Elastic Agent controls the privileges based on information given in each collector spec. I don't think an approach like this one has been discarded, only that it would need more work.
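In a model where Agent controls privileges, the supervisor rather than each Beat would apply the user, group, and capabilities from the collector spec when starting the process. A minimal Linux-only sketch using Go's `syscall.SysProcAttr`, with `Credential` for the UID/GID and `AmbientCaps` for capabilities that should survive the switch to a non-root user. The `RunAs` type, the IDs, and `/path/to/collector` are placeholders, and Agent itself must hold the privileges it hands out (for example, by running as root).

```go
//go:build linux

package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// RunAs is a hypothetical per-collector setting mirroring the proposed
// user/group/linux_capabilities spec fields, already resolved to numbers.
type RunAs struct {
	UID, GID    uint32
	AmbientCaps []uintptr // Linux capability numbers, e.g. 13 = CAP_NET_RAW
}

// collectorCommand prepares, but does not start, a collector process that
// will run with the given credentials. AmbientCaps keeps the listed
// capabilities across the switch to a non-root UID; Start is expected to
// fail unless the supervisor itself holds those privileges.
func collectorCommand(path string, args []string, r RunAs) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential:  &syscall.Credential{Uid: r.UID, Gid: r.GID},
		AmbientCaps: r.AmbientCaps,
	}
	return cmd
}

func main() {
	const capNetRaw = 13 // CAP_NET_RAW in <linux/capability.h>
	cmd := collectorCommand("/path/to/collector", []string{"run"},
		RunAs{UID: 1000, GID: 1000, AmbientCaps: []uintptr{capNetRaw}})
	fmt.Println(cmd.Args, cmd.SysProcAttr.Credential.Uid, cmd.SysProcAttr.AmbientCaps)
}
```

Resolving a name such as `elastic-agent` to numeric IDs could be done with `os/user.Lookup` and `os/user.LookupGroup` before building the command.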

jlind23 commented 2 years ago

@ruflin this seems to be a requirement to consider for the V2 design you are working on.

ruflin commented 2 years ago

@jlind23 I added a note to the design doc to dig into it.

eedugon commented 2 years ago

@jsoriano, please keep in mind that the current elastic-agent Docker image (7.16.2) adds the elastic-agent user to the root group, and the main directory (elastic-agent) is owned by root:root with no permissions for anyone else.

On platforms like Azure containers, our image doesn't work at all because of security restrictions (the elastic-agent user does NOT belong to the root group there, so it has no permission to see any of the contents of the elastic-agent directory).

The following small change solves the problem:

FROM docker.elastic.co/beats/elastic-agent:7.16.2
USER root
RUN chown -R :elastic-agent /usr/share/elastic-agent
USER elastic-agent

This just changes the group ownership of the elastic-agent directory and all its contents to the elastic-agent group. Then, in the case where the elastic-agent user does not belong to the root group, it will at least have access to the directory's contents to run the agent.

At the moment we are not running as root, but we add the non-root user to the root group, which looks weird.

jlind23 commented 2 years ago

@ph This is something we may want to consider to avoid issues on cloud container solutions such as Azure containers.

jsoriano commented 2 years ago

@eedugon these changes to add files and users to the root user group were done in the context of supporting OpenShift guidelines, you can read more about this in https://github.com/elastic/beats/pull/12905 (reverted and reapplied in https://github.com/elastic/beats/pull/18873).

If we change this to support Azure, we have to make sure we keep supporting these OpenShift guidelines.

jlind23 commented 2 years ago

@blakerouse @ruflin what is your opinion here? Any particular path we should take?

ph commented 2 years ago

If I understand the guideline, making that change will be incompatible with OpenShift.

For an image to support running as an arbitrary user, directories and files that are written to by processes in the image must be owned by the root group and be read/writable by that group. Files to be executed must also have group execute permissions.

From: https://docs.openshift.com/container-platform/4.9/openshift_images/create-images.html
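The quoted guideline can be turned into a mechanical check of an image's files. A sketch, assuming the caller already knows each path's group ID, mode, and whether processes write to it at run time; `openShiftIssues` is a hypothetical helper that encodes only the rules quoted above.

```go
package main

import (
	"fmt"
	"io/fs"
)

// openShiftIssues checks one path against the arbitrary-UID guideline:
// paths written at run time must be group 0 and group read/write, and
// anything the owner can execute must also be group-executable.
func openShiftIssues(gid uint32, m fs.FileMode, writable bool) []string {
	perm := m.Perm()
	var issues []string
	if writable && gid != 0 {
		issues = append(issues, fmt.Sprintf("group is %d, want 0", gid))
	}
	if writable && perm&0o060 != 0o060 {
		issues = append(issues, "group lacks read/write")
	}
	if perm&0o100 != 0 && perm&0o010 == 0 {
		issues = append(issues, "owner can execute but group cannot")
	}
	return issues
}

func main() {
	fmt.Println(openShiftIssues(0, 0o770, true))    // []
	fmt.Println(openShiftIssues(1000, 0o750, true)) // two issues
}
```

This also shows the tension discussed here: a group-0 layout satisfies the guideline but only helps users who are actually members of group 0.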

eedugon commented 2 years ago

Thanks Jaime!

Root group membership isn’t that important: with just a directory ownership change, the image would work in Azure (although, depending on the OpenShift requirements, I don’t know whether that would break OpenShift compatibility).

What looks weird to me is trying to run our software with “non root” users but adding the user to the root group.

Anyway, I’m totally OK with whatever you decide here, but it would be great to list in the docs the container environments that we verify or support.


eedugon commented 2 years ago

You are right Pier-Hugues, sorry I hadn’t read your post.

So clearly my workaround goes against OpenShift, and the reason for the user being in the “root” group is probably beyond my understanding.

Sorry for the noise here!

The openshift guideline also explains:

Because the container user is always a member of the root group, the container user can read and write these files.

I don’t know whether that’s generic to Docker on Linux or just an OpenShift convention; it just looked weird from a sysadmin and security point of view.


jsoriano commented 2 years ago

Because the container user is always a member of the root group, the container user can read and write these files.

I don’t know whether that’s generic to Docker on Linux or just an OpenShift convention; it just looked weird from a sysadmin and security point of view.

Yes, this seems to be the case for containers started with Docker with arbitrary uids:

$ docker run -it --rm -u 1000 ubuntu:20.04 id
uid=1000 gid=0(root) groups=0(root)
$ docker run -it --rm -u 1000 alpine id
uid=1000 gid=0(root)

And yes, this effectively allows access to (mounted) host files that grant permissions to the root (0) group.

What I think OpenShift additionally does is use user namespacing, so that ID 0 in the container maps to a random unprivileged user and group on the host. (Update, more info about this: https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids, https://cookbook.openshift.org/users-and-role-based-access-control/why-do-my-applications-run-as-a-random-user-id.html)
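Where user namespaces are in use, the kernel records how IDs inside the namespace map to IDs outside it in `/proc/<pid>/uid_map`, one `inside outside length` range per line (see user_namespaces(7)). A minimal sketch of that translation; `hostUID` is a hypothetical helper, and the `0 100000 65536` mapping used in the test is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// hostUID translates a UID inside a user namespace to the UID outside it,
// using the three-column format of /proc/<pid>/uid_map:
// "<first ID inside> <first ID outside> <range length>".
func hostUID(uidMap string, id uint64) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(uidMap))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) != 3 {
			continue
		}
		inside, err1 := strconv.ParseUint(f[0], 10, 32)
		outside, err2 := strconv.ParseUint(f[1], 10, 32)
		length, err3 := strconv.ParseUint(f[2], 10, 32)
		if err1 != nil || err2 != nil || err3 != nil {
			continue
		}
		if id >= inside && id < inside+length {
			return outside + (id - inside), true
		}
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Println("cannot read uid_map:", err)
		return
	}
	if h, ok := hostUID(string(data), uint64(os.Getuid())); ok {
		fmt.Printf("uid %d in this namespace is uid %d outside it\n", os.Getuid(), h)
	}
}
```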

ph commented 2 years ago

@jsoriano Is it correct to believe that we might need a different Docker image for the Azure case?

jsoriano commented 2 years ago

@jsoriano Is it correct to believe that we might need a different Docker image for the Azure case?

Yes, it is possible that we need a specific image for Azure if its runtime is different enough. We would need to investigate a bit more.

jlind23 commented 2 years ago

@ph the first thing to do will be to have a single config that runs on both OpenShift and Azure containers; if that doesn't work, we should consider shipping a specific Azure image, which I would definitely try to avoid. This is something we should investigate in one of our upcoming releases.

nicpenning commented 1 year ago

Is this FR / issue still alive?

As a user, I would like to be able to set which user context each integration executes as.

For example, today we can run Filebeat as a Windows service under a specific user so it can access files and folders that the SYSTEM account cannot. This is a minor blocker for us in migrating a few integrations.

A workaround is deploying an agent locally on those systems, but we would prefer to use mapped network drives (although discouraged, this works very well) to reduce overhead on the servers themselves and have fewer agents to manage.

Also, it's best to have reduced permissions anyway, especially when you are simply reading log files and forwarding them to another resource.

Please do let me know if this concept is worth considering here or a new issue/FR makes sense.

Thanks!

jlind23 commented 3 months ago

Elastic Agent can now run as non-root on Linux, macOS, and Windows, so I'm closing this as done. cc @ycombinator @nimarezainia

elastic / elastic-agent