[Enhancement]: Enforce Agent Tamper Protection when enrolling agents using the Force flag

Doserdog commented 8 months ago

Describe the enhancement: When Agent Tamper Protection is enabled on an agent policy, ensure a new agent cannot be enrolled on a system by using Local Admin and the Force flag without the proper security token.

Describe a specific use case for the enhancement or feature: Currently, if an agent is installed on a host and uses a policy with Agent Tamper Protection enabled, an uninstall-token must be used to run the "uninstall" command successfully. However, if you have local admin, you are able to enroll a new agent over the top of the installed one using the Force flag (-f) without any token. This enhancement would require a security token when enrolling an elastic-agent on a host using the Force flag when an agent is already installed AND uses Agent Tamper Protection.

What is the definition of done? A host with elastic-agent installed and using an agent-policy with Agent Tamper Protection enabled is unable to have another agent installed over the current agent directory using the Force flag on install.

pierrehilbert commented 8 months ago

@aleksmaus any idea of the feasibility here?

aleksmaus commented 8 months ago

> @aleksmaus any idea of the feasibility here?

Currently there is nothing in the agent that validates the uninstall token or the signed policy settings. All the validation and protection enforcement is done by Endpoint. All the agent-side policy signature checks and validations were disabled/commented out in the agent. The agent can't "trust" its own protected policy configuration until it starts validating its signature. It relies completely on Endpoint for tamper protection at the moment.

As it currently stands, it looks like install/enroll with the force flag would have to be modified to somehow interrogate Endpoint in order to figure out whether it's currently in the tamper protected state. To do that, the agent would have to check its current configuration/policy and, if Endpoint is enabled, use an Endpoint spec command to check the token validity. The only Endpoint command that checks the token validity at the moment is the uninstall command. Maybe the Endpoint check command can be "extended" for this specific purpose, or the uninstall command can be extended for this particular case to perform something like a "dry" run. In both cases this would need to be coordinated with the Endpoint developers.

@intxgo was developing the tamper protection feature support on the Endpoint side and might be able to chime in on what's involved in this change from Endpoint's point of view.

intxgo commented 8 months ago

Looks related to https://github.com/elastic/security-team/issues/8155. As long as Agent can be easily stopped or removed, the protection is weak, because the admin won't even be able to use the response actions console to fix the issue with Agent. The obvious point is that a security product should control a non-security product, i.e. Endpoint should control Agent, not the other way around... but that's another discussion.

I agree that we still have issues to solve on the Agent side for Tamper Protection. Perhaps Agent shouldn't decide on its own to respond to a stop command from the service control manager unless Endpoint agrees. However, I guess that Agent's service registry config and config.yaml can be altered by a local admin too, so effectively everything (Agent side) can be bypassed at the next reboot.

pierrehilbert commented 8 months ago

Thx @aleksmaus & @intxgo.

What would be the best option if we want to limit the current gap? Add a check at the Agent level to validate with Endpoint first in case it's running with Tamper Protection enabled?

intxgo commented 8 months ago

That's a good question. Endpoint doesn't have an enroll command or equivalent, but if it's installed with Tamper Protection it'll refuse uninstallation without the uninstall-token.

Agent should try to run endpoint-security uninstall; if it succeeds, all is fine, otherwise it should abort. A Tamper Protected Endpoint should return one of the following (on POSIX both are possible depending on execution, due to the 255 wrap in bash):

  Error_InvalidUninstallTokenPosix = -28,
  Error_InvalidUninstallToken = -284,
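
The relationship between the two values can be checked with a little arithmetic. The sketch below assumes that the "wrap" refers to a POSIX exit status keeping only the low 8 bits of the returned value; the helper names are hypothetical and not taken from the Agent or Endpoint code.

```python
# Assumption: a POSIX exit status keeps only the low 8 bits of the value
# returned by the process. Helper names are hypothetical.

ERROR_INVALID_UNINSTALL_TOKEN = -284        # value quoted in this thread
ERROR_INVALID_UNINSTALL_TOKEN_POSIX = -28   # value quoted in this thread


def as_exit_status(code: int) -> int:
    """Truncate a return value to the 0..255 range of a POSIX exit status."""
    return code & 0xFF


def as_signed_byte(status: int) -> int:
    """Reinterpret a 0..255 exit status as a signed 8-bit value (-128..127)."""
    return status - 256 if status > 127 else status


status = as_exit_status(ERROR_INVALID_UNINSTALL_TOKEN)
print(status)                  # 228
print(as_signed_byte(status))  # -28
```

Under that assumption both constants map to the same 8-bit status (228), so a caller that compares exit codes may want to normalize them before matching.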

Actually, Agent should never try to proceed if it cannot uninstall Endpoint, as the currently running Endpoint may have a different policy signature verification key, so later it won't accept any action or policy from the new Agent. Even if the uninstallation fails for a different reason, we should solve the issue first to clean the machine before proceeding with the new Agent install.

aleksmaus commented 8 months ago

> Actually, Agent should never try to proceed if it cannot uninstall Endpoint as the currently running Endpoint may have different policy signature verification key, so later it won't accept any action nor policy from the new Agent.

Hmm, this is an interesting point. It would require a change in how the agent is installed/enrolled. Currently the agent doesn't uninstall Endpoint when the agent is installed or enrolled. The agent could be enrolled into the same policy again, but it could also be a different policy. There would be a small potential for data loss during the time Endpoint is being reinstalled.

So if this slight change in behavior is acceptable, then upon enrollment:

1. Agent attempts to uninstall Endpoint.
2. If the uninstall fails with the known return codes, show the error.
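
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the agent's actual enroll code: `precheck_forced_enroll`, `run_command` and the injected `runner` are hypothetical names, and the uninstall command line (including how the token is passed) is left to the caller because the thread does not specify it.

```python
import subprocess
from typing import Callable, Sequence

# Return codes quoted in this thread for a rejected uninstall token.
INVALID_TOKEN_CODES = {-28, -284}


def to_signed_byte(code: int) -> int:
    """Normalize a return code to a signed 8-bit value (assumed POSIX wrap)."""
    status = code & 0xFF
    return status - 256 if status > 127 else status


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def precheck_forced_enroll(
    uninstall_argv: Sequence[str],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> None:
    """Attempt the Endpoint uninstall first; raise instead of enrolling on failure."""
    code = runner(uninstall_argv)
    if code == 0:
        return  # Endpoint removed, enrollment may proceed
    if code in INVALID_TOKEN_CODES or to_signed_byte(code) in INVALID_TOKEN_CODES:
        raise PermissionError("Endpoint rejected the uninstall token; aborting enroll")
    # Any other failure also aborts, so the machine can be cleaned up first.
    raise RuntimeError(f"Endpoint uninstall failed with code {code}; aborting enroll")
```

Passing the runner as a parameter keeps the sketch testable without a real Endpoint installation, for example `precheck_forced_enroll(argv, runner=lambda _: 228)` raises `PermissionError`.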

I would have to dig through the agent code a bit in order to accurately estimate the level of effort, or just add this to my TODO list. How soon does this need to be done?

cmacknz commented 8 months ago

> The agent could be enrolled into the same policy again, but it could be a different policy. There will be a little potential data loss for the time when the endpoint is reinstalled.

I think if the endpoint input ID in the policy changes, we'll end up uninstalling and re-installing endpoint even if endpoint is in both the old and new policy. The input ID is usually the package policy ID for endpoint:

      id: 8b11acb6-0aa9-4df6-812c-b25f6c675e85
      meta:
        package:
            name: endpoint
            version: 8.12.0
      package_policy_id: 8b11acb6-0aa9-4df6-812c-b25f6c675e85

The problem now is that the uninstall of endpoint is only attempted once the first policy change action from the new policy is received. This is completely disconnected from the current enroll logic, so we would need to implicitly treat an enrollment as equivalent to an uninstall attempt when endpoint is in the policy.
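
To make the input ID comparison concrete, here is a minimal sketch. It assumes each policy is available as a dictionary with an `inputs` list whose entries look like the fragment above; that shape and the function names are assumptions for illustration, not the agent's real data model.

```python
from typing import Any, Mapping, Optional


def endpoint_input_id(policy: Mapping[str, Any]) -> Optional[str]:
    """Return the id of the first input whose package name is 'endpoint', if any."""
    for policy_input in policy.get("inputs", []):
        package = policy_input.get("meta", {}).get("package", {})
        if package.get("name") == "endpoint":
            return policy_input.get("id")
    return None


def endpoint_would_be_reinstalled(
    old_policy: Mapping[str, Any], new_policy: Mapping[str, Any]
) -> bool:
    """True when both policies contain endpoint but under different input IDs."""
    old_id = endpoint_input_id(old_policy)
    new_id = endpoint_input_id(new_policy)
    return old_id is not None and new_id is not None and old_id != new_id


old = {"inputs": [{"id": "8b11acb6-0aa9-4df6-812c-b25f6c675e85",
                   "meta": {"package": {"name": "endpoint", "version": "8.12.0"}}}]}
new = {"inputs": [{"id": "hypothetical-other-id",
                   "meta": {"package": {"name": "endpoint", "version": "8.12.0"}}}]}
print(endpoint_would_be_reinstalled(old, new))  # True
print(endpoint_would_be_reinstalled(old, old))  # False
```

A check like this only tells the enroll path that an uninstall would eventually follow; it does not validate the uninstall token itself.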

intxgo commented 8 months ago

The problem stated above seems to address only the policy signing change, not Tamper Protection. To recap, when Endpoint is Tamper Protected it'll refuse to be uninstalled without the uninstall-token.

aleksmaus commented 8 months ago

> Endpoint is Tamper Protected it'll refuse to be uninstalled without the uninstall-token

Or without signed material as part of the action or the policy, as far as I remember. So it was either the uninstall token or the signed material.

elastic / elastic-agent

[Enhancement]: Enforce Agent Tamper Protection when enrolling agents using the Force flag #4349