elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Append-only privilege for untrusted endpoints #68414

Open danhermann opened 3 years ago

danhermann commented 3 years ago

In order to grant minimal permissions to untrusted endpoints, we need a privilege that permits append-only indexing, auto-creation of target indices or data streams only if there is an existing template, and prohibits mapping changes.
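
For illustration, if such a privilege existed (the name `append_only` below is hypothetical; no such privilege exists in Elasticsearch today), a role granting it might look like:

```console
POST /_security/role/untrusted_ingest
{
  "indices": [
    {
      "names": [ "logs-*-*", "metrics-*-*" ],
      "privileges": [ "append_only" ]
    }
  ]
}
```

Semantically, `append_only` would behave like `create_doc` plus template-gated auto-creation of the target index or data stream, while rejecting any mapping change.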

elasticmachine commented 3 years ago

Pinging @elastic/es-core-features (Team:Core/Features)

elasticmachine commented 3 years ago

Pinging @elastic/es-security (Team:Security)

ph commented 3 years ago

@danhermann ping us when it's ready for testing.

bytebilly commented 3 years ago

@danhermann have you already considered using create_doc? In 7.x it allows mapping updates, but it won't anymore in 8.x. For index creation, is create_index too permissive?

scunningham commented 3 years ago

@danhermann, I remain concerned that auto-creation of data streams or target indices is an issue for untrusted endpoints. Since we have built-in templates for data streams such as `metrics-*-*`, `logs-*-*`, and `synthetics-*-*`, an attacker with append-only privileges could create thousands of data streams without restriction.

Just to see what would happen, I created an API key with only the "create_doc" and "indices:admin/auto_create" privileges. I was surprised to see that the data streams were created, but the indexing failed:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "action [indices:admin/mapping/auto_put] is unauthorized for API key id [1UWzaXcBN6DMbwHTWSLc] of user [elastic] on indices [.ds-logs-attack-003-2021.02.03-000001], this action is granted by the privileges [auto_configure,manage,write,all]"
      }
    ],
    "type" : "security_exception",
    "reason" : "action [indices:admin/mapping/auto_put] is unauthorized for API key id [1UWzaXcBN6DMbwHTWSLc] of user [elastic] on indices [.ds-logs-attack-003-2021.02.03-000001], this action is granted by the privileges [auto_configure,manage,write,all]"
  },
  "status" : 403
}

Elastic log:

info [o.e.c.m.MetadataMappingService] [MacBook-Pro.local] [.ds-logs-mydosattack-002-2021.02.03-000001/AcH7vGL1RVC0TdX-3mrDtw] update_mapping [_doc]
info [o.e.c.m.MetadataMappingService] [MacBook-Pro.local] [.ds-logs-attack-002-2021.02.03-000001/RiiKWUPdQBeSgPjgZFGXNg] update_mapping [_doc]
info [o.e.c.m.MetadataCreateIndexService] [MacBook-Pro.local] [.ds-logs-attack-003-2021.02.03-000001] creating index, cause [initialize_data_stream], templates [logs], shards [1]/[1]
info [o.e.c.m.MetadataCreateDataStreamService] [MacBook-Pro.local] adding data stream [logs-attack-003] with write index [.ds-logs-attack-003-2021.02.03-000001] and backing indices []
info [o.e.x.i.IndexLifecycleTransition] [MacBook-Pro.local] moving index [.ds-logs-attack-003-2021.02.03-000001] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [logs]
info [o.e.x.i.IndexLifecycleTransition] [MacBook-Pro.local] moving index [.ds-logs-attack-003-2021.02.03-000001] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] in policy [logs]
info [o.e.x.i.IndexLifecycleTransition] [MacBook-Pro.local] moving index [.ds-logs-attack-003-2021.02.03-000001] from [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-follow-shard-tasks"}] in policy [logs]
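
For reference, an API key like the one used in the experiment above could be created along these lines (the key name and index pattern are illustrative):

```console
POST /_security/api_key
{
  "name": "append-only-test",
  "role_descriptors": {
    "ingest": {
      "index": [
        {
          "names": [ "logs-*-*" ],
          "privileges": [ "create_doc", "indices:admin/auto_create" ]
        }
      ]
    }
  }
}
```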

@ruflin and I have been discussing a model where Kibana uses the data stream API to pre-create the target data streams before dispatching the policies to the agents. In that scenario, an append-only privilege would be very restrictive: no data stream/index creation and no new mappings, i.e., only add a document if the target exists.
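
Pre-creating a data stream is a single call to the data stream API, provided a matching index template with a `data_stream` definition already exists (the data stream name here is illustrative):

```console
PUT /_data_stream/logs-nginx.access-default
```

With the target pre-created, the agent's API key would only need `create_doc` on that exact name.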

Thoughts?

danhermann commented 3 years ago

@scunningham, it sounds like create_doc is all you need if your data streams are created in advance by something else and they do not have dynamic mappings?

ruflin commented 3 years ago

@danhermann The templates contain dynamic mappings. We could probably remove them from some of the templates, but not all. I wonder if runtime fields could come to our rescue here: we disable dynamic mapping, but the fields would still be queryable thanks to runtime fields?

martijnvg commented 3 years ago

Perhaps we just need to introduce a new privilege that allows dynamic mapping updates, or modify the existing create_doc privilege to also grant them?

scunningham commented 3 years ago

IMHO, dynamic mappings are too dangerous a privilege to grant to an untrusted endpoint:

  1. An attacker could flood the index with bogus mappings, hitting the mapping limits and preventing new valid mappings from being created.
  2. An attacker could, with the right timing, deliberately mis-map a field so that subsequent valid documents fail with mapping exceptions.

To support only "create_doc", the dynamic mapping would need to be removed from the data stream templates, and all data streams would have to be created before the agents start streaming data.

The built-in index templates for `logs-*-*`, `metrics-*-*`, `synthetics-*-*`, etc., all contain a dynamic template:

"dynamic_templates" : [
  {
    "strings_as_keyword" : {
      "mapping" : {
        "ignore_above" : 1024,
        "type" : "keyword"
      },
      "match_mapping_type" : "string"
    }
  }
],

Removing dynamic templates to lock down untrusted endpoints would be a significant departure from current behaviors.
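
For completeness, dynamic mapping can be switched off for a given data stream type by composing in a component template that sets `dynamic: false` (the template name below is illustrative, and the component template would need to be listed in the matching index template's `composed_of`):

```console
PUT /_component_template/logs-nginx-locked
{
  "template": {
    "mappings": {
      "dynamic": false
    }
  }
}
```

With `dynamic: false`, new fields are still stored in `_source` but are silently ignored at mapping time, so dynamic templates like the one above would never fire.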

tvernum commented 3 years ago

I don't think this is a problem that should be solved primarily through security.

If there are specific endpoints (agent policies?) that should never use dynamic mapping or dynamic index creation, then it's reasonable not to grant them those privileges (auto_configure). In fact, this is why we made it an explicit privilege for data streams: so that you can have an ingestion key that has create_doc only, and nothing else.

But if you leave it at that, then it's just setting things up to fail. Within ES security, we generally don't decide whether to do something based on whether it is allowed by security. We attempt to do it because the system is configured to do that thing, and then we fail if security prevents it.
So, if an index/data stream is configured with dynamic templates and a newly ingested field matches one of those templates, we will attempt to perform a mapping update, and will fail if the user is not allowed to perform auto-mapping updates.

If there is a system configuration that says to do something, and alongside that is a security configuration that says to prevent something, then there is a conflict and it will typically result in an error.

I think what we need to be talking about is how to configure the system so that these data streams never attempt dynamic mapping changes; removing that privilege from the ingestion key is then straightforward.

jpountz commented 3 years ago

I'd be curious to get more context around this feature request.

Dynamic mapping updates are needed for some data sources that can't provide a schema up-front, and are also important for the onboarding of new data sources, so I wonder how we plan to make this work if we start disallowing dynamic mapping updates on untrusted endpoints. Will we need to have a concept of trusted endpoints too, and if so, on what criterion would an endpoint be trusted or not?

> @ruflin and I have been discussing a model where Kibana uses the data stream api to pre-create the target data streams before dispatching the policies to the agents

I like that it removes the need for untrusted endpoints to create data streams but I wonder how it works in the case of a standalone agent setup. Or do we not foresee a need for doing standalone deployments of untrusted endpoints?

For my understanding, would it be an option to only give privileges to a finite set of data streams to untrusted endpoints to avoid letting them create thousands of data streams the way that @scunningham described?

> I wonder if runtime fields could come to our rescue here. We disable dynamic mapping but because of runtime fields it will still be queryable?

Configuring mappings with dynamic:false and defining runtime fields as part of search requests would "work", but this has limitations too: such fields would not be suggested via Kibana, so it's unclear how users would learn about them in the first place, and they could be slow to search or aggregate.
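
To make that concrete, with `dynamic: false` on the data stream, an unmapped field can still be queried by declaring it as a search-time runtime field; when no script is given, the value is read from `_source` (field and index names are illustrative):

```console
GET /logs-nginx.access-default/_search
{
  "runtime_mappings": {
    "kubernetes.labels.app": { "type": "keyword" }
  },
  "query": {
    "term": { "kubernetes.labels.app": "nginx" }
  }
}
```

As noted, this shifts the cost to query time, and the field is invisible to anything that only inspects the mapping.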

scunningham commented 3 years ago

Let me attempt to frame the problem a bit better.

In the field, we expect Fleet Agents to execute in various environments along a broad risk spectrum.

The Fleet system, as implemented today, prioritizes supporting trusted environments. For 7.11, the Fleet implementation (in Kibana at the moment) generates a default Elasticsearch API key which it provides to each of its integrations. This default key has broad privileges:

{
    "fleet-output": {
        "cluster": ["monitor"],
        "index": [{
            "names": [
                "logs-*",
                "metrics-*",
                "traces-*",
                ".ds-logs-*",
                ".ds-metrics-*",
                ".ds-traces-*",
                ".logs-endpoint.diagnostic.collection-*",
                ".ds-.logs-endpoint.diagnostic.collection-*"
            ],
            "privileges": [
                "write",
                "create_index",
                "indices:admin/auto_create"
            ]
        }]
    }
}

What types of attacks are possible per privilege:

[Note that normal denial of service attacks are not discussed here. DOS attacks leveraging legitimate operations remain an issue, but are outside the scope of this document. This discussion is limited to attacks that could corrupt data or destabilize the system.]

For 7.12, the privileges have been locked down a bit, however, data stream creation attacks and dynamic mapping attacks are still possible:

{
    "fleet-output": {
        "cluster": ["monitor"],
        "index": [{
            "names": [
                "logs-*",
                "metrics-*",
                "traces-*",
                ".logs-endpoint.diagnostic.collection-*"
            ],
            "privileges": [
                "auto_configure",
                "create_doc"
            ]
        }]
    }
}

At the extreme untrusted edge of the risk spectrum, ideally an agent would only have append privileges. This should be the default behavior of the agent in a high-risk environment; i.e., the system fails closed unless additional privileges have been explicitly granted. This is different from the current behavior, which fails open for our default installation.

However, we do have legitimate cases where an integration may require dynamic mapping and potentially dynamic data stream creation as well. Perhaps in those cases, we can generate a more specific api_token that grants the required permission for a set of fully qualified indices. That would limit the attack surface to the indices that require this functionality. In an environment on the trusted end of the spectrum, this may be an acceptable risk.
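
Such a scoped API key might look like the following, granting `auto_configure` only on the fully qualified data streams that need it (the key name, role name, and data stream name are illustrative):

```console
POST /_security/api_key
{
  "name": "k8s-ingest",
  "role_descriptors": {
    "k8s-output": {
      "index": [
        {
          "names": [ "logs-kubernetes.container_logs-default" ],
          "privileges": [ "auto_configure", "create_doc" ]
        }
      ]
    }
  }
}
```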

Fundamentally, the problem we are trying to address is that there is currently no one security posture that will accommodate the needs of all the applications, as well as provide reasonable defense against known attacks.

ph commented 3 years ago

@mostlyjason @urso @andresrc ^ please have a look.

mostlyjason commented 3 years ago

> we can generate a more specific api_token that grants the required permission for a set of fully qualified indices

I imagine a common workflow is that a security operations team tests a monitoring solution in an internal environment before deploying to an untrusted environment. In this case, the internal environment can fully qualify the indices before the untrusted environment sends data. The downside is that it's extra steps for the operator to bootstrap those dynamic indices, but this could be seen as a more advanced use case. I'm not sure whether rollover indices initialize with the same dynamic mappings as the prior one; if not, that might be a good addition so it continues working on rollover.

tvernum commented 3 years ago

> Perhaps in those cases, we can generate a more specific api_token that grants the required permission for a set of fully qualified indices.

From a least-privilege point of view, that seems wise. Even if we were to solve the mapping and index creation problems described above, there would still be residual risks if we gave untrusted endpoints the ability to append to an unnecessarily wide range of indices.

jpountz commented 3 years ago

Thanks @scunningham, this makes sense to me, and the framing of trusted vs. untrusted endpoints in particular was helpful. One aspect I'll be interested in is how we know whether an endpoint is trusted or not, e.g. does it require manual action from the user, or is it something that can be inferred from the datasets that are enabled on that endpoint?

scunningham commented 3 years ago

@jpountz We've not come up with a way to know from the agent's standpoint whether it is trusted. Fleet would have to be told somehow; only the customer can really make that assertion. The reality is that the customer is in a difficult position to make that assessment. Security operators are often dealing with huge populations of endpoints in a very dynamic environment, with new endpoints arriving and old ones dropping off constantly. Defaulting to a trusted mode and asking the customer to manually take action when agents should be untrusted is risky. For that reason, in my opinion, Fleet agents should be untrusted by default.

We should be able to infer from the integration definitions whether or not an integration is considered "untrusted" and adjust the privileges accordingly. If the customer adds an integration to a policy that requires higher privileges, we should notify the user and ask them explicitly to opt in. The user will be responsible for maintaining the list of agents associated with this policy.

There's a lot of subtlety here around permissions and how "risky" they actually are. It is probably a mistake to blanket-mark a policy as "untrusted" if, for example, we only add read permissions for a specific innocuous index. We shouldn't underestimate the UX complexity here.

@ruflin has put forth a proposal which I think is a good compromise of least privilege per data stream. I am hopeful that this approach, coupled with pre-creation of all but the dynamically created data streams, will mitigate many of the concerns described above. However, we've yet to come up with a solution to the dynamic-mapping denial-of-service attack short of disabling dynamic mapping entirely in the untrusted case.

ruflin commented 3 years ago

Whether an agent is trusted should not depend on the dataset. The same dataset (data stream) can be used in different contexts. Let's take a simplified nginx example. In one case, we monitor nginx on an untrusted machine. Because of this, we ship down append-only permissions: no dynamic fields and no creation of data streams. This nginx monitoring cannot add any dynamic fields which were not predefined. On the other hand, we monitor nginx services in k8s. There, the namespace might be dynamic, and the labels added to each event are dynamic as well. This requires the Elastic Agent to run in a trusted environment, as we ship down more permissions. The resulting data stream for both events could be the same; what differs is the policy and the permissions on it.