elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Configure security realms via API #36591

Open pugnascotia opened 5 years ago

pugnascotia commented 5 years ago

I'm interested in whether Elasticsearch could allow security realm configuration via an API. The use-case is avoiding a restart for each configuration change.

cc @AlexP-Elastic in case there's more detail he can provide.

elasticmachine commented 5 years ago

Pinging @elastic/es-security

jkakavas commented 5 years ago

We discussed this today in our team meeting. We would like to understand a little more about the requirements and use cases behind this request before we decide what plan we can come up with to address it.

@pugnascotia could you add a little more detail? Are we talking about adding new realms or making configuration changes to existing enabled ones? If there are specific configuration options that are more important or will be used more frequently than others, we'd like to know that too, as these could be handled by specific APIs instead.

pugnascotia commented 5 years ago

Ping @AlexP-Elastic who can probably explain it better.

pugnascotia commented 5 years ago

Pinging @daniel-bytes too

daniel-bytes commented 5 years ago

I believe for our use case it would mean adding or modifying realms via an API instead of cluster settings, which would mean basically any available realm setting.

The flow we are looking at now in Cloud is that we have some "authentication profile" APIs (basically security realms + role mappings) which store data in our Zookeeper data store. If any realm data is changed, we need to kick off a new Plan, which will use the Zookeeper data to inject settings into the cluster configuration for each instance. If role mappings change, we can apply them immediately since they have an API available.

I think the hope here is that, if we could instead apply these settings via an API without requiring a cluster reboot, our APIs would be significantly simpler and faster for the end user: they would not have to wait for the plan change to take effect, or deal with any of the issues that can arise from running a plan change in the first place. @AlexP-Elastic might have additional thoughts on this as well.
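As a rough illustration of the split described above, the sketch below routes a change either to an immediate API call or to a plan change. `ProfileChange` and `apply_step` are hypothetical names, not Cloud code; the only real endpoint assumed is Elasticsearch's role mapping API (`PUT /_security/role_mapping/<name>`), and the function only builds a description of the request rather than sending it.

```python
from dataclasses import dataclass


@dataclass
class ProfileChange:
    """Hypothetical record of one change to an authentication profile."""
    kind: str   # "realm" or "role_mapping"
    name: str
    body: dict


def apply_step(change: ProfileChange) -> dict:
    """Describe how a change would be applied under the current flow."""
    if change.kind == "role_mapping":
        # Role mappings have a REST API, so they can be applied immediately.
        return {
            "action": "api_call",
            "method": "PUT",
            "path": f"/_security/role_mapping/{change.name}",
            "body": change.body,
        }
    if change.kind == "realm":
        # Realm settings live in node configuration, so today this means
        # a plan change that restarts every instance.
        return {"action": "plan_change", "realm": change.name}
    raise ValueError(f"unknown change kind: {change.kind!r}")
```

With a realm API, the second branch could presumably collapse into another `api_call`, which is the simplification being asked for here.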

jordansissel commented 4 years ago

Configuration via API might solve a problem for larger clusters especially on ESS/ECE.

Background: When configuring SAML, we need to configure both Elasticsearch and Kibana. On ESS, no problem, we just put the correct things in "user setting overrides" and save it.

Problem: On ESS, nodes appear to not be reused. Configuration changes cause some kind of "plan" change which makes ESS bring up new replacement nodes and copy the data to the new nodes. The impact of this is that for medium to large clusters, this migration can take longer than Kibana's plan change is willing to wait (20 minutes). A 1-hour plan change (caused by heavy data copying?) causes Kibana's plan change to fail and be rolled back.

User experience: The result is that Elasticsearch is configured on ESS, but Kibana's configuration attempt is aborted.

Complication: This is caused partly by Kibana, on startup, asking Elasticsearch for SAML configuration details, and prior to Elasticsearch being completely configured, Kibana keeps trying(?) and eventually ESS kills the attempt to configure Kibana.

One solution would be to allow realm configuration via API. This would solve the problem because deploying SAML would not require, on ESS, a complete cluster restart (and data migration to new nodes). As a benefit, realms could be deployed nearly instantly instead of waiting for a full cluster restart (on ESS, for tiny clusters, this is at least 5 minutes for a full restart on a plan change).

A separate solution would be to have ESS improve how it deploys configuration such that it did not take hours to change elasticsearch.yml on a large cluster, but I don't know if that's the right solution.

Separately, if there were an API to configure realms, one could automate SAML configuration regardless of the substrate/platform that runs Elasticsearch. For example, on Okta, we create 1 Okta App per Kibana endpoint to configure SAML. At this time, we have no way to deploy SAML to Elasticsearch/Kibana without involving a human copy/pasting into the correct place (ESS, ECE, ECK, self-hosted elasticsearch.yml, etc).

One extra note: this API should also involve Kibana, because Kibana, at least for SAML, requires special configuration in its kibana.yml as well. That said, Kibana is out of scope for this particular Elasticsearch issue, but in scope for the problem ;)

pugnascotia commented 4 years ago

As a side note - you can apply config changes on Cloud without data migrations, but you still need to restart each node, wait for it to recover, wait for the cluster to settle, then move on to the next node, and that can take a long time in itself.
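The per-node sequence described above (restart, wait for recovery, wait for the cluster to settle, move on) can be sketched roughly as follows. This is not how Cloud implements it: `restart_node`, `node_recovered`, and `cluster_settled` are hypothetical callables the caller would supply (the last one might, for example, check cluster health), and the polling interval and timeout are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def _wait_until(condition: Callable[[], bool], what: str,
                poll_seconds: float, timeout_seconds: float,
                sleep: Callable[[float], None]) -> None:
    """Poll `condition` until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {what}")
        sleep(poll_seconds)


def rolling_restart(nodes: Iterable[str],
                    restart_node: Callable[[str], None],
                    node_recovered: Callable[[str], bool],
                    cluster_settled: Callable[[], bool],
                    poll_seconds: float = 5.0,
                    timeout_seconds: float = 1800.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Restart nodes one at a time, never touching two nodes at once."""
    done: List[str] = []
    for node in nodes:
        restart_node(node)
        # Wait for this node to come back before looking at the cluster.
        _wait_until(lambda n=node: node_recovered(n), f"node {node}",
                    poll_seconds, timeout_seconds, sleep)
        # Only move on once the cluster as a whole has settled.
        _wait_until(cluster_settled, "cluster to settle",
                    poll_seconds, timeout_seconds, sleep)
        done.append(node)
    return done
```

Because every node pays the full restart-and-settle cost in sequence, total time grows linearly with node count, which is why even an inline change can be slow on a large cluster.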

jordansissel commented 4 years ago

> apply config changes on Cloud

How? I was helping @mindbat with some SAML automation and their cluster (~300gb) took quite a while to deploy presumably due to data migration? Maybe it's not due to data migration -- my thought was based on assumption and not based on any significant observation.

pugnascotia commented 4 years ago

There's an option when applying a plan change - you can choose the strategy you want: grow-and-shrink, rolling grow-and-shrink (I think it is), or inline (i.e. don't create new nodes). The UI will disable inline for some changes, but I think they're mostly topology-related (so you can't do an inline plan change if you're changing the node size, for example).

AlexP-Elastic commented 4 years ago

(Note that you currently can't select this as an ESS user, only as an ESS admin or ECE user, but it's "coming soon" to ESS. The vast majority of the config change time is currently data migration, as you believed; it takes a minute or so per node to shut down and bring back up again, and it can take a few minutes to reload all the shards for that instance.)

swallez commented 4 years ago

Beyond customers adding their IdP to their clusters, we will also need dynamic addition/removal of security realms in the context of Cloud SSO, to allow customers to use their own IdP to log in to Elastic Cloud.

Custom IdPs on Cloud is also a strong requirement for FedRAMP. In https://github.com/elastic/cloud/issues/34897: "Support for agencies to bring their own IdP is a federal mandate"

slider commented 2 years ago

Cloud is currently working on a new feature to use standby clusters as a way to provide instantly available deployments to first time users (see the corresponding Meta ticket here: https://github.com/elastic/cloud/issues/81970). We basically "cache" pre-started deployments and re-configure them on the fly when assigning them to a user, eliminating the five minute waiting time that is usually required to start up a deployment from scratch.

The problem is that, on Cloud, when a user names their deployment, the corresponding deployment alias is also changed to a user-friendly URL (e.g. my-deployment-ab9164.es.eastus2.azure.elastic-cloud.com instead of ab9164848c504e57a0830b014f2a0c9e.es.eastus2.azure.elastic-cloud.com). When the URL changes, we need to update the SAML configuration for SSO to keep working, because e.g. the callback URL changes. Currently this requires a cluster restart, which means a full plan execution on Cloud, which takes a couple of minutes. We want to avoid this, as it negates the previously gained startup speed advantage.

Hence it would be nice if the ES team could look into re-prioritizing this issue, as it would enable us to re-configure the SAML settings without a restart.
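To make the dependency on the URL concrete, here is a minimal sketch of which SAML service-provider settings would change when the Kibana URL changes. It assumes the 7.x-style setting prefix `xpack.security.authc.realms.saml.<name>` and the conventional Kibana callback and logout paths; the realm name `cloud-saml`, the helper names, and the hostnames in the usage note are made up for illustration.

```python
from typing import Dict


def saml_sp_settings(kibana_url: str, realm_name: str = "cloud-saml") -> Dict[str, str]:
    """Build the URL-dependent SAML settings for a realm (illustrative only)."""
    base = kibana_url.rstrip("/")
    prefix = f"xpack.security.authc.realms.saml.{realm_name}"
    return {
        # Assumption: the SP entity ID is the Kibana base URL, a common choice.
        f"{prefix}.sp.entity_id": base,
        f"{prefix}.sp.acs": f"{base}/api/security/saml/callback",
        f"{prefix}.sp.logout": f"{base}/logout",
    }


def changed_sp_settings(old_url: str, new_url: str,
                        realm_name: str = "cloud-saml") -> Dict[str, str]:
    """Return only the settings whose value differs after a URL change."""
    old = saml_sp_settings(old_url, realm_name)
    new = saml_sp_settings(new_url, realm_name)
    return {key: value for key, value in new.items() if old[key] != value}
```

For example, `changed_sp_settings("https://ab9164.kb.example.com", "https://my-deployment.kb.example.com")` returns all three keys. These are static node settings today, which is why renaming a deployment forces a restart.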

justincr-elastic commented 2 years ago

Alternative Proposal: Elasticsearch should make outgoing API calls to a Config Server, like Spring Cloud's Config Server. It is a cloud-proven design.

Don't implement this as incoming API calls. A central Config Server is better than each of the 12 Elasticsearch teams implementing a different solution to the same underlying problem of config reload.

Duplication: Both of these Elasticsearch features seem to want to reload config at runtime without a restart.

Proposal: I proposed a different solution in the Desired Nodes doc. Instead of an incoming bespoke API for a specific use case, create an outgoing API that can be reused by all 12 Elasticsearch teams. For example, follow the same design as Spring Cloud's Config Server. Each Elasticsearch team would subscribe to the Config Server to listen for changes, filter on changes in their config subset, and apply changes at runtime if required.
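A minimal sketch of the subscribe-and-filter idea, under heavy assumptions: the "Config Server" is reduced to a `fetch` callable that returns the full key/value snapshot, polling stands in for whatever change notification a real server would offer, and `ConfigWatcher` is a hypothetical name, not an Elasticsearch or Spring Cloud API.

```python
from typing import Callable, Dict, List, Tuple

Settings = Dict[str, str]
Listener = Callable[[Settings, List[str]], None]


class ConfigWatcher:
    """Pulls snapshots from a config source and fans out filtered diffs."""

    def __init__(self, fetch: Callable[[], Settings]) -> None:
        self._fetch = fetch
        self._current: Settings = {}
        self._subscribers: List[Tuple[str, Listener]] = []

    def subscribe(self, prefix: str, listener: Listener) -> None:
        """Register a listener for keys that start with `prefix`."""
        self._subscribers.append((prefix, listener))

    def poll_once(self) -> None:
        """Fetch the latest snapshot and notify subscribers of their changes."""
        latest = dict(self._fetch())  # copy so later source edits don't leak in
        changed = {k: v for k, v in latest.items() if self._current.get(k) != v}
        removed = [k for k in self._current if k not in latest]
        self._current = latest
        for prefix, listener in self._subscribers:
            sub_changed = {k: v for k, v in changed.items() if k.startswith(prefix)}
            sub_removed = [k for k in removed if k.startswith(prefix)]
            # Each area only hears about its own subset of the configuration.
            if sub_changed or sub_removed:
                listener(sub_changed, sub_removed)
```

The security code could then call something like `watcher.subscribe("xpack.security.authc.realms.", reload_realms)`, where `reload_realms` is a hypothetical handler, and ignore every other setting.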

Elasticsearch start and runtime

Operators: Spring Cloud's Config Server implements all of these features in a cloud-proven architecture:

Examples: When an operator makes a change to key/value pairs in the Config Server:

Generalization

justincr-elastic commented 2 years ago

Clarification: Either an incoming or an outgoing API will work, but I think it should carry generic key/value config settings. The point is that we should consider reusing the API for all of Elasticsearch, not just specific areas.

Incoming API Alternative: Implement as a simple key/value REST CRUD API.

Different parts of Elasticsearch can subscribe for incoming updates, and filter for different updates. When an update comes in for a specific area, that area of code will react.

This approach is similar to the outgoing API proposal to a Config Server, but eliminates the need for the remote Config Server.
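The incoming variant might look something like the sketch below: an in-memory key/value store whose `put`, `get`, and `delete` methods stand in for the REST CRUD handlers, with listeners notified on each effective change. `SettingsStore` and the key names in the usage note are hypothetical; a real implementation would also need persistence, authorization, and cluster-wide propagation, none of which are shown.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A listener receives (key, old_value, new_value); new_value is None on delete.
ChangeListener = Callable[[str, Optional[str], Optional[str]], None]


class SettingsStore:
    """In-memory stand-in for a generic key/value settings API."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._listeners: List[Tuple[str, ChangeListener]] = []

    def subscribe(self, prefix: str, listener: ChangeListener) -> None:
        """Register interest in keys that start with `prefix`."""
        self._listeners.append((prefix, listener))

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        """Create or update a key; listeners fire only if the value changed."""
        old = self._data.get(key)
        self._data[key] = value
        if old != value:
            self._notify(key, old, value)

    def delete(self, key: str) -> bool:
        """Remove a key; returns False if it did not exist."""
        if key not in self._data:
            return False
        old = self._data.pop(key)
        self._notify(key, old, None)
        return True

    def _notify(self, key: str, old: Optional[str], new: Optional[str]) -> None:
        for prefix, listener in self._listeners:
            if key.startswith(prefix):
                listener(key, old, new)
```

A caller subscribed to `"security.realms."` would be told about `store.put("security.realms.saml1.order", "2")` but not about unrelated keys, so each area of the code reacts only to its own updates.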