elastic / uptime

This project includes resources and general issue tracking for the Elastic Uptime solution

Handle API keys for the synthetics service #448

Closed shahzad31 closed 2 years ago

shahzad31 commented 2 years ago

Current Implementation:

We use Elasticsearch API keys in Heartbeat to communicate with Elasticsearch. We generate one API key when a user creates a monitor for the first time, and we store it as an encrypted saved object.

Every time we need to pass monitor configs to the service, we decrypt that saved object and pass the API key, along with the Elasticsearch host, to the service, which then passes them along to Heartbeat.
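As a rough sketch of the flow above: the stored key is decrypted and bundled with the Elasticsearch host and the monitor configs before being sent to the service. All type and function names here (`StoredApiKey`, `ServicePayload`, `buildServicePayload`) are hypothetical, the `decrypt` function is injected rather than taken from any real encryption plugin, and the `id:api_key` string form is an assumption about what the service forwards to Heartbeat.

```typescript
// Hypothetical shapes for illustration; not actual Kibana or service types.
interface StoredApiKey {
  id: string;
  encryptedApiKey: string; // secret part, encrypted at rest in the saved object
}

interface MonitorConfig {
  name: string;
  url: string;
}

interface ServicePayload {
  monitors: MonitorConfig[];
  output: { hosts: string[]; apiKey: string };
}

// `decrypt` is passed in so the sketch does not depend on a real
// encryption implementation.
function buildServicePayload(
  stored: StoredApiKey,
  decrypt: (cipherText: string) => string,
  esHost: string,
  monitors: MonitorConfig[]
): ServicePayload {
  const secret = decrypt(stored.encryptedApiKey);
  return {
    monitors,
    // Assumed "id:api_key" form for the key handed on to Heartbeat.
    output: { hosts: [esHost], apiKey: `${stored.id}:${secret}` },
  };
}
```

The decrypted secret only exists in memory while the payload is built; the saved object itself keeps the encrypted form.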

Enhancements needed

We need to make sure that when the API key gets deleted, we regenerate it. We can't regenerate API keys within the task manager, so it will have to be a user-initiated action.

Permissions required

To generate an API key, the user needs the following permissions:

manage_security, manage_api_key, manage_own_api_key

An uptime user may have write permissions for saved objects (by default, a user with uptime:save has them), but we can't automatically assign the permissions above to the user. So the user needs those permissions if they are adding monitors for the first time. If the user doesn't have them, we will throw an error during monitor creation and guide the user to contact an admin for those permissions.
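A minimal sketch of that check, assuming the privilege map comes from something like the `cluster` section of an Elasticsearch has-privileges response (privilege name mapped to a boolean). The function names and the error wording are hypothetical.

```typescript
// Cluster privileges listed above as required for generating the API key.
const REQUIRED_PRIVILEGES = [
  "manage_security",
  "manage_api_key",
  "manage_own_api_key",
];

// Returns the required privileges that are absent or false in the map.
function missingPrivileges(cluster: Record<string, boolean>): string[] {
  return REQUIRED_PRIVILEGES.filter((p) => cluster[p] !== true);
}

// Throws a user-facing error during monitor creation if anything is missing.
function assertCanGenerateApiKey(cluster: Record<string, boolean>): void {
  const missing = missingPrivileges(cluster);
  if (missing.length > 0) {
    throw new Error(
      `Cannot create monitor: missing privileges ${missing.join(", ")}. ` +
        "Please contact your administrator."
    );
  }
}
```

Listing the missing privileges in the message gives the user something concrete to pass on to their admin.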

Validate API key

We need to write a mechanism to validate the API keys periodically, to make sure they remain valid for Heartbeat to communicate with Elasticsearch. Once we detect that an API key is outdated, we generate an error state that will be read by the uptime app.
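One way the periodic check could classify a key, sketched under assumptions: `ApiKeyInfo` mirrors the `invalidated` and `expiration` fields that Elasticsearch reports for an API key, while `ApiKeyState` and `classifyApiKey` are hypothetical names. Fetching the key and scheduling the check are left out.

```typescript
// Subset of the key information assumed to be available to the check.
interface ApiKeyInfo {
  id: string;
  invalidated: boolean;
  expiration?: number; // epoch milliseconds; absent if the key never expires
}

type ApiKeyState = "valid" | "missing" | "invalidated" | "expired";

// `info` is undefined when the lookup found no key with the stored id.
function classifyApiKey(info: ApiKeyInfo | undefined, nowMs: number): ApiKeyState {
  if (info === undefined) return "missing";
  if (info.invalidated) return "invalidated";
  if (info.expiration !== undefined && info.expiration <= nowMs) return "expired";
  return "valid";
}
```

Any state other than "valid" would map to the error state that the uptime app reads.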

Inform the user

Once the API key goes missing or becomes invalid, the user will need to perform an explicit action in the UI. Alternatively, we could decide that if the logged-in uptime user is an admin, we auto-regenerate the API key; we would still need to inform the user if their checks weren't running for some period.

Disable monitors?

If the API keys become invalid, should we disable monitors? That would give the user some indication that monitors are disabled due to an error state. This becomes tricky, though, since we can't write to saved objects from the task.

Suites requirements

For the suites implementation we are going to push a meta_data file from Heartbeat to Elasticsearch, which should then be read by Kibana, and the task manager will need to create saved objects (monitors).

This creates a complication: we can't create saved objects from the task itself. So as part of the suite monitor saved objects, we will also create API keys.

Create API keys per individual suite

Once a user adds a monitor of type suite, we will generate an API key with permissions to write saved objects. That key will again be stored, in encrypted form, as part of the same saved object.

Once the task receives the suite meta_data, we will read the respective API key from the suite saved object, decrypt it, and pass it along to the service as part of the newly created saved object. And yes, we will use this API key to create a new saved object client.

This will allow us to create saved objects from the task for the suites.

Validating

We will need to make sure these API keys remain valid throughout their lifetime, so we will need to validate them periodically, recycle them when monitors get deleted, and regenerate them when keys become invalid.

I think we will tackle these requirements in a few tickets, so I will write follow-up implementation tickets.

Standard flow

image

Background task flow

image

Failure modes

I think so far I have been able to identify only two failure modes: the user explicitly deletes the API keys, or a saved objects migration fails. Both scenarios should be very rare.

Existing example

The alerting team already has the implementation that we need for suites to work: when a rule gets created in the alerting UI, they create an API key and store it as part of the rule. That key gets used during rule execution to create documents.

It gets recycled when the rule gets deleted. To handle the failure mode where the API key goes missing or becomes invalid, alerting disables the rule and asks the user to explicitly re-enable it; at that point, the API key gets regenerated.

Alerting generates a fake Kibana request on the server using the API key, and that fake Kibana request is used to create a saved object client, which in turn has permissions to write saved objects.
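To illustrate the fake request idea, here is a minimal sketch that builds only the part that matters for authentication: an `Authorization` header in the `ApiKey <base64(id:api_key)>` form that Elasticsearch accepts. The `FakeRequest` shape and `buildFakeRequest` name are hypothetical stand-ins and not the alerting plugin's actual code; turning such a request into a scoped saved object client is left out.

```typescript
// Hypothetical stand-in for a server-side request object.
interface FakeRequest {
  path: string;
  headers: Record<string, string>;
}

// Builds a request that carries the API key credentials.
// `btoa` is fine here on the assumption that ids and keys are plain ASCII.
function buildFakeRequest(id: string, apiKey: string): FakeRequest {
  const credentials = btoa(`${id}:${apiKey}`);
  return {
    path: "/",
    headers: { authorization: `ApiKey ${credentials}` },
  };
}
```

A client created from such a request acts with the privileges granted to the API key, not those of any logged-in user, which is what lets a background task write saved objects.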

dominiqueclarke commented 2 years ago

@shahzad31 does this mean that in order to use suites, a user has to have the permissions manage_security, manage_api_key, manage_own_api_key?

Let's assume we have an admin with API key privileges and an uptime user without API key privileges. Is there a workflow where we can have an admin first "set up" the API key? Or does the API key for permissions to write to saved objects need to be tied to the uptime user?

andrewvc commented 2 years ago

I should mention that we could always have the service pull the API key from the discovery job and use it in the execution jobs, if that's easier for suites.

paulb-elastic commented 2 years ago

Based on the current implementation, @shahzad31 is considering defining a new / updated issue to replace this one.

shahzad31 commented 2 years ago

This was only valid for suite monitors, a concept which is outdated with the new push monitors approach, so I am going to close this.