elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.64k stars 8.23k forks source link

[Personalisation] Notifications API MVP #140743

Closed gsoldevila closed 2 years ago

gsoldevila commented 2 years ago

Overview

In the scope of personalisation & collaboration, we need a way to notify users when they are linked to a certain entity / event:

The personalisation & collaboration initiative is likely going to have multiple use cases impacting different solutions and plugins. The goal of this issue is to describe an API that will act as a centralised place for all these use cases to send notifications.

image

The Notifications API will rely on 2 basic building blocks:

Notification flow (draft)

Here's the sequence diagram describing the Email notification flow (MVP):

image

Q & A

Notifications API specification

interface NotificationsServiceSetup {
  registerNotificationType(plugin: string, type: string, autoOptIn: boolean);
}

interface NotificationsServiceStart {
  sendPlainTextNotification(payload: PlainTextNotification): Promise<boolean>;
}

interface AbstractNotification {
  recipient: string; // userID
  plugin: string;
  type: string; 
}

interface PlainTextNotification extends AbstractNotification {
  subject?: string;
  message: string;
}
elasticmachine commented 2 years ago

Pinging @elastic/kibana-core (Team:Core)

elasticmachine commented 2 years ago

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

lukeelmers commented 2 years ago

Thanks for getting this started! Overall I'm particularly interested in how we can prevent connector-specific concepts from leaking into the API, since the idea is that we will eventually have a longer term platform-provided concept of connectors.

With that in mind, a few initial thoughts:

Do we want to support multi-person notifications?

I'm unsure if the benefits we get from a groupIfPossible option are worth the added complexity of providing it up front. Is it strictly needed at this point? If not, I'd vote we wait until we have a better understanding of the long-term architecture of the notifications/connectors system, and layer it in as a future enhancement. Always easier to add something later than to take it away...

the NotificationsService could expose a couple of methods to read and update the notification preferences for the current user.

I'd imagine that eventually we'd want that dependency to be inverted, so that there's a user-settings-service (perhaps similar to uiSettings), that the Notifications plugin would register these settings into and read from. Plus we might run into some circular dependency issues -- if Notifications need to read user profile to get settings from Security, Security won't be able to update preferences if it lives in Notifications.

How do we let administrators (or users) know that notifications will NOT work because they have not been properly configured?

Maybe we could start by just returning an error from the Notifications service so that the consuming application (Cases) could decide what to do with it? E.g. maybe they want to throw a toast notification or render an inline message mentioning that users have not been notified? idk...

Can new notification types be dynamically added by other plugins? Should the API support dynamic messages or only prebuilt ones? Do we want to support using different connectors for each notification type?

I'd vote 'no' on all of these items for MVP/Phase1 if we don't have a strong need for them 🙂 That way we keep it as simple as possible so we aren't locking ourselves into too many decisions before @elastic/kibana-app-services starts getting more involved.

cnasikas commented 2 years ago

Maybe we could start by just returning an error from the Notifications service so that the consuming application (Cases) could decide what to do with it? E.g. maybe they want to throw a toast notification or render an inline message mentioning that users have not been notified? idk...

I think that the process of notifying should not block HTTP requests. A case should be created/updated even if the user is not notified (error while sending an email) or notifications are not configured. I think the error should surface somewhere else. Ideally, in the admin notification UI page if there is one. Cases could also show a callout on the all cases page informing the user that notifications are not properly configured and cases cannot notify assignees. This would require the notification plugin (or the plugin responsible for the notification configuration) to expose from the start method of the plugin side of the plugin a boolean where cases can check and act accordingly.

pgayvallet commented 2 years ago

my initial remarks and comments (note: I did not look at @lukeelmers's reply to not be influenced)

QUESTION: Do we want to support multi-person notifications?

For the initial MVP of the revamped NC, idk. For the old/initial purpose of the notification service, this was a mandatory requirement.

Now, it was not about using, per example, slack's MPIM (which is somehow a tight coupling between a notification and its associated action, but I'll get back on that), just to have an API to send a given notification to multiple users to have the grouping/batching logic handled internally (which can be mandatory for some use case, e.g sending a mail to a large group of users)

QUESTION: Will the Notifications API be a part of the Core services?

Given the suggested architecture, I agree that it can only be a plugin.

Now I have to ask: are we sure we'll never need to use this API from Core services at some point? (and if we do, do we know how we will technically be able to do so? IOC with registration API as we do for telemetry collectors?)

QUESTION: Case assignment might be the first use case, but the roadmap foresees multiple notification types in the future. Where do we store the list of notification types? ANSWER: It would seem that this knowledge should be stored somewhere within the Notifications API plugin. The NotificationsService will need that knowledge in its logic.

Why not allow consumers to register their notification types, as it's done for EBT for example? It seems like the proper isolation of concerns, and would avoid problem in code ownership where other contributors need to update the notification plugin to add their new types?

QUESTION: Rich text formats. What formatting options will the notifications API support? Is it acceptable to send notifications in plain text?

The problem with any enhanced text format is that is introduce a tight coupling between the notification and the connector. E.g depending if a given notification will be sent via mail or to slack, the actual formatting capabilities aren't the same.

QUESTION: Should the API support localised notification messages? ANSWER:[...] we’d like the API future-proof, keeping localisation in mind.

Meaning what in term of impact on the design/API surface, exactly?

Note that this highly depends on the next point,

QUESTION: Should the API support dynamic messages or only prebuilt ones?

Excellent question.

Supporting only prebuild messages would allow to somehow solve the RTF/i18n problematic, as it would be handled internally by the notification API (you can have a formatter per connector for instance). Now, I suspect that in practice, it may not be sufficient (or would force to introduce hundreds of notification types very quickly), but I lack the proper vision to understand exactly what we're planning to do long term.

QUESTION: How do we determine the connector to be used for each notification channel?

Also something that should absolutely be decided in the initial design. More globally, what's the relations between a notification (the data) and a notification action (sending it to a channel / though a connector)? Is that static / predetermined? configurable? per user? overridable per notification instance?

QUESTION: Do we want to support using different connectors for each notification type?

Maybe I don't understand, but if we don't, we're bound to have a single connector for all and every notification types, don't we?

gsoldevila commented 2 years ago

There are multiple opinions regarding multi-person notifications. Initially, I added it cause I saw it as a feature that could be interesting from a functional standpoint, and also because we have it "for free" with the current Email connector. However, it would increase the complexity to add a Slack connector in the future, if we are to honour the flag (meaning we would have to enhance the connector to support MPIM notifications).

Also, seeing as though this parameter will likely not change for each "notification action", I believe it's probably better to add it at a later phase and to have it as a configuration param. I'll update the API accordingly.


Plus we might run into some circular dependency issues -- if Notifications need to read user profile to get settings from Security, Security won't be able to update preferences if it lives in Notifications.

@luke I was imagining it more as a layer on top of Security:

User Profile UI => Notifications Service => Security (notif. user prefs)
// and also
User Profile UI => Security (other user prefs)

The question I ask myself is: _imagine the case where a user has not yet customised their notification preferences. How will the User Profile UI know which notification types (and channels) are there, to present the different checkboxes?_ So even if we don't provide the specific user preferences that come from profile info, I believe we must at least provide a list of the existing notification types and channels.

We could hardcode it somewhere in an Enumeration, or as @pgayvallet suggests, I suppose we can allow consumers to register new notification types and store them ?in memory? (how does it work for EBT?). The later seems a lot more flexible, but it's probably harder to implement. Don't know what's the best strategy from an MVP standpoint.

That is unless the user preferences itself is backed by some sort of saved object that allows storing meta information about the preferences that exist. @arisonl ?


@pgayvallet thanks for the insight in RTF/i18n problematic. The API surface depends on it:

Anyway, these approaches are not mutually exclusive, we could start with a service method for A) for the MVP, and then add new methods to the service for B) and C) at a later phase.


QUESTION: Do we want to support using different connectors for each notification type?

Maybe I don't understand, but if we don't, we're bound to have a single connector for all and every notification types, don't we?

Sorry, the question is not very clear. I was imagining a scenario where, for instance, an organisation had a couple different Slack workspaces, and they wanted to send notifications to users that are in either of these spaces. In that line, my question was whether it would be interesting to support configurations such as:

caseAssignmentNotificationConnectors: ['emailConnector', 'slackConnector1', 'slackConnector2']

So in fact, the question I was asking myself was more: Configuration-wise, for a given notification type (e.g. case assignment), should we support more than one connector of the same "type" (e.g. slack)? I imagine it wouldn't be too hard, and this way we are covered if presented with such scenario.

pgayvallet commented 2 years ago

There are multiple opinions regarding multi-person notifications. Initially, I added it cause I saw it as a feature that could be interesting from a functional standpoint it would increase the complexity to add a Slack connector in the future, if we are to honour the flag

I think we're not speaking of the same thing. As I said previously, I wasn't referring to notifications 'shared' between users (e.g a multi-users message in slack), but the capability to send notifications to a list / group of different users at the same time and using a single notification API call.

Now, if that's not required for the MVP, great.

User Profile UI => Notifications Service => Security (notif. user prefs)

I think the problem remains here: what if the security plugin wants to use the notification service? It adds a cycle here.

Note that the problem may not be in the current issue's proposed design, but in the fact that the security plugin is currently a single block doing too many things (similar to the cloud plugin for instance).

So in fact, the question I was asking myself was more: Configuration-wise, for a given notification type (e.g. case assignment), should we support more than one connector of the same "type" (e.g. slack)? I imagine it wouldn't be too hard, and this way we are covered if presented with such scenario.

Thanks, it's clearer now.

Then I would say it looks fairly easy to add later in a non-breaking way (change config structure to allow both a string or an array of string), in which case, if not required for the MVP, we can probably ignore it for now?

Dosant commented 2 years ago

the requirement for MVP is to be able to send plain text with a backlink to Kibana.

Who will generate the backlink to Kibana? Will this be done automatically inside of the service or does it have to be done by consumers who will have to append the link to the message string?

cnasikas commented 2 years ago

I think it should be the responsibility of the consumers because they want to link to different pages in Kibana. For example in Cases, we would like to add a backlink to the case the user got assigned.

gsoldevila commented 2 years ago

The idea was to keep the API as simple as possible, so the accepted message will be plain text. Thus, the API will not check whether text contains a link. Then, this also means that we will rely on the fact that most email clients (Gmail, iOS, ...) perform some processing and make the links clickable.

vadimkibana commented 2 years ago

As discussed offline today, and I've also spoken with few people about Alerting Connectors today, would like to propose the following:

We would create a notifications plugin in X-pack (which will later host the Notifications Center), with an email service, which has a .send() method.

plugins.notifications.email.send({
  to: 'abc@gmail.com',
  textBody: 'You were tagged in a case ...',
  // more fields ...
});

From the current technical limitations, it is best for now to use Alerting email Connectors, especially the pre-configured cloud email connector, and Alerting Action, which is executed immediately (without putting it into the Task Manager queue).

Would be great if we could just provide the email.send() method to Cases, without adding concepts of "notification" and "connector", to the "notifications" plugin for now.

Like above caseAssignmentNotificationConnectors, registerNotificationType, etc...

As it is not clear if the Notifications Center will have a concept of "connectors", and "notifications" in the Notifications Center will be way more complex persistent entities.

IMO, would be great for now if Cases could just use the plugins.notifications.email.send() method directly, which means they would need to get hold of the user's profile (I assume that already should be the case), from which email and locale (if we are storing it) could be used to localize and send the email.

// Once case was assigned
const {email, fullName, locale} = await plugins.users.loadProfile(assigneeId);
await plugins.notifications.email.send({
  to: email,
  textBody: localizeNotificationMessage(locale, fullName, caseId),
});

In the future, we could provide the email sending as some fundamental service from Core, then notifications email service could just be thought of as a temporary proxy for that:

plugins.notifications.email.send = (params) => core.email.send(params);
lukeelmers commented 2 years ago

would be great for now if Cases could just use the plugins.notifications.email.send() method directly, which means they would need to get hold of the user's profile (I assume that already should be the case), from which email and locale (if we are storing it) could be used to localize and send the email.

No strong feelings from me one way or the other, but this approach makes sense to me. I'm +1 on anything we can do to avoid tying this API too closely to concepts that may change later. Connectors is the obvious one, but I suppose you could say the same for Notifications. Also, IIRC we are not considering localization part of the MVP, so that wouldn't be necessary for now.

@cnasikas what do you think of this proposal?

In the future, we could provide the email sending as some fundamental service from Core,

TBH I'm not sold on the idea of email being a core service, especially if we imagine a SaaS future where we could potentially have a standalone service for something like this. But that's outside the scope of this discussion, and not something we need to reach alignment on now.

cnasikas commented 2 years ago

I also favor not tying closely concepts that may change later. Cases still need to bulk get the user profiles to get the email assignees which is ok.

which is executed immediately (without putting it into the Task Manager queue).

Regarding executing immediately, I believe we should not go with this approach and use the task manager. Cases' endpoints should not be blocked to send an email. If the notification service does not use the task manager, consumers (Cases) will have to use the task manager leading to multiple implementations of the same task. Lastly, If in the future the notification service decides to use the task manager, Cases need to stop using the task manager to avoid creating two jobs: one for Cases and one for the notification service. In the long run, it will be better if the notification service uses the task manager and have control of how notifications are being sent.

We would create a notifications plugin in X-pack (which will later host the Notifications Center), with an email service, which has a .send() method.

A case can have multiple assignees. I think we should take that into consideration and let the send method accept an array of emails and send one email to each assignee (or bulk send emails or similar).

kobelb commented 2 years ago

Regarding executing immediately, I believe we should not go with this approach and use the task manager. Cases' endpoints should not be blocked to send an email. If the notification service does not use the task manager, consumers (Cases) will have to use the task manager leading to multiple implementations of the same task. Lastly, If in the future the notification service decides to use the task manager, Cases need to stop using the task manager to avoid creating two jobs: one for Cases and one for the notification service. In the long run, it will be better if the notification service uses the task manager and have control of how notifications are being sent.

Agreed. The other benefit we get from this approach is that we will transparently retry sending these emails when there's a transient error once we start supporting at-least-once action runs.

gsoldevila commented 2 years ago

I also favour not tying closely concepts that may change later. Cases still need to bulk get the user profiles to get the email assignees which is ok.

That's a good point. For the time being the name of the new plugin is pretty much the only mention we have about 'notifications'. It exposes an EmailService that allows sending mails, plain and simple, the bare minimum to fulfil MVP requirements.

Regarding executing immediately, I believe we should not go with this approach and use the task manager.

Sorry if my description was confusing, I was talking about a send() method, but I didn't want to imply the email would be sent straight away. The idea is that we're going to call a bulkEnqueueExecution() method, containing a list of all the executions that need to be run by task manager. If I understood correctly, this will create a single SO, that will then be read by task manager, and deleted once the corresponding emails have been sent.

A case can have multiple assignees. I think we should take that into consideration and let the send method accept an array of emails and send one email to each assignee (or bulk send emails or similar).

Indeed, the API is going to accept a list of recipients, and it will schedule a single bulk action with the list of all the executions.

cnasikas commented 2 years ago

Thank you, @gsoldevila, and sorry for the confusion!

vadimkibana commented 2 years ago

@cnasikas @kobelb @gsoldevila I was hoping we can execute an action immediately, using the execute() method. (I assume that execute() does not create a saved object.) So we can avoid creating an unnecessary saved object and adding it to the Task Manager queue.

Or am I missing something? Do we gain something by scheduling a task in the Task Manager? Or does it have to do with UI Kibana and Background Task Kibana split, where UI Kibana does not know how to execute an Action?

[..] Cases' endpoints should not be blocked to send an email. [..]

@cnasikas What do you mean here exactly? The execution should still be async, hence can run in the background:

async sendNotifications() {
  plugins.notifications.email.sendPlainTextEmail(opts).catch(() => {}); // No "await" - executes in background.
}

[..] If the notification service does not use the task manager, consumers (Cases) will have to use the task manager leading to multiple implementations of the same task. Lastly, If in the future the notification service decides to use the task manager, Cases need to stop using the task manager to avoid creating two jobs: one for Cases and one for the notification service. [..]

Curious, why should any of them use the Task Manager? Ideally, the current abstraction should "just send an email" and there should be no assumptions from Cases of how it is implemented under-the-hood.

[..] In the long run, it will be better if the notification service uses the task manager and have control of how notifications are being sent.

In the long run, it is not even clear if the Notifications Service/Center will be part of Kibana, maybe it will be a standalone cloud service. Hence, I'm curious, why do you think it should use the Task Manager?

Agreed. The other benefit we get from this approach is that we will transparently retry sending these emails when there's a transient error once we start supporting at-least-once action runs.

Currently, as far as I understand, we are not re-sending. And we should not optimize for that, as Notifications Center might not even be implemented in Kibana, but as a standalone service. And Email Service might not use Connectors in the future.

kobelb commented 2 years ago

@cnasikas @kobelb @gsoldevila I was hoping we can execute an action immediately, using the execute() method. (I assume that execute() does not create a saved object.) So we can avoid creating an unnecessary saved object and adding it to the Task Manager queue.

Or am I missing something? Do we gain something by scheduling a task in the Task Manager? Or does it have to do with UI Kibana and Background Task Kibana split, where UI Kibana does not know how to execute an Action?

Currently, we get a few concrete benefits from scheduling a task in task-manager:

  1. It's asynchronously ran - this allow us to make the UI more responsive because it doesn't need to wait for the email to complete sending.
  2. Transparent retries - if there's a transient error sending the email, it will transparently retry sending the email. This will all be done asynchronously, and the user won't want to wait.

It's also aligned with our future long-term vision and allows us to make the email sending be performed by the Kibana process that is only responsible for running background-tasks.

kobelb commented 2 years ago

@cnasikas What do you mean here exactly? The execution should still be async, hence can run in the background:

async sendNotifications() { plugins.notifications.email.sendPlainTextEmail(opts).catch(() => {}); // No "await" - executes in background. }

This will result in emails not being sent if the Kibana node crashes at the wrong time. We should NOT do this.

vadimkibana commented 2 years ago

@kobelb

It's asynchronously ran - this allow us to make the UI more responsive because it doesn't need to wait for the email to complete sending.

It will need to wait for saved object to be stored. My example above shows how to make it truly async.

In any case, it is up the Cases team how much async they want it. Whether they want to block the user until the saved object is stored or not.

For the notifications plugin, the important part to understand is if we want to create the saved objects at all.

Transparent retries - if there's a transient error sending the email, it will transparently retry sending the email.

What kind of errors will it try to overcome? I assume those would be connector specific errors, like if it cannot execute an action at all for some reason? Is there a way to specify on which kind of errors it will re-try, so we don't end up sending multiple emails for the same notification?

I assume it will not handle email specific semantics, like debounced emails?

It's also aligned with our future long-term vision and allows us to make the email sending be performed by the Kibana process that is only responsible for running background-tasks.

This a good point. Essentially what I was asking above:

Or does it have to do with UI Kibana and Background Task Kibana split, where UI Kibana does not know how to execute an Action?

I guess the immediate execute() method will be deprecated on UI Kibanas in the fututre?

Which means, we have to create saved objects, otherwise it will be refactored to that in the future anyways.

kobelb commented 2 years ago

It will need to wait for saved object to be stored. My example above shows how to make it truly async.

I feel like I'm missing something. Are we really so concerned about the additional time that it will take to persist a single saved object that we're willing to risk these emails not being sent?

This approach also means that any errors that are thrown when sending the email will be swallowed. The user will never know if their email didn't send.

In any case, it is up the Cases team how much async they want it. Whether they want to block the user until the saved object is stored or not.

For the notifications plugin, the important part to understand is if we want to create the saved objects at all.

In all other situations where we send emails, we've decided to persist a saved-object to increase the likelihood that the emails are sent. Is there something special about this situation that makes you think we should change our approach?

What kind of errors will it try to overcome? I assume those would be connector specific errors, like if it cannot execute an action at all for some reason? Is there a way to specify on which kind of errors it will re-try, so we don't end up sending multiple emails for the same notification? I assume it will not handle email specific semantics, like debounced emails?

At the moment, the email connector is rather naive about retries, and it will retry on any error that is thrown when sending the email. We can definitely improve on this in the future. It is possible that we will end up sending multiple emails for the same notification.

I guess the immediate execute() method will be deprecated on UI Kibanas in the fututre?

We aren't certain about the future of execute(). There are some valid usages at the moment, for example we allow people to manually synchronize a case to a third-party service and we want to display the result in the UI. At a minimum, we do know that in the future we will be minimizing the usage to a bare minimum.

vadimkibana commented 2 years ago

@kobelb

I feel like I'm missing something. Are we really so concerned about the additional time that it will take to persist a single saved object that we're willing to risk these emails not being sent?

This approach also means that any errors that are thrown when sending the email will be swallowed. The user will never know if their email didn't send.

I don't have an opinion here, if Cases team wants to block while waiting on saved object save, fine with me. (I guess people mean different things by async.)

From what I understand, we need a saved object in any case, as that is the mechanism how Task Manager will communicate from UI Kibana to Background Task Kibana.

Regarding email retries:

  1. Is it possible to configure it such that it retries to resolve only transient errors, which will not lead to more than one email for the same notification?
  2. Will it handle email protocol errors in the future (maybe already now), like debounced emails?

In all other situations where we send emails ...

Curious, what are the other places in Kibana where we send emails?

kobelb commented 2 years ago

Is it possible to configure it such that it retries to resolve only transient errors, which will not lead to more than one email for the same notification?

Anything is possible with enough time and developer effort.

Will it handle email protocol errors in the future (maybe already now), like debounced emails?

It definitely can.

Curious, what are the other places in Kibana where we send emails?

Alerting rules send emails when various conditions are hit.