Open banderror opened 3 years ago
Pinging @elastic/security-solution (Team: SecuritySolution)
Pinging @elastic/security-detections-response (Team:Detections and Resp)
We had a chat with @marshallmain last week and here's a summary of what we discussed and how we decided to proceed with the consolidated implementation:
- Base the consolidated implementation on `EventLogService` (https://github.com/elastic/kibana/pull/98353), but with many implementation details (especially around index bootstrapping) taken from `RuleDataService`/`RuleDataClient` (https://github.com/elastic/kibana/pull/98935).
- Remove the hard dependency on `FieldMap`. We'd leave `FieldMap` in the code, but remove it from the generic parameters of classes and functions, making it a helper utility rather than a hard dependency of the consolidated implementation.

I opened a draft PR (https://github.com/elastic/kibana/pull/101453) for the consolidated implementation. Here's a comparison of the 3 implementations in `rule_registry`:
| Feature | RuleDataService | EventLogService | Consolidated implementation |
| --- | --- | --- | --- |
| Schema | Implicit, schema-less (indexing function accepts documents as `any`) | Explicit schema for documents | Explicit schema for documents |
| Dependency on `FieldMap` | No hard dependency | Hard dependency | No hard dependency |
| Component templates | Yes | No | Yes |
| Versioning of schema, mappings, settings | No | No | Yes |
| Soft document migrations (index rollover) | No | No | Yes |
| Index management API | Imperative | Declarative | Declarative |
| Index management encapsulation | Low-level methods are exposed to clients; clients are responsible for correct implementation of index management | Low-level methods are encapsulated; the mechanism is responsible for index management | Low-level methods are encapsulated; the mechanism is responsible for index management |
cc @tsg @spong @xcrzx @marshallmain @jasonrhodes @dgieselaar @smith @gmmorris
One more thing I forgot to mention is about component templates. In the consolidated implementation, I'm proposing the following "architecture" for templates (as you can see here and here in the code):
During index bootstrapping a number of templates will be created by the Event Log mechanism. They will have a certain order of precedence and the "next" template will override properties from all the "previous" ones. Here's the list of templates from "start" (most generic, least precedence) to "finish" (most specific, most precedence):
1. `.alerts-mappings` component template. Specified internally by the Event Log mechanism. Contains index mappings common to all logs (observability alerts, security execution events, etc).
2. `.alerts-settings` component template. Specified internally by the Event Log mechanism. Contains index settings which make sense for all logs by default.
3. `.alerts-{log.name}-app` application-defined component template. Specified and versioned externally by the application (plugin) which defines the log. Contains index mappings and/or settings specific to this particular log. This is the place where you as an application developer can override or extend the default framework mappings and settings.
4. `.alerts-{log.name}-user` user-defined component template. Specified internally by the Event Log mechanism; empty; not versioned. By updating it, the user can override default mappings and settings.
5. `.alerts-{log.name}-user-{spaceId}` user-defined space-aware component template. Specified internally by the Event Log mechanism; empty; not versioned. By updating it, the user can override default mappings and settings of the log in a certain Kibana space.
6. `.alerts-{log.name}-{spaceId}-template` index template. Its version and most of its options can be specified externally by the application (plugin) which defines the log. This is the place where you as an application developer can override user settings. However, mind that the Event Log mechanism has the last word and injects some hard defaults into the final index template to make sure it works as it should.

Template #6 overrides #5, which overrides #4, which overrides #3, etc. More on composing multiple templates in the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-template.html#multiple-component-templates
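To make the precedence order concrete, here's a small TypeScript sketch. The helper is hypothetical (not from the draft PR), but the ordering matches the list above: Elasticsearch applies `composed_of` entries in order, with later templates overriding earlier ones.

```typescript
// Hypothetical helper (not from the draft PR): returns the component templates that a
// log's index template would compose, ordered from least to most precedence.
// Elasticsearch applies `composed_of` entries in order, so later templates win.
function getComponentTemplatesInPrecedenceOrder(logName: string, spaceId: string): string[] {
  return [
    '.alerts-mappings',                   // 1. common mappings (mechanism-level)
    '.alerts-settings',                   // 2. common settings (mechanism-level)
    `.alerts-${logName}-app`,             // 3. application-defined
    `.alerts-${logName}-user`,            // 4. user-defined
    `.alerts-${logName}-user-${spaceId}`, // 5. user-defined, space-aware
  ];
}

// The final index template (#6) would then reference them, e.g.:
// PUT _index_template/.alerts-security.events-default-template
// { "composed_of": getComponentTemplatesInPrecedenceOrder('security.events', 'default'), ... }
```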
Mechanism-level .alerts-mappings
and .alerts-settings
component templates are managed purely by the mechanism, bootstrapped on rule_registry
plugin start, common to all logs defined via the consolidated implementation, and it's not possible for a client (application/plugin) to change them. There are only 2 of them in the cluster.
All the other log-level templates (3 component templates and 1 index template) are created per each log defined via the consolidated implementation.
As a developer defining a log (and thus its mappings and settings), you are able to specify templates #3
(application-defined component template) and optionally #6
(index template).
@marshallmain @madirey @peluja1012 @xcrzx @ecezalp @spong here's my 2 cents on our consolidation efforts and https://github.com/elastic/kibana/pull/101453 vs https://github.com/elastic/kibana/pull/102586
This is all just my subjective opinion, my understanding might be incomplete, and I don't have any strong opinions on anything below.
Pros of https://github.com/elastic/kibana/pull/101453:

- Explicit schema for documents based on `io-ts`. This gives us 2 things: 1) static compiler checks and 2) the ability to implement validation-on-write and/or normalization-on-read, if we wish to add it. Not a critical thing, but very nice to have, and as we discussed (and as far as I understood), we will likely want to have this in the final implementation.

Cons:

- `RuleDataService` would need to be updated and migrated to the implementation based on `EventLogService`. This should not be a huge deal, but considering the fact that we wanted to have a consolidated implementation by the 7.14 FF, this is now a big risk.

Pros of https://github.com/elastic/kibana/pull/102586:

- It is based on `RuleDataService`. Even without an explicit schema and everything else, this should be enough to use as a consolidated implementation. So we could move forward sooner.

Cons:

- It shares the existing weaknesses of `RuleDataService` compared to `EventLogService`: no schema, no encapsulation (exposing low-level methods for index bootstrapping), imho too much flexibility, and so on.

I really liked the implementation in https://github.com/elastic/kibana/pull/102586 which puts a version in the `_meta` of each template => which gets propagated to the final index template => which gets propagated to the index:
```json
_meta: {
  "versions": {
    ".alerts-security-solution-index-template": 1,
    ".alerts-security-solution-mappings": 1,
    ".alerts-ecs-mappings": 1,
    ".alerts-technical-mappings": 1
  }
}
```
I also like the idea of lazy index creation. FWIW, in https://github.com/elastic/kibana/pull/101453 index bootstrapping is also done lazily. The difference is in https://github.com/elastic/kibana/pull/102586 more parts are shared between spaces, and so 2/3 of the bootstrapping happens outside of RuleDataClient
(client of a concrete log) and 1/3 inside (index rollover or creation). Whereas in https://github.com/elastic/kibana/pull/101453 most of the index bootstrapping is done by the client of a concrete log.
I'm wondering, however, what you think about standardizing templates similar to how I did it in the draft (see https://github.com/elastic/kibana/issues/101016#issuecomment-857593741).
Let me copy-paste it here:
Rationale and proposed UI for the {namespace} portion The {namespace} portion of the index naming strategy is meant to provide flexibility for our users to segment the Alerts data in different ways, similar to how the data stream naming strategy has a {namespace} that can be used for different use cases. For example, users might want different ILM policies for the alerts created by some rules compared to others.
We generally don’t want to make a direct connection between this concept of {namespace} and Kibana spaces. Users can choose to include the Kibana space id in the {namespace}, but they might also choose some other means to segment the data.
Therefore, we’d like the {namespace} to be configurable on a per-rule basis via a simple text input widget. The value of the {namespace} is saved in the Rule object. In addition, there will be a Kibana advanced setting per space that defines the default value of this input widget.
In other words, users that would like to separate alerts in different indices by using the space IDs can configure the value of the Kibana advance setting to be equal to the space ID. When creating a new rule in that space, the {namespace} option will be pre-filled with the space ID. They can still change it on every rule if they wish to.
Note about multi-space future: This should continue to work the same way even when multi-space rules become available. The namespace value will be stored on each rule, so it will continue to be tied to the space in which it was created. A user who shares that rule to a 2nd space, for example, would need to decide for themselves the best namespace value in that scenario. A warning could be shown at Rule sharing time indicating the index in which the alerts will be written and giving the user the opportunity to change it. Newly created rules would continue to read in the “default value” from the advanced settings in the current Kibana Space where the rule is being created from.
I think this agreement does not have a lot of effect on any of the two approaches. https://github.com/elastic/kibana/pull/101453 API does not force a developer to bind a log to a Kibana space. It provides a method for resolving a log which accepts a space id in its parameters:
```typescript
export interface IEventLogResolver {
  resolve<TMap extends FieldMap>(
    definition: IEventLogDefinition<TMap>,
    spaceId: string
  ): Promise<IEventLog<Event<TMap>>>;
}
```
For its implementation, spaceId
is just a string, and we could just rename it to namespace
and pass this value from parameters of a rule instance.
I easily could be just biased towards https://github.com/elastic/kibana/pull/101453, but I feel like it provides a better API in general (although it may not be perfect, may be missing certain methods, etc).
But given the time left before the 7.14 FF and overall fatigue from working on alerts-as-data/event-log/RAC stuff, I think as of now we should proceed with https://github.com/elastic/kibana/pull/102586, because "progress, simple perfection".
We need to ship something and move forward, and with https://github.com/elastic/kibana/pull/102586 there's a much better chance of finalizing this implementation sooner.
Thanks for summarizing Georgii! I'm copy pasting the note I sent to Georgii on slack here to summarize my perspective, developed over the past 2 weeks.
At a very high level, there are 2 primary goals I have in mind for the eventual converged Alert Data client.
I spent 2 days or so comparing the RuleDataClient with the draft consolidated EventLogService to build a mental model of each implementation, assess the existing features, and start estimating the difficulty of adding the additional required features to each implementation. After those 2 days, I felt very confident in my mental model of the RuleDataClient, but I was still struggling to remember how the abstraction layers fit together in the EventLogService. I could follow the call stack when looking at a specific piece, but the number of calls into various classes meant I would quickly forget the details of what each class did and stored.

As for features, your comment does an excellent job summing up all the key differences we were aware of at the time. When I started digging into the implementations, I realized that the namespacing strategy additionally differed between them. The EventLogService creates templates and indices at the same time and creates a separate template for each space, whereas the RuleDataClient does not create concrete indices until documents are written, and the template creation process is decoupled from the index creation. This positions the RuleDataClient well to support the arbitrary namespaces that users can define for their RAC alerts, since a single set of templates already serves all namespaces for a particular solution's alert indices.

Due to the requirement for arbitrary namespaces, it looked to me like the EventLogService would have to be updated to decouple the template creation from index creation - in the process breaking the existing template/index versioning and rollover logic in the EventLogService.
So overall, there appeared to be 2 paths forward:
Since (1) there are no blockers preventing the RuleDataClient from being used as the basis for Security Solution rules in RAC - any enhancements are great but not required for shipping so we can more easily make enhancements in parallel with migrating rules, (2) the RuleDataClient is in use already by Observability, and (3) I was more confident in my understanding of the RuleDataClient implementation and thus felt that other devs new to the consolidated client would have an easier time working on it, I decided to try adding the versioning and rollover logic to the RuleDataClient.
Summarizing next steps after meeting with @banderror and @xcrzx:

Priorities:

- `.alerts-*` indices
- `getWriter`: a single writer should be used for the duration of a rule executor. `getWriter` should not be called repeatedly. We should make this clear in documentation, and also rename `getWriter` to something that sounds more appropriate, like `initializeWriter` - this has a better chance of letting devs know that `initializeWriter` is not something you want to call repeatedly.

Details:
- The rule registry returns a `Promise<RuleDataClient>` for that log. Rule executors can then `await` this promise to ensure that they don't write to the log until it is properly bootstrapped. This removes the need for solution plugins to define a "ready signal" to pass into the RuleDataClient that signals when the templates are created; instead the rule registry handles all the synchronization.
- Templates are created via `RuleDataPluginService.createOrUpdateComponentTemplate` and `RuleDataPluginService.createOrUpdateIndexTemplate`. A schema for the log can be passed in as an optional parameter alongside the log definition to get static typing for the documents written by the RuleDataClient writer.
- Solutions call `RuleDataPluginService.bootstrapSolutionTemplates(templates): Promise<RuleDataClient>` instead of imperatively creating the templates and later directly instantiating a RuleDataClient.
- Common component templates for the `.alerts` indices. The rule registry may provide other component templates that solutions can share for ease of use, e.g. the full ECS component template.

This approach should adopt the declarative style from the event_log. The main difference is it aims to be a lighter-weight implementation that encapsulates template bootstrapping details while providing some flexibility in log definitions. Rather than caching objects for repeated retrieval from the service, the RuleDataPluginService will simply create templates and return a `Promise<RuleDataClient>`. Solution plugins will be responsible for passing the RuleDataClient around to the appropriate places where it is needed.
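To make the proposed flow concrete, here's an illustrative TypeScript sketch. All names, signatures, and the stub bodies are assumptions for illustration, not the actual rule_registry API:

```typescript
// Illustrative sketch only - names and signatures are assumptions, not the real API.
interface RuleDataClient {
  getWriter(): { bulk(docs: unknown[]): Promise<void> };
}

class RuleDataPluginService {
  // Declaratively accepts template definitions, bootstraps them, and resolves
  // the client only once bootstrapping has finished.
  async bootstrapSolutionTemplates(templates: object[]): Promise<RuleDataClient> {
    // ...create/update component and index templates here, awaiting acknowledgement...
    return {
      getWriter: () => ({ bulk: async () => {} }),
    };
  }
}

// A rule executor then awaits the promise, so it can never write before bootstrapping:
// const client = await ruleDataService.bootstrapSolutionTemplates([/* templates */]);
// await client.getWriter().bulk([/* alert docs */]);
```

The key design choice is that synchronization lives inside the promise itself, so solution plugins don't need a separate "ready signal".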
I'd like to just add my 2 cents and try to provide a more high-level overview of our recent discussions with @marshallmain and @xcrzx.
We had a chat on the following topics:

- TypeScript schema based on `io-ts`
  - `io-ts` types vs dev-time code generation vs runtime conversion
  - `FieldMap` vs mappings from component and index templates
- Component and index templates
- Index bootstrapping logic
- Misc
  - Binding a namespace to a `RuleDataClient` instance. What should be bound to a namespace - `RuleDataClient`, or writer and reader?
  - Caching of `RuleDataClient` instances

@marshallmain perfectly described all the details, I just want to again list what we're going to do in the next PR in a more high-level way:
- Optional TypeScript schema for the documents of a `RuleDataClient`. Schema should be optional. Dev should be able to define it manually or convert a `FieldMap` to it. This is basically what was already implemented in https://github.com/elastic/kibana/pull/101453
- Solutions define their own logs (e.g. `security.events` carrying something that makes sense for a security rule execution log), but the implementation must include references to the common templates (e.g. with "technical fields") under the hood.
- `RuleDataService` should provide a method that would implement bootstrapping; it would accept a log definition object and return a `RuleDataClient`. Invariants should be enforced by this logic; race conditions and errors handled correctly.

Again, for more details please refer to https://github.com/elastic/kibana/issues/101016#issuecomment-867830884.
@jasonrhodes @dgieselaar @smith please review ^^^^ and give us some feedback 🙂 Let me know if we should elaborate more on anything. The plan is to start working on the next PR soon, would be great to unblock https://github.com/elastic/kibana/pull/102586 sooner rather than later because it brings changes to the public API of RuleDataService
and RuleDataClient
.
The plan makes sense to me. I'm absolutely in favor of a declarative asset/template API which takes care of the "technical fields". Automatic type derivation sounds nice but a bit of duplication (mapping and io-ts type, for example) might be acceptable to unblock the follow-up efforts sooner IMO.
As discussed with @banderror, here are some weak spots in the current implementation that we could address in the consolidated implementation.
Currently, index name creation and update logic is split into two pieces.
rule_registry/server/rule_data_client/index.ts
contains index name creation logic:
```typescript
function createWriteTargetIfNeeded() {
  // ...
  const concreteIndexName = `${alias}-000001`;
  // ...
}
```
rule_registry/server/rule_data_plugin_service/utils.ts
contains index name update logic:
```typescript
function incrementIndexName(oldIndex: string) {
  const baseIndexString = oldIndex.slice(0, -6);
  const newIndexNumber = Number(oldIndex.slice(-6)) + 1;
  if (isNaN(newIndexNumber)) {
    return undefined;
  }
  return baseIndexString + String(newIndexNumber).padStart(6, '0');
}
```
`incrementIndexName` accepts index names created by `createWriteTargetIfNeeded` and returns `undefined` if the name doesn't match the pattern. However, the relationship between the two functions is not explicit, and it could lead to bugs if they fall out of sync. A more robust option would be to have a single function that appends `-000001` if the name doesn't contain it, or increments the number otherwise.
Also, it should be flexible and not rely on the last 6 digits, because there can be edge cases in general (https://github.com/elastic/kibana/pull/102586#issuecomment-875967599).
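As a sketch, such a combined function could look like this (the name is hypothetical; it also avoids hard-coding the 6-digit suffix width):

```typescript
// Hypothetical combined helper: appends "-000001" if the name has no numeric suffix,
// otherwise increments the existing suffix, preserving its width.
function incrementOrAppendSuffix(indexName: string): string {
  const match = indexName.match(/^(.*)-(\d+)$/);
  if (!match) {
    return `${indexName}-000001`;
  }
  const [, base, suffix] = match;
  const next = String(Number(suffix) + 1).padStart(suffix.length, '0');
  return `${base}-${next}`;
}
```

Because there is a single function, the "create" and "increment" halves can never disagree about the naming pattern.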
Index mapping update logic has several blind spots. I've added my concerns as comments to the `updateAliasWriteIndexMapping` method below.
```typescript
async function updateAliasWriteIndexMapping({ index, alias }: { index: string; alias: string }) {
  const clusterClient = await this.getClusterClient();
  const simulatedIndexMapping = await clusterClient.indices.simulateIndexTemplate({
    name: index,
  });
  const simulatedMapping = get(simulatedIndexMapping, ['body', 'template', 'mappings']);
  try {
    await clusterClient.indices.putMapping({
      index,
      body: simulatedMapping,
    });
    return;
  } catch (err) {
    if (err.meta?.body?.error?.type !== 'illegal_argument_exception') {
      /*
       * This part is unclear. Why do we skip the rollover if we've caught 'illegal_argument_exception'?
       */
      this.options.logger.error(`Failed to PUT mapping for alias ${alias}: ${err.message}`);
      return;
    }
    const newIndexName = incrementIndexName(index);
    if (newIndexName == null) {
      /*
       * I think we should not fail here. If the index name had no "-000001" suffix, we should append it.
       */
      this.options.logger.error(`Failed to increment write index name for alias: ${alias}`);
      return;
    }
    try {
      /*
       * We don't check response here. It could return "{ acknowledged: false }".
       * Should we consider a rollover to be successful in that case?
       */
      await clusterClient.indices.rollover({
        alias,
        new_index: newIndexName,
      });
    } catch (e) {
      if (e?.meta?.body?.error?.type !== 'resource_already_exists_exception') {
        /*
         * This part is also unclear. We log only 'resource_already_exists_exception' but silence all other exceptions.
         * It looks incorrect.
         */
        this.options.logger.error(`Failed to rollover index for alias ${alias}: ${e.message}`);
      }
      /*
       * There could be many reasons for the first attempt to update mappings to fail (network, etc.).
       * I think we should retry in case of failure.
       */
    }
  }
}
```
Currently, index bootstrapping is considered successful even if some requests failed. There is no way to programmatically retrieve the outcome of index template initialization, as we only log errors to the console. See the initialisation in security_solution/server/plugin.ts for example:
```typescript
const initializeRuleDataTemplatesPromise = initializeRuleDataTemplates().catch((err) => {
  this.logger!.error(err);
});
```
This could lead to a situation where index bootstrapping fails, but `RuleDataClient` doesn't know anything about it and writes documents into an index that doesn't have proper mappings. In the worst case, dynamic mappings would be applied to the indexed documents, leading to unexpected behavior, like broken aggregations in the application code.
A safer approach would be to disable all write operations if index bootstrapping failed. This should probably be done at the `rule_registry` library level.
We should also add retry logic to the bootstrapping process.
I talked to the Elasticsearch team, and they recommended adding retries to component and index template updates - or, IMO, wherever else we could get `{ acknowledged: false }` from ES during index bootstrapping.
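As a sketch of what that could look like (the helper and its retry policy are illustrative, not from either PR):

```typescript
// Sketch: retry an ES management call until it returns { acknowledged: true }.
// The signature and retry policy here are illustrative assumptions.
async function retryUntilAcknowledged(
  operation: () => Promise<{ acknowledged: boolean }>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { acknowledged } = await operation();
    if (acknowledged) {
      return;
    }
    // acknowledged: false means the cluster state update timed out before enough
    // nodes accepted it - per the ES team's advice, retrying is the right move.
  }
  throw new Error(`Operation not acknowledged after ${maxAttempts} attempts`);
}

// Usage (hypothetical): wrap each template call, e.g.
// await retryUntilAcknowledged(async () =>
//   (await clusterClient.indices.putIndexTemplate({ name, body })).body
// );
```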
Here's the log of our discussion:
Q: In Kibana (specifically, the RAC project), we started to use the new template APIs for bootstrapping our indices (PR for reference https://github.com/elastic/kibana/pull/102586). This code is not well tested and not production ready, but it's going to be used from the Security and Observability solutions in the upcoming releases, so I'd like to make sure that we implemented it correctly. Specifically, we started to use:

- `PUT _index_template/...` https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-template.html
- `PUT _component_template/...` https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-component-template.html

We make sure to wait for the responses from these endpoints before any other code will be able to write anything to the corresponding indices. So any writing happens only if these calls succeed.
In those responses, we normally get 200 OK with the body:
{ acknowledged: true }
I wasn’t able to find it in the docs or elsewhere, so I’d like to double-check two things.
The first thing is about `acknowledged: true`:

- The meaning - does it mean that the request is accepted, but work needs to be done asynchronously in the background?
- Can we get `acknowledged: false` in a 200 OK response?
- What errors can we get there in general, can we refer to some spec with listed errors and response structure?

The second thing is about race conditions between index mgmt and indexing requests. Is it possible to get a race condition between successfully acknowledged template mgmt requests, and the following (potentially, immediately after that) attempt to write a document to an index whose name matches the name of the updated index template?
For context: when updating mappings of an existing write index, first we try to update them in place (using mappings from the `POST /_index_template/_simulate` API); second, if this fails, we do a rollover of the alias.

When developing locally, we've got into some weird broken state where the concrete index ended up with dynamic mappings instead of mappings from the index template. Most likely, this was caused by our incorrectly implemented code + constant server reloads during development. However, I decided to double-check if any race conditions are possible in general.
A:
The first thing is about acknowledged: true:
- The meaning - does it mean that the request is accepted, but work needs to be done asynchronously in the background?
That means that the change has been acknowledged by the master and accepted with a cluster state update
- Can we get acknowledged: false in a 200 OK response?
- What errors can we get there in general, can we refer to some spec with listed errors and response structure?
You can get "false" for the acknowledged, if, when publishing the new cluster state, the request times out before enough nodes have acknowledged and accepted the cluster state
There are no listed errors, so generally a good rule to consider would be to retry in that case
- So basically, when creating initial index OR updating existing one OR rolling over alias, is it possible that the concrete index won’t get the most recent mappings and settings from the (just updated) templates?
if the template update response have "acknowledged: true", then retrieving those settings through something like the simulate API or creating a new index will use the just updated template
Q: If this question even makes sense, how many nodes is enough though? Does acknowledged: true guarantee that any subsequent write request will be handled by a node that received the most recent cluster state with the new mappings etc?
Just want to make sure there’s no way to get a race condition, leading to some weird state like let’s say a node which doesn’t know about a newly created index, processes write request and creates an index with dynamic mappings instead of mappings specified in the templates.
A: You shouldn't have to worry about this assuming the request is acknowledged: true, because that means the master node has acknowledged it, and all those requests (index creation, mappings updates) go through the master node
We identified a few potential issues with index bootstrapping and described them in the 2 previous comments.
I'm planning to address them as part of this consolidation task. I can split this up into several PRs, or do it in a single one; not sure at this point which way is better.
@jasonrhodes @weltenwort @marshallmain @xcrzx
Thanks for the writeups @xcrzx and @banderror ! Those sound like good improvements to make. The Q&A with answers from the ES team is especially useful, thanks Georgii for contacting them and writing up the answers!
> This part is unclear. Why do we skip the rollover if we've caught 'illegal_argument_exception'?
It's inverted, we skip the rollover if we catch anything except for illegal_argument_exception
- that's the error returned by ES when the mapping update contains a conflicting field definition (e.g. a field changes types). We expect to get that error for some mapping changes we might make, and in those cases we want to continue on to rollover the index. Other errors are unexpected and should trigger a retry IMO.
> We log only 'resource_already_exists_exception' but silence all other exceptions.
This is inverted as well, we silence resource_already_exists_exception
and log all others.
> I think we should not fail here. If the index name had no "-000001" suffix, we should append it.
I'm a little worried about appending to the index name in the case of failures without knowing what caused the failure. Appending might fix the case where the suffix gets dropped somehow, but it could also introduce other edge cases. If the suffix gets dropped somehow during the rollover process and we fail to find the suffix on the next check and start with "-000001" again, then "-000001" could already exist. In that case it would appear that the rollover process has completed successfully when really nothing has happened.
I think the most robust approach would be to go back to using the dry_run
parameter on the rollover API to retrieve the next_index
name directly from Elasticsearch and avoid any attempts to replicate the rollover naming logic ourselves. We pay a slight performance cost by making an additional call to ES, but we're only doing this once at startup time so it seems worth it to minimize bug potential here.
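Here's a rough sketch of that approach. The `RolloverClient` shape below is a simplified stand-in for the real Elasticsearch client, not the actual rule_registry code:

```typescript
// Sketch: let Elasticsearch compute the next index name via a dry-run rollover,
// instead of replicating the suffix-increment logic in Kibana. The client shape
// here is a simplified stand-in for the real ES client.
interface RolloverClient {
  indices: {
    rollover(params: { alias: string; dry_run: boolean }): Promise<{
      body: { old_index: string; new_index: string };
    }>;
  };
}

async function getNextIndexName(client: RolloverClient, alias: string): Promise<string> {
  const { body } = await client.indices.rollover({ alias, dry_run: true });
  // dry_run: true means nothing is actually rolled over - Elasticsearch just
  // reports what the new index would be named.
  return body.new_index;
}
```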
> We don't check response here. It could return "{ acknowledged: false }". Should we consider a rollover to be successful in that case?
This is a great point and something we should definitely handle. "{ acknowledged: false }" seems like a failure to me.
> There could be many reasons for the first attempt to update mappings to fail (network, etc.). I think we should retry in case of failure.
Adding retry logic for the mapping update process in the case of (unexpected) errors would definitely be a good improvement.
> A safer approach would be to disable all write operations if index bootstrap failed. This should probably be done at the rule_registry library level.
Definitely agree with this, we should not allow writes if the templates are not installed correctly.
> It's inverted, we skip the rollover if we catch anything except for `illegal_argument_exception` ...

Sorry, my bad, I didn't see that little `!`. Thank you for the clarification. I'll add it as a comment to the code.
> If the suffix gets dropped somehow during the rollover process and we fail to find the suffix on the next check and start with "-000001" again, then "-000001" could already exist.
The current implementation doesn't handle this case either. It fails when the index name doesn't contain a numerical suffix. So yeah, going back to using the dry_run
is probably the best option we have.
Adding @marshallmain's suggestions regarding index bootstrapping:
- Associate concrete indices with an ILM policy in `RuleDataClient`. We can start by associating all concrete indices with the default ILM policy created by the rule registry. However, we also need to add an additional index template per index alias in order to specify the `rollover_alias` for the concrete index (https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html#ilm-gs-alias-apply-policy). Without an ILM policy, the index will eventually become much larger than necessary and could hinder performance.
- Support the legacy `.siem-signals` naming scheme. To accomplish this, we'd add an additional parameter to `RuleDataClient` that allows callers to specify an optional "secondary alias" that would be added to the concrete indices.

A note regarding ILM policy, index template and namespace:

- Index names include the `namespace`: `.alerts-{registrationContext}.{datasetSuffix}-{namespace}`
- `index.lifecycle.name` should be equal to the policy name
- `index.lifecycle.rollover_alias` has to be the alias to roll over when the concrete index reaches the condition defined in the policy
- A separate `index.lifecycle.rollover_alias` is needed for every namespace - Elasticsearch doesn't automatically find the correct alias for us.

Adding suggestions after syncing with Observability folks (cc @weltenwort, btw I don't know GitHub usernames of Kerry and Panagiota, so please feel free to tag them):
- Restrict `datasetSuffix` on the API level. In the index naming convention `.alerts-{registrationContext}.{datasetSuffix}-{namespace}`, `datasetSuffix` can only be `alerts` or `events`. It would make sense to restrict it in the `RuleDataService` interface to make it safer and more explicit for developers.
- Defaults for `namespace`:
  - `getWriter` - set `namespace = 'default'` by default
  - `getReader` - maybe set `namespace = '*'` by default

Just want to socialise here that something like https://github.com/elastic/kibana/issues/101541 (a higher-level abstraction on top of index management) is out of scope of this consolidation effort.
cc @Kerry350 @mgiota
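To illustrate the naming convention and the restricted suffix (the names here are hypothetical, not the actual RuleDataService API):

```typescript
// Sketch: encode the naming convention and restrict datasetSuffix at the type level.
// The function and type names are illustrative; the real interface may differ.
type DatasetSuffix = 'alerts' | 'events';

function getAlertsIndexAlias(
  registrationContext: string,
  datasetSuffix: DatasetSuffix,
  namespace: string = 'default'
): string {
  // .alerts-{registrationContext}.{datasetSuffix}-{namespace}
  return `.alerts-${registrationContext}.${datasetSuffix}-${namespace}`;
}
```

With the union type, passing anything other than `'alerts'` or `'events'` is a compile-time error, which is exactly the kind of safety the API-level restriction is after.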
@banderror I took the liberty to add one more ticket in your list for 7.15 (@jasonrhodes can you confirm this is for 7.15?) This is a new bug that we spotted while working on https://github.com/elastic/kibana/pull/110788.
@mgiota Thank you, I can only support that 👍 👍
Hey everyone, I removed this ticket from the backlog of the Detection Rules area. We (@elastic/security-detections-response-rules) are not the owners anymore (however feel free to still ping us if you have any tech questions about the ticket).
Ownership of this epic and its sub-tasks now goes to the Detection Alerts area (Team:Detection Alerts
label). Please ping @peluja1012 and @marshallmain if you have any questions.
Transferring again to @elastic/response-ops as they now own the rule registry implementation.
Summary
Consolidate https://github.com/elastic/kibana/pull/98353 and https://github.com/elastic/kibana/pull/98935 into a single implementation.
We ended up having 4 implementations of index management/writing/reading related/similar to the problem we're trying to solve in RAC: two in `rule_registry` (`RuleDataService` and `EventLogService`), one in `security_solution` (for the `.siem-signals` index), and one in the `event_log` plugin. We should compare them, mind their strong and weak parts, and build a consolidated implementation in `rule_registry`.

High-level plan:

- Base the consolidated implementation on `RuleDataService` and `RuleDataClient`. Enhance it, improve its API for better and safer DX, improve its implementation, especially the part responsible for index bootstrapping.
- Remove the second implementation (`EventLogService`) from `rule_registry`.
- Remove the `.siem-signals` implementation in Security once we migrate the Detection Engine to `rule_registry`.
- Keep the `event_log` plugin, which is used for monitoring needs. Also, we're planning to implement the Rule Execution Log in Security based on `event_log` (https://github.com/elastic/kibana/issues/106347, https://github.com/elastic/kibana/issues/101013).

Regarding robust index bootstrapping, consider this:
Options include using `task_manager` to make sure the bootstrapping procedure runs only once at a time; using some sort of distributed lock; or maybe something else I'm missing. Maybe we could check how the Saved Objects service bootstraps the `.kibana` index.

When the consolidated implementation is ready, make sure to update all references to it from plugins which already use it:
`security_solution`, `observability`, `apm`, `uptime`, etc.

Tasks for 7.15:

- `ruleDataClient.bulk()` - ✔️ @banderror

Tasks for 7.16:
Backlog

Indexing and index bootstrapping logic:

API enhancements for RuleDataService and RuleDataClient:

User-defined resources:

Misc:

Consider these as well: