elastic / kibana


[RCA] Define new "investigation" CRUD API #187284

Closed jasonrhodes closed 1 week ago

jasonrhodes commented 1 month ago

Acceptance Criteria

jasonrhodes commented 1 month ago

@maryam-saeidi I just synced with @kdelemme and @benakansara and they've got some good context now from the other POC, so they're going to jump in on these investigation UI side tickets. Feel free to continue to be involved as we refine these (asking questions, syncing between the entry point flow and this flow). Thanks all.

kdelemme commented 1 month ago

Design: https://www.figma.com/design/YJPufJ9KJjvBY9pGvNDRSR/RCA-workflows?node-id=0-1&t=IgH5U7bAwPV2PxFS-1

This is a first attempt at an overview of what needs to be done to bring the design above to life. We should focus only on the scenario of log alerts with Kubernetes data.

Design elements

We have a date range picker that seems to be global to all the widgets present on the page. The investigation is linked to an alert, and maybe to a rule as well, so we can link this investigation to similar alerts. We have different widgets, like charts (which ones?) and recent events (we need to define which events, and how we find them). We can add hypotheses, which can be text and/or image.

Recent Events

For every event shown in the design, we need to find out how we get the data. It can be through an existing API if one exists, or through a query against the index used by the rule.

Model

Investigation: {
 id: uuid;
 title: string;
 createdAt: Date;
 createdBy: user;
 tags: string[];
 status: "ongoing" | "closed";
 relatedRuleId: string;
 relatedAlertId: string;
 widgets: Widget[];
 links: Link[];
 hypotheses: Hypothesis[];
}

Widget: {
 id: uuid;
 title: string;
 type: "esql" | "embeddable" | "recentEvents" | "chart";
 parameters: any;
 layout: Layout;
}

Link: {
 text: string;
 link: string;
}

Hypothesis: {
 id: uuid;
 text: string;
 attachments: Attachment[];
 createdAt: Date;
 createdBy: user;
}

Attachment: {
 id: uuid;
 type: "image";
 url: string;
}

API

GET /investigations

POST /investigations

GET /investigations/:id

POST /investigations/:id/widgets
PUT /investigations/:id/widgets/:widgetId
DELETE /investigations/:id/widgets/:widgetId

POST /investigations/:id/hypotheses
PUT /investigations/:id/hypotheses/:hypothesisId
DELETE /investigations/:id/hypotheses/:hypothesisId
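To make the route list above concrete, here is a minimal in-memory sketch of the investigation and widget operations. It is an illustration under assumptions, not the implementation that was merged: `InvestigationStore`, its method names, and the counter-based ids (standing in for UUIDs) are hypothetical, and the `Investigation` type only keeps a subset of the fields from the Model section. Hypotheses would presumably follow the same pattern as widgets.

```typescript
// Hypothetical in-memory sketch of the proposed routes; not Kibana's actual code.
interface Widget {
  id: string;
  title: string;
  type: "esql" | "embeddable" | "recentEvents" | "chart";
  parameters: unknown;
}

interface Investigation {
  id: string;
  title: string;
  createdAt: Date;
  status: "ongoing" | "closed";
  widgets: Widget[];
}

class InvestigationStore {
  private readonly investigations = new Map<string, Investigation>();
  private nextId = 1;

  // Placeholder id generator (assumption); the proposal uses UUIDs.
  private newId(prefix: string): string {
    return `${prefix}-${this.nextId++}`;
  }

  private mustGet(id: string): Investigation {
    const investigation = this.investigations.get(id);
    if (!investigation) {
      throw new Error(`Investigation "${id}" not found`);
    }
    return investigation;
  }

  // POST /investigations
  create(title: string): Investigation {
    const investigation: Investigation = {
      id: this.newId("investigation"),
      title,
      createdAt: new Date(),
      status: "ongoing",
      widgets: [],
    };
    this.investigations.set(investigation.id, investigation);
    return investigation;
  }

  // GET /investigations
  list(): Investigation[] {
    return Array.from(this.investigations.values());
  }

  // GET /investigations/:id
  get(id: string): Investigation | undefined {
    return this.investigations.get(id);
  }

  // POST /investigations/:id/widgets
  addWidget(id: string, widget: Omit<Widget, "id">): Widget {
    const created: Widget = { ...widget, id: this.newId("widget") };
    this.mustGet(id).widgets.push(created);
    return created;
  }

  // PUT /investigations/:id/widgets/:widgetId
  updateWidget(id: string, widgetId: string, patch: Partial<Omit<Widget, "id">>): Widget {
    const widget = this.mustGet(id).widgets.find((w) => w.id === widgetId);
    if (!widget) {
      throw new Error(`Widget "${widgetId}" not found`);
    }
    return Object.assign(widget, patch);
  }

  // DELETE /investigations/:id/widgets/:widgetId
  deleteWidget(id: string, widgetId: string): boolean {
    const investigation = this.mustGet(id);
    const before = investigation.widgets.length;
    investigation.widgets = investigation.widgets.filter((w) => w.id !== widgetId);
    return investigation.widgets.length < before;
  }
}
```

Keeping widgets as a sub-resource of the investigation mirrors the nested paths in the list: every widget operation first resolves the parent investigation and fails if it does not exist.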

Kibana plugins

Because of cyclic dependencies, Dario's initial POC used two plugins. The first is responsible for the widget registry: other plugins (e.g. APM, SLO, Synthetics) depend on it to register their widgets.

The second is the main plugin, which depends on the registry plugin and on the other plugins such as SLO and APM (e.g. for their clients), and contains the investigation UI and API.
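The two-plugin split described above can be sketched roughly as follows. This is only an illustration of how a separate registry breaks the cycle, not the POC's code: `WidgetRegistry`, `WidgetDefinition`, `setupSloPlugin`, and `renderWidget` are all hypothetical names, and the string-returning `render` function stands in for whatever a real widget would produce.

```typescript
// All names are hypothetical; this is not the POC's actual code.
interface WidgetDefinition {
  type: string;
  description: string;
  // Turns widget parameters into something the investigation UI can show.
  render: (parameters: Record<string, unknown>) => string;
}

class WidgetRegistry {
  private readonly definitions = new Map<string, WidgetDefinition>();

  register(definition: WidgetDefinition): void {
    if (this.definitions.has(definition.type)) {
      throw new Error(`Widget type "${definition.type}" is already registered`);
    }
    this.definitions.set(definition.type, definition);
  }

  get(type: string): WidgetDefinition | undefined {
    return this.definitions.get(type);
  }

  types(): string[] {
    return Array.from(this.definitions.keys());
  }
}

// Registry plugin: owns the registry and depends on no other plugin.
const widgetRegistry = new WidgetRegistry();

// Feature plugin (SLO as the example): depends only on the registry plugin.
function setupSloPlugin(registry: WidgetRegistry): void {
  registry.register({
    type: "slo",
    description: "SLO widget",
    render: (parameters) => `SLO widget for ${String(parameters["sloId"])}`,
  });
}

// Main investigation plugin: depends on the registry and looks widgets up by type,
// so feature plugins never need to import the main plugin.
function renderWidget(
  registry: WidgetRegistry,
  type: string,
  parameters: Record<string, unknown>
): string {
  const definition = registry.get(type);
  if (!definition) {
    throw new Error(`Unknown widget type "${type}"`);
  }
  return definition.render(parameters);
}

setupSloPlugin(widgetRegistry);
```

Because feature plugins only ever import the registry, the main plugin is free to depend on them for their clients without creating a cycle.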

Concerns

As we focus on one particular alert, we risk making this investigation UI a fixed set of elements, or simply a different alert details page. We need to keep in mind that the user knows better and should be able to construct the investigation blocks as they want.

mgiota commented 1 month ago

@kdelemme Great points! In the Model section I suggest we add the concept of User, Escalation/Integration as well.

User: {
  id: uuid;
  username: string;
  password: string
}

# e.g. Jira, GitHub issue, etc.
Integration: {
  id: uuid;
  title: string;
  description: string;
}

The Investigation model needs to be adapted as well to include the list of integrations.

Since there is the concept of Escalation and inviting more users to the investigation, we should add the list of invited users as well.

Investigation: {
 id: uuid;
 title: string;
 createdAt: Date;
 createdBy: user;
 invitedUsers: User[];
 tags: string[];
 status: "ongoing" | "closed";
 relatedRuleId: string;
 relatedAlertId: string;
 widgets: Widget[];
 links: Link[];
 hypotheses: Hypothesis[];
 integrations: Integration[]
}

mgiota commented 1 month ago

@kdelemme Regarding the status field of the Investigation: the design shows an "acknowledged" status, so let's use "acknowledged" | "closed";

mgiota commented 1 month ago

> we can link this investigation to similar alerts

In the design there is the concept of Related investigations. What do we consider similar alerts? Alerts that are linked to the same rule type? We need to define what relevant investigations are.

maryam-saeidi commented 1 month ago

I feel like we are doing the same thing as cases with additional components like widgets and hypotheses 🙈

Putting that aside and only focusing on the proposal, do we also need to keep a field related to the latest update? (like updatedAt, updatedBy, for Investigation) I assume hypotheses are not editable in this model, right?

@mgiota For integrations, how is it different from the Link that Kevin mentioned?

chrisdistasio commented 1 month ago

New version release: TBD
Node resource failure: TBD
Container failure start: TBD
Latency increase: TBD
Error rate increase: TBD
Log rate increase: TBD
Elasticsearch upgrade: TBD

Is the above a full set of events that need to be captured?

jasonrhodes commented 1 month ago

> New version release: TBD
> Node resource failure: TBD
> Container failure start: TBD
> Latency increase: TBD
> Error rate increase: TBD
> Log rate increase: TBD
> Elasticsearch upgrade: TBD
>
> Is the above a full set of events that need to be captured?

Just the ones that appear in the design. As the current approach requires us to manually extract each event using specific logic, we'll need to understand what the intended universe of events is, to start. A question for @drewpost I think.

chrisdistasio commented 1 month ago

thanks, @jasonrhodes. Do you have a sense of how you will capture and compute some of these? Do you expect to have these attached to an entity?

jasonrhodes commented 3 weeks ago

> Do you have a sense of how you will capture and compute some of these? Do you expect to have these attached to an entity?

I don't have good ideas yet. If they were available via the entity system, that would be great, but I don't want to block this work on that one so we will look at alternative ways of computing some of these, as well, and will have to punt on the ones that aren't possible (at least until entities are available).

michaelolo24 commented 2 weeks ago

Hey all! Just wanted to drop a heads up that security is also going to have a concept of investigations that, at this moment, will only serve as a navigation item within security, but will most likely expand to be more in the future. Given that, it would be great to have any APIs/SOs scoped to observability if possible, i.e. /api/observability/investigation or /api/obs-investigation, to prevent any collisions in the future.

Is there additional documentation on this feature that we may be able to read up on?
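One way to apply the scoping suggested above is to build every route from a single namespaced base path. The sketch below is purely illustrative: the base path is the first option from the comment, while `OBSERVABILITY_INVESTIGATION_BASE` and `investigationPath` are hypothetical names, not part of Kibana.

```typescript
// Hypothetical helper; only the base path comes from the suggestion in the thread.
const OBSERVABILITY_INVESTIGATION_BASE = "/api/observability/investigation";

// Joins URL-encoded segments onto the observability-scoped base path.
function investigationPath(...segments: string[]): string {
  const encoded = segments.map((segment) => encodeURIComponent(segment));
  return [OBSERVABILITY_INVESTIGATION_BASE, ...encoded].join("/");
}
```

With this, `investigationPath("abc", "widgets")` yields `/api/observability/investigation/abc/widgets`, and changing the namespace later means touching one constant.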

jasonrhodes commented 2 weeks ago

Ping @drewpost re: this security overlap in the "investigations" concept ^^

@michaelolo24, who would be the product person for Drew to sync up with here?

michaelolo24 commented 2 weeks ago

@jasonrhodes => @paulewing would be the person for him to speak with, thanks!

jasonrhodes commented 1 week ago

Closed by https://github.com/elastic/kibana/pull/190094