elastic / package-spec

EPR package specifications

Including SLO assets in Integrations. #435

Open agithomas opened 1 year ago

agithomas commented 1 year ago

The integration team has the technical know-how to supply recommendations to the SLO systems. So SLO configurations / recommendations can be shipped as an integration asset, similar to dashboard assets.

Usage: When an integration is enabled, a set of recommendations for adding or updating SLOs appears to the user.

Advantages:

a. Saves time for the user.

b. SLO metrics remain relevant when new metrics are introduced, especially after a product upgrade.

Could the SLO configuration UI offer suggestions or recommendations?

SLO configuration can also contain relationships to suggest recommendations.

Use case: if CPU is considered, recommend including memory. These dependency / recommendation mappings would be part of the SLO assets shipped with an integration.

cc: @ruflin, @lalit-satapathy, @jsoriano
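To make the mapping idea concrete, here is a minimal sketch of how such dependency / recommendation mappings could be represented and queried. The dictionary shape, the `RECOMMENDATIONS` name, and the `recommend` helper are assumptions for illustration only; nothing like this exists in package-spec today.

```python
# Hypothetical recommendation mappings that an integration could ship.
# The shape is an assumption for illustration, not part of package-spec.
RECOMMENDATIONS = {
    "cpu": ["memory"],  # the example from this issue: CPU suggests memory
}


def recommend(selected, mappings=RECOMMENDATIONS):
    """Return related metrics that the user has not already selected."""
    suggestions = []
    for metric in selected:
        for related in mappings.get(metric, []):
            if related not in selected and related not in suggestions:
                suggestions.append(related)
    return suggestions


print(recommend(["cpu"]))
```

A UI could call something like `recommend` with the metrics already used by an SLO and show the result as optional suggestions.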

jsoriano commented 1 year ago

Transferred to package-spec to discuss how this can be modelled in packages.

ruflin commented 1 year ago

Here are some more details about SLOs: https://github.com/elastic/kibana/issues/137323

@emma-raffenne @fkanout @kdelemme It would be great if someone from the SLO team could chime in.

kdelemme commented 1 year ago

Adding @simianhacker

I'm not sure I fully understand what is being suggested / built here. But I'm happy to schedule a call to discuss the current SLO work being done by the @elastic/actionable-observability team, and to understand your needs better.

kdelemme commented 1 year ago

Spoke with @ruflin earlier today, and the integration team can use the SLO API to provide out-of-the-box SLOs for some integrations. Ideally, a default SLO could be configured (and tweaked) during the integration installation.

When creating an SLO, the system takes care of creating the necessary resources and starts aggregating the data into a specific index. We don't support enabling or disabling an SLO. Therefore, if an SLO is created, the user has to delete it to stop data aggregation, or a manual call to the transform API could be used to stop it.

The SLO API is not GA yet :)

ruflin commented 1 year ago

As @kdelemme described, it seems the only missing bit is an enable / disable feature. I think that would be useful not only for the Fleet scenarios but also for users who want to pause an SLO without having to learn what transforms are, etc.

The SLO API input has a neat structure. @kdelemme could you post 1-2 JSON examples? I assume this is what would become part of the package spec, and this way we can share it here with the team.

kdelemme commented 1 year ago

Regarding enabling and disabling an SLO, @vinaychandrasekhar what do you think?

@ruflin Here are some examples with different SLI and configuration options:

SLO Availability based on APM transaction error, using 30d rolling and occurrences method

```
curl --request POST \
  --url http://localhost:5601/qca/api/observability/slos \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'Content-Type: application/json' \
  --header 'kbn-xsrf: oui' \
  --data '{
    "name": "My SLO Availability - rolling 30d",
    "description": "Some description",
    "indicator": {
      "type": "slo.apm.transaction_error_rate",
      "params": {
        "environment": "development",
        "service": "o11y-app",
        "transaction_type": "request",
        "transaction_name": "GET /flaky",
        "good_status_codes": ["2xx", "3xx", "4xx"]
      }
    },
    "time_window": {
      "duration": "30d",
      "is_rolling": true
    },
    "budgeting_method": "occurrences",
    "objective": {
      "target": 0.90
    }
  }'
```

SLO Latency based on APM duration, using 7d rolling and timeslices method

```
curl --request POST \
  --url http://localhost:5601/qca/api/observability/slos \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'Content-Type: application/json' \
  --header 'kbn-xsrf: oui' \
  --data '{
    "name": "My SLO Service Latency",
    "description": "My SLO Desc",
    "indicator": {
      "type": "slo.apm.transaction_duration",
      "params": {
        "environment": "development",
        "service": "o11y-app",
        "transaction_type": "request",
        "transaction_name": "GET /slow",
        "threshold.us": 5000000
      }
    },
    "time_window": {
      "duration": "7d",
      "is_rolling": true
    },
    "budgeting_method": "timeslices",
    "objective": {
      "target": 0.99,
      "timeslice_target": 0.95,
      "timeslice_window": "1m"
    }
  }'
```

SLO using custom KQL indicator

```
curl --request POST \
  --url http://localhost:5601/qca/api/observability/slos \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'Content-Type: application/json' \
  --header 'kbn-xsrf: oui' \
  --data '{
    "name": "SLO latency service log",
    "description": "latency from logs",
    "indicator": {
      "type": "slo.kql.custom",
      "params": {
        "index": "high-cardinality-data-fake_logs*",
        "numerator": "latency < 300",
        "denominator": "",
        "query_filter": "labels.groupId: group-0"
      }
    },
    "time_window": {
      "duration": "7d",
      "is_rolling": true
    },
    "budgeting_method": "occurrences",
    "objective": {
      "target": 0.98
    }
  }'
```

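If definitions like the ones above were shipped as package assets, a package build could sanity-check them before publishing. The sketch below is only that: the set of required keys and the timeslices rule are inferred from the three examples in this comment, not taken from the SLO API schema, and `check_slo_definition` is a hypothetical helper.

```python
# Keys that appear in every example above; treating them as required
# is an assumption, not something confirmed by the SLO API schema.
REQUIRED_KEYS = {"name", "indicator", "time_window", "budgeting_method", "objective"}


def check_slo_definition(slo):
    """Return a list of problems found in an SLO definition dict (empty if none)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(slo))]

    objective = slo.get("objective", {})
    target = objective.get("target")
    if not isinstance(target, (int, float)) or not 0 < target < 1:
        problems.append("objective.target should be a fraction between 0 and 1")

    # In the examples, only the timeslices method carries these two fields.
    if slo.get("budgeting_method") == "timeslices":
        for key in ("timeslice_target", "timeslice_window"):
            if key not in objective:
                problems.append(f"objective.{key} is expected for timeslices")
    return problems


latency_slo = {
    "name": "My SLO Service Latency",
    "indicator": {"type": "slo.apm.transaction_duration", "params": {}},
    "time_window": {"duration": "7d", "is_rolling": True},
    "budgeting_method": "timeslices",
    "objective": {"target": 0.99, "timeslice_target": 0.95, "timeslice_window": "1m"},
}
print(check_slo_definition(latency_slo))
```
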
vinaychandrasekhar commented 1 year ago

@kdelemme I think it's a good idea to offer enable/disable on the SLOs. Let's discuss in our next sync or weekly call. Thanks

ruflin commented 1 year ago

@vinaychandrasekhar Can you provide an update on SLOs in packages?

emma-raffenne commented 1 year ago

@ruflin The corresponding issue has been created and added to the AO project; see elastic/kibana#148148. It is targeted for the 8.7.0 release.

ruflin commented 1 year ago

@emma-raffenne The issue https://github.com/elastic/kibana/issues/148148 allows enabling / disabling SLOs. Do you plan to make SLOs available in packages at the same time?

emma-raffenne commented 1 year ago

@ruflin Not yet. This is something that needs more discussion once SLOs are more mature.

ruflin commented 1 year ago

SLOs should have been designed from the beginning to be part of packages. From my point of view, SLOs can only become mature if they are part of packages; otherwise we potentially end up in a situation where SLOs are built in a way that needs fundamental changes before they can be added to a package.

simianhacker commented 1 year ago

Our primary use case was to provide SREs an HTTP API so they could integrate with orchestration systems like Terraform and Ansible; I wasn't aware they would prefer using packages to install SLOs. Maybe @vinaychandrasekhar or @grabowskit could weigh in on where this fits in terms of feature priority and value for the SRE.

We've recently added a requirement for an enable/disable feature that would allow us to install the SLO definition (which is just a Kibana saved object) in a disabled state. Once the user decides to enable them (through our APIs or UI), our SLOs back-end will install the indices/transforms and start the process to generate the SLI data. With that feature in mind, integrating with packages should not be much different than any other Kibana resource.
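The lifecycle described here can be sketched as a small in-memory model. This is not the Kibana implementation: the `SloDefinition` class, its fields, and `install_package_slos` are hypothetical stand-ins, and the assumption that disabling only stops aggregation (leaving the definition installed) is mine.

```python
from dataclasses import dataclass


@dataclass
class SloDefinition:
    """In-memory stand-in for an SLO saved object (illustration only)."""
    name: str
    enabled: bool = False              # installed in a disabled state
    resources_installed: bool = False  # indices/transforms not created yet

    def enable(self):
        # Enabling is the point where the back end would install the
        # indices/transforms and start generating SLI data.
        self.resources_installed = True
        self.enabled = True

    def disable(self):
        # Assumption: disabling stops aggregation but keeps the definition.
        self.enabled = False


def install_package_slos(names):
    """Install definitions without starting any aggregation."""
    return {name: SloDefinition(name) for name in names}


slos = install_package_slos(["availability", "latency"])
slos["latency"].enable()  # the user opts in through the API or UI
print({name: slo.enabled for name, slo in slos.items()})
```

The point of the model is that installing a package costs nothing at runtime until the user explicitly enables an SLO.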

In my opinion, providing the SRE with the tools and interfaces that integrate with their orchestration systems is the top priority and will make the SLO project a success; it's literally what they are asking for. Also, SLOs are by nature so specific to the end user's business (and SLAs) that I'm not convinced we could provide valuable prepackaged SLOs.

Is there a use case OR example you have in mind that would illustrate the need for a prepackaged SLO?

ruflin commented 1 year ago

> We've recently added a requirement for an enable/disable feature that would allow us to install the SLO definition (which is just a Kibana saved object) in a disabled state. Once the user decides to enable them (through our APIs or UI), our SLOs back-end will install the indices/transforms and start the process to generate the SLI data. With that feature in mind, integrating with packages should not be much different than any other Kibana resource.

This sounds very promising. I agree, this should be sufficient to make it work as part of packages.

> Is there a use case OR example you have in mind that would illustrate the need for a prepackaged SLO?

The basic assumption I follow is that in the future, most data that users are shipping is based on integrations. And the SLOs are in one way or another attached to this data.

> Our primary use case was to provide SREs an HTTP API so they could integrate with orchestration systems like Terraform and Ansible;

To be clear, I don't think it should be either / or. It is key to have an API that users can integrate with their orchestration system. I would go so far as to say these two features align really well with each other. If there is an API that orchestration systems can integrate with, it means package installation can also integrate with it, as it is just another orchestration system. The main piece that was missing is enabling / disabling SLOs.

To enable / disable SLOs, I assume there is also an HTTP API? I would rather not touch saved objects directly, as I would consider this an implementation detail.
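As a rough illustration of "just another orchestration system", the sketch below builds (but does not send) the HTTP requests a package installer could issue. The create path matches the curl examples earlier in this thread; the `/enable` path, the `slo_id` value, and both helper names are assumptions, since the thread does not spell out the enable endpoint.

```python
import json
from urllib.request import Request


def _headers(auth_header):
    # Same header names as in the curl examples in this thread.
    return {
        "Authorization": auth_header,
        "Content-Type": "application/json",
        "kbn-xsrf": "true",
    }


def build_create_request(kibana_url, slo, auth_header):
    """Build the POST that creates an SLO from a definition dict."""
    return Request(
        f"{kibana_url}/api/observability/slos",
        data=json.dumps(slo).encode("utf-8"),
        headers=_headers(auth_header),
        method="POST",
    )


def build_enable_request(kibana_url, slo_id, auth_header):
    """Build the POST that enables an SLO (path is an assumption)."""
    return Request(
        f"{kibana_url}/api/observability/slos/{slo_id}/enable",
        headers=_headers(auth_header),
        method="POST",
    )


create = build_create_request("http://localhost:5601", {"name": "demo"}, "Basic <credentials>")
enable = build_enable_request("http://localhost:5601", "hypothetical-id", "Basic <credentials>")
print(create.get_method(), create.full_url)
print(enable.get_method(), enable.full_url)
```

Terraform, Ansible, or Fleet would all go through the same calls, which is why an API-first design and package support do not conflict.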

kdelemme commented 1 year ago

@ruflin

> To enable / disable SLOs, I assume there is also an HTTP API? I would rather not touch saved objects directly, as I would consider this an implementation detail.

Totally agree with you, we want to hide the implementation details behind an API for enabling/disabling an SLO. I'm going to work on this next, so this would be available in 8.7.

ruflin commented 1 year ago

Enabling / disabling SLOs just made it in: https://github.com/elastic/kibana/pull/149546. This provides the foundation to make SLOs usable in packages. What we are still missing is support for it in package-spec and Fleet.