elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Reasons for not using saved objects for storing kibana data #80912

Open kobelb opened 3 years ago

kobelb commented 3 years ago
🚩 Note: I intend to use the main issue description to reflect our growing understanding of the situation, so I will periodically update it to reflect what we discuss. I'll add a comment each time this happens, so it's not silently changing.

A majority of Kibana's entities are persisted in saved-objects. However, a growing number of non-saved-object Elasticsearch indices are being used to store Kibana-specific entities. The following are the ones that I'm currently aware of:

  1. Alerting's event log - .kibana-event-log-*
  2. APM agent configuration - .apm-agent-configuration
  3. APM custom link - .apm-custom-link
  4. Detection engine signals - .siem-signals-*
  5. Security solution lists - .lists and .values
  6. Reporting - .reporting-*

I've started this discuss issue to determine what other Elasticsearch indices are being used to store Kibana-specific entities, and to enumerate the reasons why they aren't being stored as saved-objects. Saved-objects provide a number of features, including migrations, authorization, audit logging, export/import, space awareness, and encrypted attributes, that developers forgo when using non-saved-object ES indices.

I'd like to perform this exercise for two reasons: to find limitations in saved-objects that we should address to make them applicable to other use-cases, and to figure out which saved-object specific features should be made available when using non-saved-object ES indices.

Reasons we haven't used saved-objects

End-users should be able to query the indices directly

Saved-objects are stored in a "system index", and as such, end-users will not be able to query these indices directly starting in 8.0. Even if end-users could theoretically query system-indices, we treat the ES document format as an implementation detail of saved-objects, and it's prone to change between minor versions in a non-backward-compatible manner, so end-users shouldn't be querying these documents directly.

Applies to: Alerting's event log, Detection engine signals

There are too many saved-objects

The SIEM team has outlined a few of the issues that they experienced when trying to model their lists using saved-objects in https://github.com/elastic/kibana/issues/64715. Notably, SavedObjectsClient#find's paging implementation doesn't function properly when there are more than 10k results, which is being tracked by https://github.com/elastic/kibana/issues/77961.

Applies to: Security solution lists
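
The 10k figure matches Elasticsearch's default index.max_result_window, which caps from + size paging. As a rough illustration only (not SavedObjectsClient code), the sketch below contrasts offset paging with cursor-style paging in the spirit of search_after. The in-memory docs array and all function names are hypothetical stand-ins.

```typescript
// Default value of Elasticsearch's index.max_result_window setting.
const MAX_RESULT_WINDOW = 10_000;

interface Doc {
  id: number;
}

// Hypothetical in-memory stand-in for an index, already sorted by id.
const docs: Doc[] = Array.from({ length: 25_000 }, (_, i) => ({ id: i }));

// Offset paging: rejected once from + size exceeds the result window.
function searchFromSize(from: number, size: number): Doc[] {
  if (from + size > MAX_RESULT_WINDOW) {
    throw new Error(`from + size must be <= ${MAX_RESULT_WINDOW}`);
  }
  return docs.slice(from, from + size);
}

// Cursor paging: each request carries the sort value of the last hit seen.
function searchAfter(after: number | undefined, size: number): Doc[] {
  if (after === undefined) return docs.slice(0, size);
  const cursor: number = after;
  const start = docs.findIndex((d) => d.id > cursor);
  return start === -1 ? [] : docs.slice(start, start + size);
}

// Walks every page with the cursor and counts the documents visited.
function countAllWithSearchAfter(pageSize: number): number {
  let seen = 0;
  let cursor: number | undefined;
  for (;;) {
    const page = searchAfter(cursor, pageSize);
    const last = page[page.length - 1];
    if (last === undefined) break;
    seen += page.length;
    cursor = last.id;
  }
  return seen;
}
```

With offset paging, a request for the page starting at 10,000 fails, while the cursor loop can walk all 25,000 documents because it never asks for a deep offset.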

Documents are too large

Reporting is using its own dedicated .reporting-* indices because the documents include base64 encoded data for the generated CSVs, PDFs and PNGs. Since these documents are generally so large, they can't be migrated using saved-object migrations. New .reporting-* indices are created on a weekly basis.

Applies to: Reporting

Aggregations

Plugins wanting to run aggregations cannot use the saved objects client (we have made good progress in https://github.com/elastic/kibana/pull/64002 but it might take some time for plugins to adopt it).

In addition, it will not be possible to use a query to limit the documents to aggregate over. One workaround is to use a KQL filter, but this impacts performance and is discouraged by the ES team https://github.com/elastic/kibana/issues/69172

Applies to: APM Agent Configuration
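
To make the limitation concrete, here is a minimal sketch of the kind of request a plugin can send when it owns its index: a query that narrows the documents, plus an aggregation over the result. The SearchRequest type, the function name, and the field names (service.environment, service.name) are assumptions for illustration, not code taken from the APM plugin.

```typescript
// Hypothetical shape of a search request against a dedicated index.
interface SearchRequest {
  index: string;
  body: {
    size: number;
    query: Record<string, unknown>;
    aggs: Record<string, unknown>;
  };
}

// Builds a request that aggregates only over documents matching the query.
function buildServiceCountRequest(environment: string): SearchRequest {
  return {
    index: '.apm-agent-configuration',
    body: {
      size: 0, // return aggregation results only, no hits
      query: {
        bool: {
          // Assumed field name; limits which documents are aggregated.
          filter: [{ term: { 'service.environment': environment } }],
        },
      },
      aggs: {
        // Assumed field name; buckets the matching documents per service.
        services: { terms: { field: 'service.name', size: 50 } },
      },
    },
  };
}
```

The point is the query clause: with direct index access it is plain Elasticsearch query DSL, rather than a KQL filter passed through the saved objects client.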

Filtering on update / delete queries

It's not possible to efficiently delete or update many documents without doing these operations over all documents of a certain saved object type.
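
For comparison, a plugin that manages its own index can hand Elasticsearch a filtered request for the _delete_by_query API. The sketch below only builds such a request body. The index name, the expires_at field, and the function name are hypothetical.

```typescript
// Hypothetical shape of a request for Elasticsearch's _delete_by_query API.
interface DeleteByQueryRequest {
  index: string;
  body: { query: Record<string, unknown> };
}

// Builds a request that deletes only documents whose (assumed) expires_at
// field is at or before the given timestamp, leaving the rest untouched.
function buildExpiredCleanupRequest(index: string, nowMs: number): DeleteByQueryRequest {
  return {
    index,
    body: {
      query: { range: { expires_at: { lte: nowMs } } },
    },
  };
}
```

Elasticsearch applies the filter server-side, so the caller does not need to fetch and inspect every document of a given type first.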

Filtering on nested fields

Filter validation fails when writing a KQL query for nested field types https://github.com/elastic/kibana/issues/81009

kobelb commented 3 years ago

@sqren Do any of the existing reasons for not using saved-objects apply to the APM specific entities, or would we model these using saved-objects given hindsight and an appropriate method of transitioning to saved-objects?

/cc @elastic/kibana-platform @spong @XavierM @mikecote

sorenlouv commented 3 years ago

@kobelb Thanks for starting this discussion. It's been a while since we decided to go with a dedicated system index over saved objects so I might be forgetting some details, and SO might have changed. Overall I think it boiled down to limitations in querying abilities. For agent configuration we need to filter documents using boolean logic and operators like constant_score and boost. At the time custom queries for retrieving saved objects were not supported (or perhaps recommended against?).

This is an example of the query we make to retrieve an agent configuration: https://github.com/elastic/kibana/blob/71f4c085b72034b1fc5e00c1c8914500da321f87/x-pack/plugins/apm/server/lib/settings/agent_configuration/search_configurations.ts#L27-L66
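
For readers who don't follow the link, here is a simplified sketch of the general shape of a query that combines boolean logic with constant_score and boost. It is not a copy of the linked code. The function name and the field names are assumptions.

```typescript
type QueryClause = Record<string, unknown>;

// Scores a configuration higher when it matches the service name than when
// it only matches the environment, so the most specific match sorts first.
function buildAgentConfigQuery(serviceName: string, environment?: string): QueryClause {
  const should: QueryClause[] = [
    {
      constant_score: {
        filter: { term: { 'service.name': serviceName } }, // assumed field
        boost: 2,
      },
    },
  ];
  if (environment !== undefined) {
    should.push({
      constant_score: {
        filter: { term: { 'service.environment': environment } }, // assumed field
        boost: 1,
      },
    });
  }
  return { bool: { minimum_should_match: 1, should } };
}
```

Each constant_score clause contributes a fixed score equal to its boost, so a document matching both clauses outranks one matching only a single clause.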

Is this something that's possible today?

kobelb commented 3 years ago

> Is this something that's possible today?

SavedObjectsClient#find supports KQL expressions now; however, as far as I'm aware, KQL does not support constant_score queries. @lukasolson, can you confirm this?

kobelb commented 3 years ago

For those following along, I recently added Reporting to the above description. They're using their own system-indices because the report output is stored in base64 encoded fields, which creates large documents.

kobelb commented 3 years ago

@sqren, I heard through the grapevine that APM recently implemented annotations. Based on the docs, these are stored in the observability-annotations index. Were there specific requirements that led to us not modeling these as saved-objects?

legrego commented 3 years ago

As of 7.10, Kibana stores session information in the ${kibana.index}_security_session* set of indices (docs).

The data is meant to be ephemeral, as Kibana will periodically clean up sessions that are no longer valid.

These indices are not meant to be consumed by end-users directly, and the more interesting contents are encrypted anyway.

FrankHassanabad commented 3 years ago

A key reason we opted to use a data index (.siem-signals-*) instead of an SO for signals/alerting support was that users want to be able to create dashboards and use Discover to query their alerting data, which you cannot do with saved objects at this time. It would be nice to have dashboard/first-class query support for saved objects like we have for data indices.

sorenlouv commented 3 years ago

> @sqren, I heard through the grapevine that APM recently implemented annotations. Based on the docs, these are stored in the observability-annotations index. Were there specific requirements that led to us not modeling these as saved-objects?

We wanted to treat annotations like just another index that users can query in Discover, visualize etc. We also wanted to stay ECS compatible (again, to make querying easier). With that in mind, would SO still have been the recommended approach?

kobelb commented 3 years ago

@FrankHassanabad and @sqren, if we want end-users to be able to query these indices directly, I wouldn't recommend storing them as saved-objects at this time. However, as I've mentioned elsewhere, I'd recommend storing them in dot-prefixed hidden indices, which is what we're doing with .siem-signals-*.

lukasolson commented 3 years ago

> as far as I'm aware, KQL does not support constant_score queries. @lukasolson, can you confirm this?

That's correct.

rudolf commented 3 years ago

I've added another section "Reasons to not use the saved objects client"

Here are some references to existing code that works around the limitations I've mentioned. I left them out of the issue description because they would bloat it too much:

https://github.com/elastic/kibana/issues/82716

https://github.com/elastic/kibana/blob/5dfa45d66664bb4724e6e70d28d515833e43820d/x-pack/plugins/task_manager/server/task_store.ts#L606-L620

https://github.com/elastic/kibana/blob/9ca22382fb9f4aca147e07ac9a42bdb1e9d737e4/src/plugins/kibana_usage_collection/server/collectors/kibana/get_saved_object_counts.ts#L70-L72

https://github.com/elastic/kibana/blob/master/x-pack/plugins/security/server/session_management/session_index.test.ts#L487-L496

https://github.com/elastic/kibana/blob/5dfa45d66664bb4724e6e70d28d515833e43820d/x-pack/plugins/task_manager/server/task_store.ts#L283-L292

rudolf commented 3 years ago

Updated the issue now that saved objects support paging through more than 10k results. I kept the "There are too many saved-objects" section, but changed it to be about the scalability of migrations and export.

elasticmachine commented 3 years ago

Pinging @elastic/kibana-core (Team:Core)