Closed timroes closed 1 year ago
Pinging @elastic/kibana-platform
If we switch from keyword to object (which I hope), it would be great if we could introduce this as an option in the 7.x releases. The saved objects could contain a field decoded: true to indicate during the 7.x cycle whether Kibana has to encode it or not. It would be off by default, and in 8.0 the default would become true. This would help with the migration: assets from the last 7.x minor releases that already contain decoded: true would also be compatible with 8.x.
@elastic/kibana-platform Any update on this? We are currently using decoded JSON in all our packages (https://github.com/elastic/package-registry/pull/354) for versioning purposes, but then encode it again during packaging. It would be nice to align here if possible.
@ruflin
This would help with the migration, allowing the assets of the last minors of the 7.x releases assuming they have decoded: true inside, also to be compatible with 8.x.
Kibana migrations already take care of backwards compatibility. So in 8.0 the maps team could write a migration to store layerlistJSON as a decoded object. If a user imports a 7.last map, or a 7.last map is sent to the saved objects HTTP API, these objects will automatically be migrated (decoded) before being stored.
This doesn't solve the unnecessary encoding/decoding you currently have in package-registry (at least not until 8.0), but I assume the pain isn't bad enough to justify adding a JSON encoding/decoding option to saved objects just to remove this serialization?
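A minimal sketch of the kind of 8.0 migration described above. It assumes a simplified, hypothetical document shape (RawSavedObjectDoc) rather than Kibana's actual saved object types, uses the layerlistJSON attribute name as it appears in this comment, and does not show how the function would be registered with the saved object type.

```typescript
// Hypothetical, minimal shape of a raw saved object document. The real
// Kibana document types carry more fields (references, version info, ...).
interface RawSavedObjectDoc {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

// Sketch of a migration that turns a stringified JSON attribute into a
// decoded object. Documents that are already decoded are returned as-is,
// so running the migration twice is harmless.
function decodeLayerList(doc: RawSavedObjectDoc): RawSavedObjectDoc {
  const raw = doc.attributes["layerlistJSON"];
  if (typeof raw !== "string") {
    return doc;
  }
  return {
    ...doc,
    // JSON.parse throws on malformed input; a real migration would need to
    // decide how to handle that case.
    attributes: { ...doc.attributes, layerlistJSON: JSON.parse(raw) },
  };
}

// A 7.x-style document as it might arrive via import or the HTTP API.
const legacyMap: RawSavedObjectDoc = {
  id: "my-map",
  type: "map",
  attributes: {
    title: "My map",
    layerlistJSON: '[{"id":"layer-1","type":"VECTOR"}]',
  },
};

const migratedMap = decodeLayerList(legacyMap);
// prints {"title":"My map","layerlistJSON":[{"id":"layer-1","type":"VECTOR"}]}
console.log(JSON.stringify(migratedMap.attributes));
```

Because already-decoded documents pass through unchanged, the same function can run on 7.last exports and on documents that were migrated earlier.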
My question is more about long term alignment. Is the plan in 8.0 to have all SO content decoded or will it stay as is today?
Yes, all JSON strings should be stored decoded as objects in 8.x. This will ultimately be up to different teams to implement and I'm not sure if we could enforce this for 8.0 but the effort should be minimal so I don't see a reason teams wouldn't be able to comply.
This is great! I'm wondering whether the "owners" of each saved object type know about this. If not, it may be worth sending out a note or pinging them here.
Yes, I agree, it's definitely worth coordinating this with all the teams. However, since 8.x is still a long way off, I think teams would benefit from not having the 7.x and master branches diverge until we're closer to 8.x. I will own giving teams an early heads-up during 8.0-alpha1.
@timroes, @rudolf The telemetry saved object has all of 8 fields and we use all of them to determine a few things:
@TinaHeiligers if you've audited your plugin and no fields can be removed you can just tick off the task in the issue, thanks!
There are many plugins running taskManager tasks to calculate the telemetry object and store it as a saved object. So when the fetcher kicks in, they just return the saved object's content. Maybe it's a good idea to use the type: object with enabled: false approach in these scenarios.
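To illustrate the type: object with enabled: false approach for telemetry data, here is a sketch of what such mappings and the stored attributes might look like. The attribute names (timestamp, usage) and the values are hypothetical, not taken from an existing plugin.

```typescript
// Hypothetical mappings for a telemetry saved object type. Only the
// timestamp is mapped and indexed; the computed usage payload is kept as an
// opaque object that Elasticsearch stores but never parses or indexes.
const usageTelemetryMappings = {
  properties: {
    timestamp: { type: "date" },
    // enabled: false => the object is kept in _source and returned with the
    // saved object, but none of its sub-fields are added to the mappings.
    usage: { type: "object", enabled: false },
  },
};

// Hypothetical attributes written by a task. The usage object is stored
// as-is: no JSON.stringify on write and no JSON.parse on read.
const usageTelemetryAttributes = {
  timestamp: "2020-06-29T00:00:00.000Z",
  usage: {
    dashboards: { total: 12 },
    visualizations: { byType: { lens: 7, tsvb: 2 } },
  },
};

console.log(Object.keys(usageTelemetryMappings.properties));
```

However many keys the usage payload grows, the mappings keep just the two entries above.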
We only have one instance of setting saved objects (https://github.com/elastic/kibana/blob/8ffc08f2f7bf1a017d4d875e9de869ecd2339d0d/x-pack/plugins/monitoring/server/routes/api/v1/alerts/alerts.js#L77), which is going away with https://github.com/elastic/kibana/pull/68805
I'm not seeing flattened in the list of recommendations. If it would help, and the only reason you're not considering it is that it's only available with a Basic license, this might be something we could open up for discussion.
It's worth noting that Task Manager does not use the .kibana index for its saved objects, so the Task Manager plugin can be omitted from this issue.
APM fixed in #70524.
Does this apply to binary fields as well? They seem to come back from the API endpoint you shared, but are marked as not searchable.
For example:
"alert.apiKey": {
  "binary": {
    "type": "binary",
    "searchable": false,
    "aggregatable": false
  }
},
I tried adding index: false and doc_values: false, but they still get added to the list.
Any advice on these?
It doesn't look like Alerting/Actions have any other fields we can remove - we use them all for searching, sorting etc.
@gmmorris binary fields aren't searchable and default to doc_values: false. More generally, setting index: false and doc_values: false will remove a lot of the overhead of indexing a field, but won't remove the field from the field count. I've updated the issue to make this clearer.
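A rough sketch of why index: false and doc_values: false don't help the field count, while removing fields from the mappings (with dynamic: false on the parent) does. The countMappedFields helper and the my_type mappings are hypothetical, and the helper only approximates how Elasticsearch applies index.mapping.total_fields.limit: object fields and multi-fields are counted, field aliases and runtime fields are ignored here.

```typescript
// Minimal, hypothetical model of an Elasticsearch field mapping.
interface FieldMapping {
  type?: string;
  index?: boolean;
  doc_values?: boolean;
  enabled?: boolean;
  dynamic?: boolean | "strict";
  properties?: Record<string, FieldMapping>;
  fields?: Record<string, FieldMapping>; // multi-fields such as title.keyword
}

// Every mapped field, object field and multi-field counts once. Settings
// such as index: false or doc_values: false do not change the count.
function countMappedFields(properties: Record<string, FieldMapping>): number {
  let count = 0;
  for (const key of Object.keys(properties)) {
    const mapping = properties[key];
    count += 1;
    if (mapping.properties) count += countMappedFields(mapping.properties);
    if (mapping.fields) count += countMappedFields(mapping.fields);
  }
  return count;
}

// Hypothetical type before the cleanup: apiKey and config are never
// searched, but they are still mapped and therefore still counted.
const before: Record<string, FieldMapping> = {
  my_type: {
    properties: {
      title: { type: "text", fields: { keyword: { type: "keyword" } } },
      apiKey: { type: "binary" },
      config: { type: "keyword", index: false, doc_values: false },
    },
  },
};

// After the cleanup: dynamic: false lets documents keep apiKey and config
// in _source without Elasticsearch adding them to the mappings.
const after: Record<string, FieldMapping> = {
  my_type: {
    dynamic: false,
    properties: {
      title: { type: "text", fields: { keyword: { type: "keyword" } } },
    },
  },
};

console.log(countMappedFields(before), countMappedFields(after)); // prints "5 3"
```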
@jpountz
I'm not seeing flattened in the list of recommendations. In case it would help but you are not considering it only because it's only available with a Basic license, this might be something we could open up for discussion.
I've added flattened to the recommendations. I think the biggest mappings are typically in X-Pack, but if we can identify big wins by using it in OSS I will definitely open up that discussion.
@gmmorris binary fields aren't searchable and defaults to doc_values: false. More generally, setting index:false and doc_values:false will remove a lot of the overhead of indexing a field, but won't remove the field from the field count. I've updated the issue to make this more clear.
Ah, I see, so to remove them from the field count we need to remove them from the mappings entirely?
After assessing, we've come to the conclusion that throughout our Alerting and Actions plugins there are actually only two fields we can remove, as they're binary and not searchable anyway. With that in mind, we might leave it as is, as omitting these fields from the mappings could cause confusion in maintenance further down the line.
I hope that's acceptable to you 😬 Let me know if you have other thoughts.
After assessing we've come to the conclusion that throughout our Alerting and Actions there are actually only two fields we can remove, as they're binary and not searchable anyway. With that in mind - we might leave it as is, as omitting these fields from the mappings could cause confusion in maintenance further down the line.
Makes sense :+1: Especially for 7.9 we need to focus on the low-hanging fruit.
I made the flattened section a bit more detailed to explain when the dynamic: false approach is preferable, since I think there are not many cases where flattened would actually be the preferred way.
Struck out a few plugins that are deprecated.
Closing this as the enhancements we've made to scale saved object migrations https://github.com/elastic/kibana/issues/144035 and serverless zdt migrations prevent us from removing existing fields. To mitigate the field growth we have split the .kibana saved objects into several smaller indices.
Update 29 June 2020
With 7.9 currently having ~960 fields, we're fast approaching the default limit of 1000 fields. Please audit your plugin's mappings and remove any unnecessary fields. Link from your PR back to this issue and mark your plugin's task as complete once the PR has been merged.
Removing fields
Setting index: false and doc_values: false removes some of the overhead of a field, but doesn't reduce the field count. To reduce the field count, fields need to be removed from the mappings completely. This can be done by specifying dynamic: false on any level of your mappings.
For example, the following diff will remove three fields from the field count. The removed fields can still be stored in the Saved Object type, but searching and aggregation is only possible on the timestamp field. Note: this change also removes any validation on Elasticsearch, which will allow saved objects with unknown attributes to be saved. Because of this, we recommend starting only with low-risk saved object types like telemetry data.
You can use the following command to count the number of fields for a before/after comparison (requires brew install jq):
Plugins:
[ ] plugins/timelion @elastic/kibana-app @flash1293
xpack/plugins/canvas @elastic/kibana-canvas
xpack/plugins/file_upload @elastic/kibana-gis
xpack/plugins/graph @elastic/kibana-app @flash1293
xpack/plugins/task_manager @elastic/kibana-alerting-services (Task Manager does not use the .kibana index)
Original issue
Looking at the current mappings for a lot of our saved objects, we're indexing a terrible amount of unnecessary fields, i.e. fields we know we'll never want to search through or filter on. Indexing those just wastes heap in Elasticsearch and, if the field is unnecessarily analyzed, also wastes a couple of milliseconds on every insert and thus on every migration. We even use a lot of text fields in places where we store stringified JSON, which doesn't make any sense, since the analyzer won't produce anything meaningful here.
This is not a huge problem, since the .kibana index is usually rather small, and a lot of those JSON fields might be longer than the default ignore_above value of 256 and thus not indexed in most documents. Despite it not being a huge problem, I discussed this with @joshdover, @tylersmalley and @rudolf, and we agreed that we should not waste heap and indexing performance on fields we know we'll never need indexed.
As the field count on .kibana is approaching the default limit of 1000 fields, we need to urgently evaluate whether or not all fields are really necessary for performing queries or filters.
Mapping recommendations
Here are a couple of general recommendations for how the mappings of a saved object should look:
type=text only for full text search on real text
A field with type text in the mapping will be analyzed and indexed. This makes sense only for fields we know we want to do full text search on, e.g. the title or description of a saved object. If you don't need the field value analyzed for full text search, don't index the field (see below) or use keyword with an appropriate ignore_above as the type instead. Good examples of a proper keyword field would be the visType or the language of a query.
Don't index if not needed
Especially with keyword fields, we very often index a field without thinking about it (because it's the default option). If we know we'll never need to aggregate over or query for that field, but just want it available when retrieving the saved object, set index: false and doc_values: false in the mapping for that field (text and annotated_text fields don't support doc_values, so for those only index: false applies).
A couple of examples where it might make sense to have a (keyword) field indexed:
visType: we might want to filter on it later and thus need to be able to query by that field.
language (of a query): even though we might never want to expose it in the UI, we might want to aggregate over that field for telemetry data.
A couple of examples where indexing doesn't make much sense:
expression (the "canvas" expression of a visualization): it doesn't make any sense to filter on the complex expression as a whole, nor to aggregate over it. If we wanted to build telemetry, we would need to look at each document individually anyway, e.g. parse it and count the functions it contains.
JSON fields
We have a couple of places where we use a keyword field (often even indexed) to store a JSON object, like the configuration of a visualization or the state of a dashboard. As a first step, these fields should be set to index: false.
As a further optimization, this data can be saved as a field of type object with enabled: false. That way Elasticsearch simply ignores the content of that field: it won't be indexed or analyzed, but it is still returned as it was indexed (as JSON) in the saved object. This removes an unnecessary JSON.stringify and JSON.parse when saving/loading those objects. Note: this will require writing a migration function for your saved object and changing any consuming code, so this is not an immediate need, but rather something to work towards for 8.0.
Consider using type: 'flattened' (license: Basic) if you need to search over many fields or an unknown number of fields
The flattened type uses a single field for the entire object. It comes with some limitations, but in many instances it can significantly reduce the field count while still allowing you to search/aggregate over the fields inside the object.
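A sketch of a flattened mapping for a hypothetical saved object type with user-defined labels, expressed as an Elasticsearch mapping in a TypeScript object. The type, attribute names and query are assumptions for illustration, not mappings taken from an existing plugin.

```typescript
// Hypothetical mappings for a saved object type that stores an open-ended
// set of user-defined labels. With the flattened type, the whole labels
// object is a single mapped field however many keys documents put in it.
const taggedObjectMappings = {
  properties: {
    title: { type: "text" },
    labels: { type: "flattened" },
  },
};

// Documents can carry arbitrary keys under labels; each leaf value is
// indexed as a keyword, without adding new fields to the mappings.
const taggedObjectAttributes = {
  title: "Checkout service overview",
  labels: { team: "payments", tier: "1", region: "eu-west-1" },
};

// Sub-keys can still be queried with dot notation, e.g. a term query on
// labels.team (built here as a plain request body, not sent anywhere).
const findPaymentsObjects = {
  query: { term: { "labels.team": taggedObjectAttributes.labels.team } },
};

// prints {"query":{"term":{"labels.team":"payments"}}}
console.log(JSON.stringify(findPaymentsObjects));
```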
Keep in mind that the flattened field type will still index all data within the field. If you only need one specific sub-field to be aggregatable/searchable but not the rest, the dynamic: false approach described above (where the parent key is dynamic: false and only the sub-field you need to search/aggregate on has an indexed mapping) is preferable. Using flattened is mostly preferable if you potentially need to search/aggregate through a larger number of sub-fields.
What happens after I change my plugin's mappings?
If you switch a field from an indexed to a non-indexed state (e.g. with enabled: false or index: false), the migration system will automatically update the mappings when Kibana is upgraded; no further action is required. If your plugin has recently removed or renamed an entire Saved Object type, these old mappings might not have been cleaned up. Please reach out to @elastic/kibana-platform if you think this might be the case.