elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Stack Monitoring] Metricbeat writes to .monitoring-*-8-mb indice instead of datastream after 8.0 upgrade #30769

Open klacabane opened 2 years ago

klacabane commented 2 years ago

Summary

In 8.x, metricbeat writes stack monitoring data to the .monitoring-{product}-8-mb index patterns. These patterns are backed by data stream templates that elasticsearch creates at startup and that define the ILM policies and mappings necessary for Stack Monitoring to function.

We may have a race condition where an unmapped/unusable index is created instead of the expected data stream if elasticsearch hasn't set up the backing template by the time metricbeat starts writing data to .monitoring-{product}-8-mb.

If this scenario is confirmed, we should work on a fix that prevents data from being written to the .monitoring-{product}-8-mb patterns until the templates are successfully set up.

This ticket tracks the investigation and resolution of that potential scenario.

jbaiera commented 2 years ago

Potential scenario steps:

1) Upgrade Beats and Elasticsearch to 8.0
2) Beats sends metric data to Elasticsearch targeting .monitoring-es-8-mb
3) Elasticsearch has not yet automatically installed the data stream template (it is installed automatically, but only after nodes become available for the first time)
4) .monitoring-es-8-mb is created as a regular index - field aliases from the template are missing (a quick way to confirm this is sketched below)
5) Querying the data in the Stack Monitoring UI breaks because of incorrectly mapped data
6) 3 days later, the local ES Monitoring Plugin CleanerService marks and deletes the .monitoring-es-8-mb index since it is an index that matches .monitoring-* (source)
7) Beats continues to send metric data, which then correctly creates a data stream using the now-present template
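
For reference, a quick diagnostic sketch to confirm step 4 (Go with plain net/http, assuming a cluster at http://localhost:9200; this is not part of Beats): if _data_stream/.monitoring-es-8-mb returns 404 while the name itself still resolves, the writes created a plain, unmapped index instead of the expected data stream.

```go
// Diagnostic sketch (not part of Beats): check whether .monitoring-es-8-mb
// resolved to a data stream or to a plain index. The endpoint is a placeholder.
package main

import (
	"fmt"
	"net/http"
)

// status performs a GET and returns only the HTTP status code.
func status(url string) int {
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return resp.StatusCode
}

func main() {
	const es = "http://localhost:9200"
	dataStream := status(es + "/_data_stream/.monitoring-es-8-mb")
	index := status(es + "/.monitoring-es-8-mb")

	switch {
	case dataStream == http.StatusOK:
		fmt.Println(".monitoring-es-8-mb is a data stream (expected state)")
	case index == http.StatusOK:
		fmt.Println(".monitoring-es-8-mb exists as a regular index (race condition hit)")
	default:
		fmt.Println(".monitoring-es-8-mb does not exist yet")
	}
}
```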

lucabelluccini commented 2 years ago

Since Elasticsearch 7.10, bulk requests support require_alias to avoid implicitly creating an index (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).

I've not tested whether require_alias also works against data streams. If it does, Metricbeat could add require_alias to ensure it will not create indices before the correct setup is in place.
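
As a starting point for that test, a minimal sketch of such a bulk request (Go with plain net/http, not Metricbeat's actual publishing code; the endpoint, index name and document body are placeholders):

```go
// Send a single-document bulk request with ?require_alias=true so Elasticsearch
// should refuse to implicitly create ".monitoring-es-8-mb" if the name does not
// resolve to an alias (or, hopefully, a data stream).
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Bulk bodies are newline-delimited JSON: an action line, then the document.
	body := strings.NewReader(
		`{"create":{"_index":".monitoring-es-8-mb"}}` + "\n" +
			`{"@timestamp":"2022-03-22T16:46:31.922Z","type":"cluster_stats"}` + "\n")

	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:9200/_bulk?require_alias=true", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-ndjson")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	// If the flag behaves the same for data streams, the item result should be a
	// rejection instead of a silently created, unmapped index.
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```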

klacabane commented 2 years ago

Confirming that I was able to replicate the issue with a large cloud deployment: a .monitoring-es-8-mb index was created while the template and ILM policies were nowhere to be found. During the rolling upgrade we upgrade one node at a time, each of which runs metricbeat 8.x, while the rest of the cluster is still running 7.x. When this first node upgrade is done (or at least when metricbeat is ready), metricbeat 8.x may route write requests to a 7.x node in the cluster before the templates are installed. @mat do you know where I could find resources on cloud upgrade internals?

Looking at the logs for this cluster, the .monitoring-es-mb index template is created at 16:53:29.677 UTC and the first 8.x document is indexed at 16:46:31.922 UTC. Dropping events until the templates are available is questionable with a 7-minute gap, unless the node being monitored is not accepting traffic until that setup is complete - do you happen to know if that's the case @jbaiera ?


Since Elasticsearch 7.10, bulk requests support require_alias to avoid implicitly creating an index

Unfortunately this does not appear to work with data streams, as I'm getting this error whether the template exists (v8.x) or not (v7.x):

...
(status=404): {\"type\":\"index_not_found_exception\",\"reason\":\"no such index [.monitoring-es-8-mb] and [require_alias] request flag is [true] and [.monitoring-es-8-mb] is not an alias\",\"index_uuid\":\"_na_\",\"index\":\".monitoring-es-8-mb\"}, dropping event!"
...
jbaiera commented 2 years ago

Looking at the logs for this cluster, the .monitoring-es-mb index template is created at 16:53:29.677 UTC and the first 8.x document is indexed at 16:46:31.922 UTC. Dropping events until the templates are available is questionable with a 7-minute gap, unless the node being monitored is not accepting traffic until that setup is complete - do you happen to know if that's the case @jbaiera ?

The template registry is configured to only run on the master node of the cluster. I think during a rolling upgrade all nodes are updated to the newer version with the master node being updated last. This could create a window of time where there are 8.0 nodes in the cluster but no monitoring templates are available.

jbaiera commented 2 years ago

I just validated that this is the case, at least on cloud. I'll open a discussion issue in the ES repo to see if there are any steps we can take on the ES side of things so that we don't have to drop events during a rolling upgrade.

jbaiera commented 2 years ago

I opened https://github.com/elastic/elasticsearch/issues/85247 to see if we can help address the availability window for the data during a rolling upgrade.

klacabane commented 2 years ago

On the beats side we could prevent the SM modules from writing monitoring events until the templates are set up. The downside is that we'd fly blind during the upgrade for the nodes that have completed the process, but we could cache those events (i.e. the last n bytes) as a best effort and eventually backfill them. I'll look into whether there is an existing beats mechanism that we can reuse for that - maybe @kvch or @ruflin have suggestions?
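
To make the caching idea concrete, a purely illustrative sketch of such a bounded cache (this is not an existing Beats mechanism; the Event type, buffer size and flush hook are made up):

```go
// Hold the most recent monitoring events in a fixed-size ring buffer while the
// templates are missing, then drain them for backfilling once the templates are
// available. Memory use stays bounded during a long upgrade.
package main

import "fmt"

// Event stands in for a monitoring document that could not be shipped yet.
type Event struct {
	Payload string
}

// ringBuffer keeps only the most recent 'size' events, dropping the oldest.
type ringBuffer struct {
	events []Event
	size   int
}

func (r *ringBuffer) Add(e Event) {
	if len(r.events) == r.size {
		r.events = r.events[1:] // drop the oldest event
	}
	r.events = append(r.events, e)
}

// Drain returns the buffered events for backfilling and resets the buffer.
func (r *ringBuffer) Drain() []Event {
	out := r.events
	r.events = nil
	return out
}

func main() {
	buf := &ringBuffer{size: 3}
	for i := 1; i <= 5; i++ {
		buf.Add(Event{Payload: fmt.Sprintf("monitoring doc %d", i)})
	}
	// Once the templates are installed, the retained events would be re-published.
	fmt.Println("events to backfill:", buf.Drain())
}
```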

ruflin commented 2 years ago

I remember we had many discussions around this before data streams existed, because of the alias rather than the template problem. In general I agree: if the template / data stream is not there, we should not write to the data streams, as it might cause issues. The part I'm not sure about is whether we should really solve this problem on the ingest side. Beats / Elastic Agent would need a lot of unnecessary permissions to do the required checks. On the one hand it is a permissions issue, but I would argue it is also partially a performance issue, as each time the beat reconnects it likely needs to redo these checks. Instead I would like to see this enforced on the Elasticsearch side: if the template / data stream is not there, Elasticsearch does not allow ingesting data. I somehow remember we even wanted to build such a param, not sure if that ever landed in Elasticsearch. @kvch you might remember?

klacabane commented 2 years ago

Agreed on a flag similar to require_alias that rejects indexing requests targeting a non-existing template. However I'm concerned about the loss of data during the interval where the template is not yet installed (15 minutes on a 5-node cluster); that is large enough to miss important patterns and alert states. Backfilling would not solve this problem, but it could allow detection of the patterns (without acting on them, since the alerts could not run in time). If such a mechanism already exists this could be a cheap improvement. Improving the time it takes to land these templates would make that effort unnecessary, so let's see if there are improvements on that front first :) I'm wondering if enabling self-monitoring before the upgrade starts could solve the issue, so that we're still ingesting data during the process?

ruflin commented 2 years ago

15 minutes on a 5-node cluster

I was hoping we were talking about a much shorter time period. One of the key problems here, I think, is that metricbeat runs inside the docker container in Cloud. Ideally, Beats would only be updated after the full cluster is updated, so this issue would not happen in the first place. But this is not something we can fix for 8.0 :-(

Alternative workaround idea: What if the 7.last migration assistant put the right template in place ahead of time?

jasonrhodes commented 2 years ago

@ruflin do Beat modules expect you to run mybeat setup on every version upgrade of the beat? I'm curious how other beat modules handle this (when MB is the one responsible for installing mappings for most modules)

ruflin commented 2 years ago

Beats by default checks whether the templates and ingest pipelines are there on each reconnect. You don't need to run setup for this, Beats just does it. As it is all one template, it is just one check for all the modules.

jasonrhodes commented 2 years ago

3. Elasticsearch has not yet automatically installed the data stream template (it is installed automatically, but only after nodes become available for the first time)

I'm looking through this again and wondering about this line. @jbaiera it sounds like you're saying that ES waits to create the data stream templates until the nodes are available. @klacabane is saying that we've noticed as much as a 7 minute gap between MB writing data and the data stream template being created.

What I'm curious about is, if the template is created as soon as the nodes are available, what is MB writing to for 7+ minutes?

klacabane commented 2 years ago

The templates will be installed by the master node once it is upgraded to 8.x. In a rolling upgrade the master node will be upgraded last (or at least after some data nodes have been upgraded; I need to verify the cloud upgrade orchestration details). When a non-master node upgrades, its sidecar metricbeat will write to the monitoring index pattern, but since the templates are not there yet it creates a mapping-less index instead of the expected data stream, which breaks SM querying. The gap above is the time it took for the templates to be installed in a small-sized cluster.

Ideally we want to maintain monitoring visibility during the upgrade, so the templates need to be installed as soon as the first node is upgraded.

Otherwise we could prevent metricbeat from writing to the index pattern if no underlying template has been installed, something along the lines of the require_alias query parameter.
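
As a rough illustration of that guard (an assumption, not existing Metricbeat behaviour; the endpoint and template name are taken from this thread, and the polling loop is a placeholder for wherever Beats would hook this in):

```go
// Hold back stack monitoring events until the ".monitoring-es-mb" composable
// template is reported as installed by the cluster.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// monitoringTemplateInstalled returns true once the composable index template
// backing the .monitoring-*-8-mb data streams exists (HEAD returns 200).
func monitoringTemplateInstalled(esURL string) (bool, error) {
	resp, err := http.Head(esURL + "/_index_template/.monitoring-es-mb")
	if err != nil {
		return false, err
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	for {
		ok, err := monitoringTemplateInstalled("http://localhost:9200")
		if err == nil && ok {
			fmt.Println("template present, safe to start publishing monitoring events")
			return
		}
		fmt.Println("template missing, holding monitoring events back")
		time.Sleep(30 * time.Second)
	}
}
```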

jasonrhodes commented 2 years ago

I like the option of installing templates for each node as it's upgraded, but I don't know what the implications of that might be. I haven't thought too hard about index templates and their relationship to master vs non-master nodes ... each node doesn't have its own copy of the templates?

jbaiera commented 2 years ago

... each node doesn't have its own copy of the templates?

Each node has information about the templates because they are stored in the cluster state, but the master node is where all template creation operations are applied to the cluster state.

Most of the automatic template creation logic waits for the master node to be upgraded before installing templates. This is because we added composable index templates as a feature in the 7.x line (7.8 to be exact). Master nodes are usually the last to be upgraded in a rolling upgrade (for safety purposes). If you are upgrading from an early version on the 7.x line that doesn't have support for composable templates to a version that does, then you need to wait for the master to be upgraded so that it knows how to apply those templates to the cluster state.

Now that we're in 8.0 territory the oldest version a node can be during a rolling upgrade is likely to be 7.17.X which is safely always aware of composable index templates. Waiting for the master node to be upgraded for the above reason isn't required at this point, but if we ever add new functionality to the templates that doesn't exist in earlier 8.x versions, then we must again wait for the master to be fully updated before templates can be installed. In that situation, metric collection will likely have to wait again.

jbaiera commented 2 years ago

So ultimately what I'm saying is that we could get the templates to install earlier, but I'm worried that we will just kick the can down the road until we have new functionality we want to add to the templates (perhaps TSDB stuff?) and this becomes a problem again. I am going to see if we can discuss the crux of the issue (https://github.com/elastic/elasticsearch/issues/85247) in the next data management area meeting and figure out where we go on to next.

jasonrhodes commented 2 years ago

Thanks, @jbaiera -- keep us updated. We could tell users that they lose monitoring visibility during a rolling upgrade, but that feels like it calls the point of a rolling upgrade into question a bit.

jbaiera commented 2 years ago

We met and discussed template installation, and we determined that even though we could currently make template installation happen before the master is upgraded, any future feature additions to templates would just put us back in the same state. We discussed some options around how to avoid gaps in monitoring data when doing rolling upgrades:

1) Changes to the orchestration logic - Our upgrade documentation says to fully update Elasticsearch before upgrading collectors (like metricbeat, agent). This is also the workflow we have in mind on the ES team when making changes. It would be more correct to have the collecting agents upgraded all at once after the full rolling restart has finished and rely on the older version of metric collection in the gap time, since the resources on the cluster should be stable. Probably easier to do this with k8s but worth exploring as an option.
2) We could make changes to beats so that it supports writing to the older indices/resources until the templates are available for ingestion. This would require some slight overhead in the agents to check the templates and switch to an older resource, potentially changing the shape of the data if that is a concern. Alternatively, mappings could be written to be backwards compatible, though that makes breaking changes much harder to implement (the root of the problem we're seeing right now).

jasonrhodes commented 2 years ago

Thanks, @jbaiera! I wonder if we should consider just making MB write/update these mappings since this is a MB-only problem. I think that would also solve the issue, right? cc: @klacabane / @matschaffer

matschaffer commented 2 years ago

@jasonrhodes I've seen some challenges with that approach as well. Namely:

  1. Access - in some cases (particularly ECE), we don't want the typical "collection mode" authorization to be able to take cluster-level actions like template updates. It should just be able to send its own data and that's it.
  2. Template update pressure - if we naively tell beats to push a template on every startup, you can hit startup loop situations (kernel OOM + docker restart behavior) where beats re-puts templates very frequently. These requests involve a cluster state update, so they can be costly (a gentler "push only when missing" variant is sketched below).

Both problems can be overcome by an extra operator action (or orchestration action) to push templates before upgrading metricbeat.

In general I agree with the sentiment that the publisher of the data should probably own the template, rather than embedding the templates in elasticsearch.
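
A sketch of the "push only when missing" idea mentioned above (an idea floated here, not current Beats behaviour; the template name, priority and mappings are placeholders): only PUT the monitoring template when it is not already installed, so a crash-looping Beat does not force a cluster state update on every restart.

```go
// Idempotent template install: HEAD the template first and only PUT it when it
// is missing. Endpoint and template body are illustrative placeholders.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	const es = "http://localhost:9200"
	const name = ".monitoring-es-mb"

	// Cheap existence check: HEAD _index_template/<name> returns 200 if installed.
	head, err := http.Head(es + "/_index_template/" + name)
	if err != nil {
		panic(err)
	}
	head.Body.Close()
	if head.StatusCode == http.StatusOK {
		fmt.Println("template already installed, skipping PUT")
		return
	}

	// Install a minimal data stream template (placeholder body).
	body := `{
	  "index_patterns": [".monitoring-es-8-mb"],
	  "data_stream": {},
	  "priority": 200,
	  "template": {"mappings": {"properties": {"@timestamp": {"type": "date"}}}}
	}`
	req, err := http.NewRequest(http.MethodPut, es+"/_index_template/"+name,
		strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("template PUT status:", resp.Status)
}
```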

klacabane commented 2 years ago

Metricbeat installing the templates would solve the problem for now, but I think we'll hit the same wall if/when we want to use new template functionalities (i.e. waiting for the master to be upgraded). Moving the template installation to a different registry (mentioned in https://github.com/elastic/elasticsearch/issues/85247#issuecomment-1102943362) would achieve similar behavior if "sooner" means before metricbeat gets a chance to write to the index in a cloud deployment, and I'd be in favor of the latter since it should be much lower effort for the same result (not discounting that having the templates closer to the publisher is good design, but it does not solve any existing problems).

We could make changes to beats so that it supports writing to the older indices/resources until the templates are available for ingestion.

This would mean bringing back the code that handled the legacy document format (completely removed in 8.x) and would make the metricbeat code a complex beast. I'm not convinced such an effort is justified for now, but it could be when we eventually face that problem.

Changes to the cloud orchestration to reflect the upgrade order that we recommend for on-prem deployments would be ideal.

kunisen commented 2 years ago

Hi team, is there a way to fix this without simply deleting the offending regular indices (as that is destructive)? E.g. reindex the old data to another index name, delete the offending indices, and then, after the data stream has been created, reindex the data back.

matschaffer commented 2 years ago

It'd be good to test this out before adding it to docs, but I don't see why that wouldn't be an option. You could probably even skip the final reindex in favor of reindexing to a name covered by the typical index patterns (for example, .monitoring-{product}-8-mb-reindex1).
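
Untested sketch of those steps (Go with plain net/http; the endpoint and index names are placeholders, and the delete is destructive, so please try it on a throwaway cluster first): copy the mis-created index aside with _reindex, then delete the original so the next monitoring write can create the proper data stream.

```go
// Reindex-aside workaround: 1) _reindex the offending index to a name still
// covered by the monitoring index patterns, 2) delete the offending index.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// do sends a single JSON request and prints the resulting status.
func do(method, url, body string) {
	req, err := http.NewRequest(method, url, strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	if body != "" {
		req.Header.Set("Content-Type", "application/json")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println(method, url, "->", resp.Status)
}

func main() {
	const es = "http://localhost:9200"

	// 1. Copy documents out of the mis-created regular index.
	do(http.MethodPost, es+"/_reindex", `{
	  "source": {"index": ".monitoring-es-8-mb"},
	  "dest":   {"index": ".monitoring-es-8-mb-reindex1"}
	}`)

	// 2. Delete the offending index so the data stream template can take over
	//    on the next write.
	do(http.MethodDelete, es+"/.monitoring-es-8-mb", "")
}
```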

jasonrhodes commented 2 years ago

Metricbeat installing the templates would solve the problem for now, but I think we'll hit the same wall if/when we want to use new template functionalities (i.e. waiting for the master to be upgraded). Moving the template installation to a different registry (mentioned in elastic/elasticsearch#85247 (comment)) would achieve similar behavior if "sooner" means before metricbeat gets a chance to write to the index in a cloud deployment, and I'd be in favor of the latter since it should be much lower effort for the same result (not discounting that having the templates closer to the publisher is good design, but it does not solve any existing problems).

Hm, ok. I don't 100% follow the upgrade flow, but if this is true, then I agree that the separate registry is the easier solution for now (with the same ongoing risks). What I still don't quite understand is how this architecture could ever support a rolling upgrade for anything that involves changes to index templates. Is there a plan for Elasticsearch to fix this at some point?

We could make changes to beats so that it supports writing to the older indices/resources until the templates are available for ingestion.

This would mean bringing back the code that handled the legacy document format (completely removed in 8.x) and would make the metricbeat code a complex beast. I'm not convinced such an effort is justified for now, but it could be when we eventually face that problem.

Very much agree here. I think this would be a pretty dangerous road to go down.

botelastic[bot] commented 8 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!