elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Upgrading component template logs-settings failed after update to 8.9 #98247

Open magicpotion opened 1 year ago

magicpotion commented 1 year ago

Elasticsearch Version

8.9.0

Installed Plugins

No response

Java Version

bundled

OS Version

cloud

Problem Description

After updating the Elasticsearch cloud service from version 8.6 to 8.9.0, I want to share the following issue:

[instance-0000000005] upgrading component template [logs-settings] for [stack] from version [2] to version [3]
[instance-0000000005] error adding index template [logs-settings] for [stack] java.lang.IllegalArgumentException: updating component template [logs-settings] results in invalid composable template [logs-crawler-default] after templates are merged at [...]

The templates weren't tampered with.

Any insight into how to fix it? I wasn't able to find an updated template version that I could perhaps apply manually.

Steps to Reproduce

Update to version 8.9.0

Logs (if relevant)

No response

albertzaharovits commented 1 year ago

Hi @magicpotion, can you please post this question to https://discuss.elastic.co/c/observability/logs ? Here on GitHub we prefer to track confirmed bugs for Elasticsearch.

seanstory commented 1 year ago

@magicpotion can you link any post you make in our discuss forums, and I can reply there too.

For anyone else who finds this from a google search: When we've seen this error pop up, the real issue is in Enterprise Search, and it had (has) not finished upgrading, or failed to upgrade successfully. I suggest checking your Enterprise Search logs, looking for any ERROR messages that indicate that startup failed. Once Enterprise Search starts successfully, it will migrate the relevant logs templates to all align with one another.

magicpotion commented 1 year ago

Thanks @seanstory the issue was indeed caused by enterprise search

This is indeed a reproducible bug, due to how elastic cloud removes unused resources and ignore_malformed.

Elasticsearch is unable to handle it gently during upgrade and this is not noted in the migration guide.

I understand this isn't the place for technical support, but Elastic support was of no help, and if this issue stays open, maybe someone could point to the recent changes: https://github.com/elastic/elasticsearch/issues/95224

I fixed it by updating enterprise_search-ecs component template with "ignore_malformed": false under @timestamp mapping.
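A minimal sketch of that workaround, assuming the component template body has already been fetched as a Python dict. The helper name `with_timestamp_override` and the sample `existing` body are hypothetical; the real enterprise_search-ecs template contains more than is shown here.

```python
import copy


def with_timestamp_override(component_template: dict) -> dict:
    """Return a copy of a component template body in which the @timestamp
    mapping carries an explicit "ignore_malformed": false."""
    patched = copy.deepcopy(component_template)
    template = patched.setdefault("template", {})
    mappings = template.setdefault("mappings", {})
    properties = mappings.setdefault("properties", {})
    # Assumption: if @timestamp is not mapped yet, map it as a date field.
    timestamp = properties.setdefault("@timestamp", {"type": "date"})
    timestamp["ignore_malformed"] = False
    return patched


# Hypothetical stand-in for the existing template body.
existing = {"template": {"mappings": {"properties": {"message": {"type": "text"}}}}}
patched = with_timestamp_override(existing)
print(patched["template"]["mappings"]["properties"]["@timestamp"])
```

Because updating a component template replaces its whole body, the patched dict keeps every existing mapping and only adds the override. It would then be sent back with the component template update API.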

eedugon commented 1 year ago

@albertzaharovits : in my opinion this is a real bug.

In 8.9 we have started to consider ignore_malformed to be true by default, and that's not fully compatible with the @timestamp field on data streams.

Data streams perform an extra verification where @timestamp cannot have ignore_malformed set to true, raising an issue like:

data stream timestamp field [@timestamp] has disallowed [ignore_malformed] attribute specified

From an old explanation I got in the past:

There's a special validation check in the data streams metadata mapper to ensure that @timestamp doesn't have ignore_malformed set to true. But if you have a global index setting of mapping.ignore_malformed:true then that's going to configure your @timestamp field as such. As a workaround I think you can add an explicit @timestamp field to one of the templates with ignore_malformed:false set on it, but I guess we should probably update the data_streams template that defines @timestamp to also include ignore_malformed:false to fix this properly.

So, now that ignore_malformed is true by default, it triggers a conflict that in the past could only be caused by users manually configuring ignore_malformed.
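The interaction between the index-level default and the @timestamp field can be modeled roughly as follows. This is a simplified illustration, not the actual Elasticsearch implementation: the function names are hypothetical, and settings are assumed to use the flat `index.mapping.ignore_malformed` key.

```python
def timestamp_ignores_malformed(settings: dict, mappings: dict) -> bool:
    """Simplified model: an explicit field-level value wins, otherwise the
    index-level default applies."""
    index_default = settings.get("index.mapping.ignore_malformed", False)
    timestamp = mappings.get("properties", {}).get("@timestamp", {})
    return bool(timestamp.get("ignore_malformed", index_default))


def check_data_stream_template(settings: dict, mappings: dict) -> None:
    """Mimic the data stream check by raising the error quoted in this thread."""
    if timestamp_ignores_malformed(settings, mappings):
        raise ValueError(
            "data stream timestamp field [@timestamp] has disallowed "
            "[ignore_malformed] attribute specified"
        )


new_default = {"index.mapping.ignore_malformed": True}
no_timestamp = {"properties": {"message": {"type": "text"}}}
explicit_false = {
    "properties": {"@timestamp": {"type": "date", "ignore_malformed": False}}
}

results = {
    "index setting only": timestamp_ignores_malformed(new_default, no_timestamp),
    "with explicit override": timestamp_ignores_malformed(new_default, explicit_false),
}
print(results)
```

In this model the index setting alone is enough to trip the check, while an explicit `"ignore_malformed": false` on @timestamp avoids it, which matches the workaround described above.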

I'm reopening this, feel free to close it if you believe this is still not an issue on our side.

@seanstory, the issue is not related only to Enterprise Search. Any client working with data streams could suffer from this.

Update: The PR https://github.com/elastic/elasticsearch/pull/95329 that implemented this change is taking care of the @timestamp mapping correctly (together with the index setting change). The issue is occurring when upgrading, as something else is interfering with the update of the logs-settings component template (for example an Enterprise Search index template making use of it and without the static mapping of timestamp considered).

The PR implements the new functionality in multiple places: logs-settings for the conflicting index setting and logs-mappings and data-streams-mappings component templates to take care of the timestamp static mapping. If a user (or Enterprise Search) is only using logs-settings component template and not the mappings related component templates, I feel this issue will occur, and the workaround suggested by @magicpotion will be needed.

I will do more research tomorrow, probably we need either a change in logs-settings to be more robust or a public document explaining this possible scenario.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-data-management (Team:Data Management)

zez3 commented 1 year ago

Was able to reproduce the issue if I set the ignore_malformed flag to true. Strangely, not all of my integrations are affected by this: I just successfully updated the auditd one.

custom log integration

error.stack_trace
ResponseError: illegal_argument_exception
    Caused by: illegal_argument_exception: composable template [logs-httpjson.generic] template after composition with component templates [logs-httpjson.generic@package, logs-httpjson.generic@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid
    Root causes: illegal_argument_exception: updating component template [logs-httpjson.generic@package] results in invalid composable template [logs-httpjson.generic] after templates are merged
    at KibanaTransport.request (/usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:479:27)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
error.type

but also system

illegal_argument_exception
    Caused by: illegal_argument_exception: composable template [logs-system.syslog] template after composition with component templates [logs-system.syslog@package, logs-system.syslog@custom, .fleet_globals-1, .fleet_agent_id_verification-1] is invalid
    Root causes: illegal_argument_exception: updating component template [logs-system.syslog@package] results in invalid composable template [logs-system.syslog] after templates are merged

eedugon commented 1 year ago

@zez3, thanks for showing us that, but let me try to give you some extra context about this specific GitHub issue, and sorry if my previous response caused some misunderstanding.

The error itself is not a bug, as it's already known that data streams do NOT support ignore_malformed: true for the @timestamp field (and any index or component template creation or update that ends up doing that will be rejected). That's not an issue and it's not even unexpected, but it's true that we might need to improve our docs in this area to give better visibility to this type of error and how to solve it (we are already evaluating it).

The error you have shared can be reproduced also in other Elasticsearch releases in the way you are doing or just by adding the index setting (not mapping) "index.mapping.ignore_malformed": true in a template used for data streams (the operation will be rejected if there's no specific disabling of the setting for @timestamp).

And as mentioned that's not really considered a bug in the code.

This Github issue exists because when upgrading to 8.9, in some environments the upgrade of logs-settings component template (changed in this PR) is failing. And we were not expecting logs-settings update to fail.

I hope this helps! Let's keep this GitHub issue specifically for the logs-settings component template update.

We will be updating this issue soon, and for those facing this (logs-settings component template failing to be updated), the fix is quite easy: just find the offending index or component templates and fix them. I'll try to write a short guide soon about this.
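One way to look for offending templates is to replay the merge offline. The sketch below assumes the templates have been fetched and reshaped into plain dicts keyed by name, and that component templates are merged in `composed_of` order with the index template's own `template` section applied last. The function names and the sample template contents are hypothetical, and only the `index.`-prefixed form of the setting is handled.

```python
def flatten(settings: dict, prefix: str = "") -> dict:
    """Flatten nested settings into dotted keys such as
    "index.mapping.ignore_malformed"."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{path}."))
        else:
            flat[path] = value
    return flat


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge two dicts; values from `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def find_offending_templates(index_templates: dict, component_templates: dict) -> list:
    """Return names of data stream index templates whose merged result leaves
    @timestamp with ignore_malformed effectively true."""
    offenders = []
    for name, index_template in index_templates.items():
        if "data_stream" not in index_template:
            continue  # the restriction only applies to data streams
        layers = [
            component_templates.get(component, {}).get("template", {})
            for component in index_template.get("composed_of", [])
        ]
        layers.append(index_template.get("template", {}))
        settings, mappings = {}, {}
        for layer in layers:  # later layers override earlier ones
            settings.update(flatten(layer.get("settings", {})))
            mappings = deep_merge(mappings, layer.get("mappings", {}))
        default = settings.get("index.mapping.ignore_malformed", False)
        timestamp = mappings.get("properties", {}).get("@timestamp", {})
        if str(timestamp.get("ignore_malformed", default)).lower() == "true":
            offenders.append(name)
    return offenders


# Hypothetical, heavily trimmed template contents for illustration only.
component_templates = {
    "logs-settings": {
        "template": {"settings": {"index": {"mapping": {"ignore_malformed": True}}}}
    },
    "logs-mappings": {
        "template": {
            "mappings": {
                "properties": {"@timestamp": {"type": "date", "ignore_malformed": False}}
            }
        }
    },
}
index_templates = {
    "logs-crawler-default": {"data_stream": {}, "composed_of": ["logs-settings"]},
    "logs-example": {"data_stream": {}, "composed_of": ["logs-settings", "logs-mappings"]},
}
offenders = find_offending_templates(index_templates, component_templates)
print(offenders)
```

Any template reported this way would need an explicit `"ignore_malformed": false` on @timestamp somewhere in its composition before the logs-settings update can succeed.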

For any other issue of the same nature, the type of solution will be the same, but as mentioned, it's not really an issue to be worried about (more than providing visibility and explanations in the documentation).

zez3 commented 1 year ago

> just find the offending index or component templates and fix them. I'll try to write a short guide soon about this.

That would help. Interestingly, my @custom component templates don't have "index.mapping.ignore_malformed": true enabled. The above was just a test. So I suspect it's coming from somewhere else, such as the managed @package, .fleet_globals-1, or .fleet_agent_id_verification-1 components of Fleet.

eedugon commented 1 year ago

When you statically set @timestamp with ignore_malformed = true directly in a component (or index) template, if that component template is used by an index template used for data streams and that template is not overriding the @timestamp with ignore_malformed = false at any other level, then the operation will fail.

If you successfully updated something related with auditd and you set ignore_malformed=true on @timestamp, that means that whatever component template you changed was:

> That would help. Interesting is that in my @custom component templates I don't have "index.mapping.ignore_malformed": true enable. The above was just a test.

You directly updated the @timestamp field with the unsupported setting, and that directly caused the conflict; there's nothing else to check in that case.

Our change to logs-settings component template has changed the index setting which means that by default ALL fields will have ignore_malformed=true, and that was causing the conflict with @timestamp indirectly in some very specific environments.

But keep in mind that setting ignore_malformed: true in @timestamp is not an issue per se (at least today :) ): it will fail only if the final processed template (after merging the index template with its component templates) ends up with this and is used for a data stream.

zez3 commented 1 year ago

I see your point. Thanks for the clarification.

In the successful update case of logs-auditd.log@package, "ignore_malformed": false is indeed set there. So as a workaround I could set this in my custom templates and do the update. I'll try.

zez3 commented 1 year ago

yup, that seems to help. I was able to update my integrations afterwards.

zez3 commented 1 year ago

I'm hitting this issue again on 8.10.3 while updating my Custom Logs integration from installed version 2.1.0 to the latest version 2.3.0.

This time it seems to happen because I had previously defined the @timestamp field with "ignore_malformed": false in my logs-log.log@custom composable template. It might be that sometime later, via a package update, @timestamp also became defined in logs-log.log@package (so it's not allowed to be defined again in logs-log.log@custom?), even though both have "ignore_malformed": false.

@kpollich @joshdover I feel the need to file another issue for @timestamp with "ignore_malformed": false not being allowed in logs-log.log@custom when it is also defined in logs-log.log@package.

Since this is also defined in the dynamic template then it might be related: https://github.com/elastic/integrations/issues/4236#issuecomment-1763852343

eedugon commented 1 year ago

@zez3: I would recommend starting a discussion in https://discuss.elastic.co/c/observability/logs before raising a new issue in the repo.

It's important to get the exact error with the stack trace information, as that always identifies the composable or index templates that are in conflict, together with the specific error message.
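As a small illustration of how much the error text already reveals, the sketch below pulls the two template names out of a message shaped like the ones posted in this thread. The regular expression and the function name are assumptions based only on those messages, not on a documented log format.

```python
import re

# Pattern inferred from the error messages quoted in this thread.
MERGE_ERROR = re.compile(
    r"updating component template \[(?P<component>[^\]]+)\] "
    r"results in invalid composable template \[(?P<composable>[^\]]+)\]"
)


def templates_in_conflict(message: str):
    """Return (component_template, composable_template), or None if the
    message does not match the expected shape."""
    match = MERGE_ERROR.search(message)
    if match is None:
        return None
    return match["component"], match["composable"]


message = (
    "error adding index template [logs-settings] for [stack] "
    "java.lang.IllegalArgumentException: updating component template "
    "[logs-settings] results in invalid composable template "
    "[logs-crawler-default] after templates are merged"
)
conflict = templates_in_conflict(message)
print(conflict)
```

The first name is the component template being updated and the second is the composable template that becomes invalid, which is where to start looking for the conflicting mapping or setting.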

I don't believe (but I could be wrong) that defining multiple times @timestamp would be an issue if they are always defined with "ignore_malformed": false, that's why I'd like to see the full error. And probably better in a discuss forum as the issue might not really be the same.

zez3 commented 1 year ago

@eedugon I've opened a new support case for this. If support wants to do this via a forum, they can.

eedugon commented 1 year ago

@zez3 : if you are a customer then definitely the support case is the best approach of course.

eedugon commented 1 year ago

For those interested in this issue: the behavior of 8.10 is expected to be similar to 8.9, and the mapping conflict described here could also appear in 8.10.

We are aiming for a fix via https://github.com/elastic/elasticsearch/pull/99346 on 8.11.

zez3 commented 9 months ago

I'm hitting this again on 8.12 with

updating component template [metrics-mysql.performance@custom] results in invalid composable template [metrics-mysql.performance] after templates are merged

I've removed the @timestamp from my custom mapping, so this time it's not about @timestamp?

> @zez3 : I would recommend better a discussion in https://discuss.elastic.co/c/observability/logs before raising a new issue in the repo.

https://discuss.elastic.co/t/unable-to-create-component-template-updating-component-template-results-in-invalid-composable-template-after-templates-are-merged/352271

I've also filed a case with Support.