Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Indexer failures should produce more information for root cause analysis #19615

Open mikkolehtisalo opened 2 months ago

mikkolehtisalo commented 2 months ago

What?

Indexer failure messages in the UI look something like this:

2 hours ago techlog_52 c5f5e982-287f-11ef-954a-00505687ab33 OpenSearchException[OpenSearch exception [type=mapper_parsing_exception, reason=failed to parse field [level] of type [long] in document with id 'c5f5e982-287f-11ef-954a-00505687ab33'. Preview of field's value: 'Information']]; nested: OpenSearchException[OpenSearch exception [type=illegal_argument_exception, reason=For input string: "Information"]];

This is not really helpful for resolving the issue. If you have a large number of servers, systems, and components, the problem could lie in any of numerous log-generating components, owned by different teams, and so on. It is impossible to start diagnostics when you don't even know whom to start with.
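For illustration, here is a minimal Python sketch of what can be recovered from the failure text alone: the field name, the mapped type, the document ID, and the value preview. It assumes the message keeps exactly the format of the example above. The names `MAPPER_FAILURE` and `parse_indexer_failure` are hypothetical and not part of Graylog or OpenSearch. Note that nothing in the message identifies the originating source or input, which is the gap this issue describes.

```python
import re

# Pattern for the mapper_parsing_exception text shown in the example message.
# Assumption: the wording of the message is stable; it may differ between
# OpenSearch versions or for other exception types.
MAPPER_FAILURE = re.compile(
    r"failed to parse field \[(?P<field>[^\]]+)\] of type \[(?P<type>[^\]]+)\]"
    r" in document with id '(?P<doc_id>[^']+)'"
    r"\. Preview of field's value: '(?P<value>[^']*)'"
)


def parse_indexer_failure(message):
    """Return field, type, doc_id and value as a dict, or None if no match."""
    match = MAPPER_FAILURE.search(message)
    return match.groupdict() if match else None


# The failure message quoted in this issue.
example = (
    "OpenSearchException[OpenSearch exception [type=mapper_parsing_exception, "
    "reason=failed to parse field [level] of type [long] in document with id "
    "'c5f5e982-287f-11ef-954a-00505687ab33'. Preview of field's value: "
    "'Information']]; nested: OpenSearchException[OpenSearch exception "
    "[type=illegal_argument_exception, reason=For input string: \"Information\"]];"
)

details = parse_indexer_failure(example)
```

Grouping failures by the extracted `field` and `type` at least shows which mapping conflict is most frequent. A value preview that itself contains a single quote would be cut short by this pattern.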

It seems OpenSearch doesn't log the failure from my example message at all. It would apparently appear only at the debug logging level, which is simply not feasible when you receive a huge volume of logs. Graylog should be the component that produces the extra information.

Alternatives:

See MessagesAdapterOS2 for clues. The offending message, at least, should be available in most cases.

Why?

The current indexer failures view doesn't provide the basic information required to resolve the issues. In more complex environments, it is not possible to resolve indexer failures.

Your Environment

n/a

tellistone commented 2 months ago

Hi mikkolehtisalo

I think the info you seek is already available via the "Processing and Indexing Failures" index.

If I navigate to System > Overview, to the Indexing error section and hit "show errors"

image

And look at the failed messages - I can see the cause, the source, the associated stream (and thus index) etc. The only info missing is the associated input:

image

This is enabled via System > Configuration, here:

image

Does this provide the info you need?

mikkolehtisalo commented 2 months ago

The failure processing plugin doesn't seem to exist on my system.

image

tellistone commented 2 months ago

May I ask which version of Graylog you are running, and whether it is Open or Enterprise?