Open ypid-geberit opened 3 years ago
@ypid-geberit I think these tags might be best defined downstream of ECS, as what works in an Ingest Pipeline might not match Logstash or even third-party Extract-Transform-Load tools.
Could you provide more context?
best defined downstream of ECS
I have a strong preference for defining this in ECS. I know multiple log parser "frameworks" (Perl, Logstash, Ingest Pipelines and my current vendor-neutral favorite http://vector.dev/). I don’t see a reason why something should work for one but not the other. All will have something like grok and so on.
Could you provide more context?
Sure, let's look at a practical example. This event is part of the integration tests for my Vector config:
{
"message": "<134>1 2021-03-22T17:10:11+01:00 - - - - [meta sysUpTime=\"no int\"] Invalid sysUpTime."
}
It is transformed into this for indexing to ES:
{
"@timestamp": "2021-03-22T16:10:11Z",
"__": {
"event": {
"hash": "fa31847bd9f47fa13459dd3c6922de1924168ed7b175eeea97a39aad01f30c99"
},
"id": "v5_fa31847bd9f47fa13459dd3",
"index_name": "log_other__v1_2021"
},
"ecs": {
"version": "1.9.0"
},
"event": {
"ingested": "Fixed timestamp in test mode.",
"kind": "event",
"original": "<134>1 2021-03-22T17:10:11+01:00 - - - - [meta sysUpTime=\"no int\"] Invalid sysUpTime.",
"severity": 6
},
"host": {},
"log": {
"flags": [
[
"parse_warning: syslog: Drop non-int field meta.sysUpTime: function call error for \"to_int\" at (1938:1968): Invalid integer \"no int\": invalid digit found in string",
"parse_warning: host.name* missing: Neither host.name nor host.name_rdns are known.",
"parse_warning"
]
],
"level": "info",
"syslog": {
"facility": {
"name": "local0"
}
}
},
"message": "Invalid sysUpTime.",
"tags": [
"parse_warning: syslog",
"parse_warning: host.name* missing"
]
}
My idea for what goes into `tags` and what into `log.flags` is that I would like to include `tags` by default in Kibana (Discover saved searches, logs app) for end users. Tags might contain other relevant information, not just pipeline issues. `log.flags` can then be consulted when details about a warning or error are needed. It can also be used for alerting, for example by searching for the term "parse_warning".
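The split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not my actual Vector config: the helper name `add_parse_warning` is hypothetical, and it assumes `log.flags` is a flat list of strings in which the text before the second colon doubles as the summary tag.

```python
from typing import Any

WARNING = "parse_warning"  # marker term taken from the example event above


def add_parse_warning(event: dict[str, Any], summary: str, detail: str) -> None:
    """Record one warning: full sentence in log.flags, short summary in tags.

    Hypothetical helper for illustration only.
    """
    flags = event.setdefault("log", {}).setdefault("flags", [])
    tags = event.setdefault("tags", [])

    # Keep the bare marker as the last entry so a plain term search on
    # "parse_warning" matches any event that carries at least one warning.
    if WARNING in flags:
        flags.remove(WARNING)
    flags.extend([f"{WARNING}: {summary}: {detail}", WARNING])

    # tags only gets the short, end-user-facing summary, once per kind.
    tag = f"{WARNING}: {summary}"
    if tag not in tags:
        tags.append(tag)


event: dict[str, Any] = {"message": "Invalid sysUpTime."}
add_parse_warning(event, "syslog", "Drop non-int field meta.sysUpTime")
add_parse_warning(
    event,
    "host.name* missing",
    "Neither host.name nor host.name_rdns are known.",
)
```

With this, `event["tags"]` carries only the two short summaries, while `event["log"]["flags"]` carries the two detailed sentences plus the bare marker.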
Hi @ypid-geberit, sorry for the slow reply, I was OOO. These are truly some great ideas. The concept of working with multiple log parser frameworks is inherent to Elastic Observability (see https://www.elastic.co/guide/en/beats/filebeat/7.13/filebeat-modules.html), and there is a built-in processor to add tags (see https://www.elastic.co/guide/en/beats/filebeat/7.13/add-tags.html), so the goal would be to respect the diversity of these sources while leveraging their commonality.
Summary
Define a field that can hold and communicate detailed failure and warning sentences populated by the log parser (Ingest Pipelines or something else). The `tags` field should additionally contain only a summary of the failures and warnings (similar to how Logstash does it).
Motivation:
When parsing logs, various failures or warnings can occur. Suppose the source log is JSON. If decoding the JSON fails, that is a failure the log parser cannot really recover from, other than by leaving the undecoded JSON in the `message` field.
But there are multiple cases where the parser can still do something. For example, a single field such as the user agent cannot be parsed or normalized. Or, if some quality assurance check on `@timestamp` fails, `event.created` could be used instead.
Some keywords to make this issue better searchable: _dateparsefailure, _grokparsefailure, QA
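To make the two cases from the motivation concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not a proposed implementation: the function `parse_json_log`, the input key `timestamp`, the `parse_failure` prefix and the one-day skew limit are all hypothetical, and `log.flags` is the field name used in the discussion, not an existing ECS field.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any

MAX_FUTURE_SKEW = timedelta(days=1)  # hypothetical QA threshold


def parse_json_log(raw: str, created: datetime) -> dict[str, Any]:
    """Parse one JSON log line; `created` must be timezone-aware."""
    event: dict[str, Any] = {
        "event": {"created": created.isoformat()},
        "tags": [],
        "log": {"flags": []},
    }
    try:
        decoded = json.loads(raw)
        if not isinstance(decoded, dict):
            raise ValueError("top-level JSON value is not an object")
    except ValueError as err:  # json.JSONDecodeError is a ValueError
        # Unrecoverable failure: keep the undecoded text in message.
        event["message"] = raw
        event["@timestamp"] = created.isoformat()
        event["tags"].append("parse_failure: json")
        event["log"]["flags"].append(f"parse_failure: json: {err}")
        return event

    event["message"] = str(decoded.get("message", ""))
    try:
        # Recoverable case: run some QA on the timestamp from the log.
        ts = datetime.fromisoformat(str(decoded["timestamp"]))
        if ts.tzinfo is None:
            raise ValueError("timestamp has no UTC offset")
        if ts > created + MAX_FUTURE_SKEW:
            raise ValueError("timestamp is too far in the future")
    except (KeyError, ValueError) as err:
        # Fall back to event.created and record why.
        ts = created
        event["tags"].append("parse_warning: @timestamp")
        event["log"]["flags"].append(f"parse_warning: @timestamp: {err}")
    event["@timestamp"] = ts.isoformat()
    return event


created = datetime(2021, 3, 22, 16, 10, 12, tzinfo=timezone.utc)
failed = parse_json_log("{not json", created)
warned = parse_json_log('{"message": "hi", "timestamp": "no date"}', created)
```

In `failed` the raw line survives in `message` and only a failure is recorded; in `warned` the event is still parsed, `@timestamp` falls back to `event.created`, and the reason ends up in `log.flags`.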
Detailed Design:
#1372, #1379