elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

File Structure Finder should handle whitespace better #51167

Open benwtrent opened 4 years ago

benwtrent commented 4 years ago

Log lines like the following:

[2020-01-17T09:25:35,792][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [Benjamins-MacBook-Pro.local] Removing ban for the parent [edx7HRvUTr6_4AuIUlziIQ:5242648] on the node [edx7HRvUTr6_4AuIUlziIQ]
[2020-01-17T10:17:47,664][INFO ][o.e.n.Node               ] [Benjamins-MacBook-Pro.local] stopping ...

are automatically Grok'd as the pattern:

\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:loglevel}.*

I think a better pattern might have been discovered if Grok pattern discovery accounted for whitespace before and after Grok patterns, such as the padding after INFO in the second line.

Something like

\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:loglevel}\s*\]\[.*
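To illustrate the difference, here is a minimal Python sketch that compares a strict `\]\[` after the log level with the `\s*\]\[` variant on the two sample lines. The `TIMESTAMP` and `LOGLEVEL` regexes are simplified stand-ins I'm assuming for the real Grok definitions, and this is not how the file structure finder itself is implemented.

```python
import re

# Simplified stand-ins for the Grok patterns (assumptions, not the real definitions).
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}"
LOGLEVEL = r"[A-Z]+"

# The two sample log lines from this issue.
LINES = [
    "[2020-01-17T09:25:35,792][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] "
    "[Benjamins-MacBook-Pro.local] Removing ban for the parent "
    "[edx7HRvUTr6_4AuIUlziIQ:5242648] on the node [edx7HRvUTr6_4AuIUlziIQ]",
    "[2020-01-17T10:17:47,664][INFO ][o.e.n.Node               ] "
    "[Benjamins-MacBook-Pro.local] stopping ...",
]

# Strict: the closing bracket must directly follow the log level.
strict = re.compile(rf"\[(?P<timestamp>{TIMESTAMP})\]\[(?P<loglevel>{LOGLEVEL})\]\[.*")
# Whitespace-tolerant: optional padding is allowed before the closing bracket.
padded = re.compile(rf"\[(?P<timestamp>{TIMESTAMP})\]\[(?P<loglevel>{LOGLEVEL})\s*\]\[.*")

strict_hits = [bool(strict.match(line)) for line in LINES]
padded_hits = [bool(padded.match(line)) for line in LINES]

print("strict:", strict_hits)
print("padded:", padded_hits)
```

With these stand-ins, the strict form matches only the DEBUG line, because `[INFO ]` has a padding space before its closing bracket. The `\s*` form matches both.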

It would be even better if DATA grok patterns could be used, but those are pretty general and could probably only be used where there are closing brackets to delimit them (as in the standard ES logs).

I personally would love the pattern to ultimately result in something like

\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:loglevel}\s*\]\[%{JAVACLASS:class1}\s*\]\[%{DATA:data1}\].*
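As a rough check of what such a pattern would extract, here is a Python sketch of a regex approximation applied to the two sample lines. All four sub-patterns are simplified stand-ins I'm assuming for the Grok definitions. The extra `\s*` between the class bracket and the node-name bracket is also my own assumption, added because the sample lines have a space there.

```python
import re

# Simplified stand-ins for the Grok patterns (assumptions, not the real definitions).
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}"
LOGLEVEL = r"[A-Z]+"
JAVACLASS = r"(?:[A-Za-z$_][A-Za-z$_0-9]*\.)*[A-Za-z$_][A-Za-z$_0-9]*"
DATA = r".*?"  # lazy, so it stops at the first closing bracket

# The two sample log lines from this issue.
LINES = [
    "[2020-01-17T09:25:35,792][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] "
    "[Benjamins-MacBook-Pro.local] Removing ban for the parent "
    "[edx7HRvUTr6_4AuIUlziIQ:5242648] on the node [edx7HRvUTr6_4AuIUlziIQ]",
    "[2020-01-17T10:17:47,664][INFO ][o.e.n.Node               ] "
    "[Benjamins-MacBook-Pro.local] stopping ...",
]

FULL = re.compile(
    rf"\[(?P<timestamp>{TIMESTAMP})\]"
    rf"\[(?P<loglevel>{LOGLEVEL})\s*\]"
    rf"\[(?P<class1>{JAVACLASS})\s*\]"
    rf"\s*"  # assumption: tolerate the space before the node-name bracket
    rf"\[(?P<data1>{DATA})\].*"
)

# Collect the named groups for each line.
fields = [FULL.match(line).groupdict() for line in LINES]
for row in fields:
    print(row)
```

Because the padding is consumed by `\s*` outside the named groups, the extracted `loglevel` and `class1` values come out without trailing spaces.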
elasticmachine commented 4 years ago

Pinging @elastic/ml-core (:ml)