childe / gohangout

A Logstash-like tool written in Go. It consumes data from Kafka, processes it, and writes it to ES, Clickhouse, and other outputs.
MIT License

How can grok in gohangout match the last occurrence of a text? #142

Closed LittleCadet closed 3 years ago

LittleCadet commented 3 years ago

1. Background: I need to parse the last exception in a stack trace, e.g.:

get exception key:selectByDay:2021-06-21 00:00:00 - [contextId=k-g-p-86c686d5fb-zn6pb^1621906296557^65576525]
**.serializer.SerializerRuntimeException: null
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) [spring-core-4.2.0.RELEASE.jar:4.2.0.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) [spring-aop-4.2.0.RELEASE.jar:4.2.0.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) [spring-aop-4.2.0.RELEASE.jar:4.2.0.RELEASE]
    ...
Caused by: java.io.IOException: java.lang.RuntimeException: class not found CLASSNAME:*** loader:WebappClassLoader
  context: 
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
java.net.URLClassLoader@426e505c

    at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:243) ~[fst-2.43.jar:na]
    ... 92 common frames omitted
Caused by: java.lang.RuntimeException: class not found CLASSNAME:*** loader:WebappClassLoader
  context: 
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
java.net.URLClassLoader@426e505c

    at org.nustaq.serialization.FSTClazzNameRegistry.classForName(FSTClazzNameRegistry.java:234) ~[fst-2.43.jar:na]
    at org.nustaq.serialization.FSTClazzNameRegistry.classForName(FSTClazzNameRegistry.java:189) ~[fst-2.43.jar:na]
    ... 95 common frames omitted
Caused by: java.lang.ClassNotFoundException: ***
    at java.lang.Class.forName0(Native Method) ~[na:1.8.0_211]
    at java.lang.Class.forName(Class.java:348) ~[na:1.8.0_211]
    at org.nustaq.serialization.FSTClazzNameRegistry.classForName(FSTClazzNameRegistry.java:196) ~[fst-2.43.jar:na]
    ... 107 common frames omitted

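2. Analysis: With Logstash this would be straightforward, because it has break_on_match. Unfortunately gohangout does not support that option, but having chosen gohangout, I have to find a way to make it work.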

3. What I tried:

    - Grok:
        src: message
        match:
          # If the stack trace contains exceptions with "Caused by", take the last one
          - 'Caused\sby:\s%{GREEDYDATA:errorSource}'
        pattern_paths:
          - '/usr/local/gohangout1.4.9/pattern'
This approach is wrong. In both gohangout and Logstash, grok stops as soon as the first match is found, so the output is the first Caused by entry rather than the last.
[image]
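
To make the behaviour above concrete, here is a minimal Go sketch. It assumes the grok names are expanded by hand (GREEDYDATA is `.*`, as listed in the startup log later in this thread) and uses a shortened, hypothetical stack trace; it does not call gohangout itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAttempt is the match from the first config with GREEDYDATA
// expanded by hand to `.*` (an illustration, not gohangout's own code).
var firstAttempt = regexp.MustCompile(`Caused\sby:\s(.*)`)

// sample is a shortened, hypothetical stack trace with two causes.
const sample = "SerializerRuntimeException: null\n" +
	"Caused by: java.io.IOException: boom\n" +
	"\tat a.B.c(B.java:1)\n" +
	"Caused by: java.lang.ClassNotFoundException: Foo\n" +
	"\tat d.E.f(E.java:2)\n"

// captureFirst returns capture group 1 of the leftmost match, or "".
func captureFirst(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// The leftmost "Caused by:" wins, and without the s flag `.`
	// stops at the end of that line.
	fmt.Println(captureFirst(firstAttempt, sample)) // java.io.IOException: boom
}
```

The regex engine reports the leftmost match, which is why the first cause is captured no matter how many follow.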
LittleCadet commented 3 years ago

After discussing this with the author @childe, it turns out there is a pattern that meets this need, and it requires only a single grok match:

(?ms).*Caused by: %{NOTSPACE:errorSummary}

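Note: the leading (?ms) is flag syntax from Go's regexp package: s lets `.` match newlines, and m makes ^ and $ match at line boundaries. The grok debugger page cannot parse it, but gohangout can. With s set, the greedy `.*` consumes as much of the message as possible before `Caused by:`, so the capture lands on the last occurrence, which is exactly what is needed.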
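
The same effect can be reproduced directly with Go's regexp package. This is a sketch under assumptions: NOTSPACE is expanded by hand to `\S+`, and the stack trace is a shortened, hypothetical one.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastCause is the pattern from this comment with NOTSPACE expanded
// by hand to `\S+` (an illustration, not gohangout's own code).
var lastCause = regexp.MustCompile(`(?ms).*Caused by: (\S+)`)

// sample is a shortened, hypothetical stack trace with two causes.
const sample = "SerializerRuntimeException: null\n" +
	"Caused by: java.io.IOException: boom\n" +
	"\tat a.B.c(B.java:1)\n" +
	"Caused by: java.lang.ClassNotFoundException: Foo\n" +
	"\tat d.E.f(E.java:2)\n"

// captureLast returns capture group 1, or "" when nothing matches.
func captureLast(s string) string {
	m := lastCause.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// The greedy `.*` (with s set) swallows everything up to the final
	// "Caused by: ", so the capture comes from the last cause.
	fmt.Println(captureLast(sample)) // java.lang.ClassNotFoundException:
}
```

Note that `\S+` keeps the trailing colon, because the colon is not whitespace.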

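Here are two screenshots from Kibana. PS: the original error is hard to reproduce, so a similar stack trace was used to simulate it. The parsed field is errorSummaryV2.

[image]

The message that gohangout needs to parse is shown below:

[image]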
LittleCadet commented 3 years ago

The pattern above does work perfectly, but after I changed it slightly it stopped parsing. Here is the grok configuration:

- Grok:
    src: message
    match:
      # If the stack trace contains exceptions with "Caused by", take the last one
      - '(?ms).*Caused by: %{ERROR_SUMMARY_EXCEPTION:errorSummaryV1}'
      - '(?ms).*Caused by: %{NOTSPACE:errorSummaryV2}'
      # Capture the errorCode=***, errorMessage=*** string
      - '%{ERROR_SUMMARY_CODE:errorSummary}'
      # If the stack trace has no "Caused by", take the first line of the log
      - '%{GREEDYDATA:errorSummary}'
    pattern_paths:
      - '/root/gohangout/pattern'
      - '/root/gohangout/pattern_extra'

This is the pattern:

ERROR_SUMMARY_EXCEPTION [A-Za-z0-9.]+

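By rights, if the first match succeeded, Kibana should show only the errorSummaryV1 field and no errorSummaryV2. But as the screenshots in the earlier comment show, only errorSummaryV2 is present, which means the errorSummaryV1 match failed.

Yet in the grok debugger page the same pattern works fine.

[image]

This is puzzling. I don't understand why the same pattern fails to parse in gohangout. @childe, could you point me in the right direction? The goal is to remove the colon shown in the image below. [image]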
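
For reference, this is the behaviour the configuration above is expected to have. The sketch below is in plain Go and rests on assumptions: the grok names are expanded by hand into named groups, the match list is tried in order with the first success winning (as described earlier in this thread), and the stack trace is a shortened, hypothetical one.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches mirrors the first two entries of the match list, with
// ERROR_SUMMARY_EXCEPTION and NOTSPACE expanded by hand.
var matches = []*regexp.Regexp{
	regexp.MustCompile(`(?ms).*Caused by: (?P<errorSummaryV1>[A-Za-z0-9.]+)`),
	regexp.MustCompile(`(?ms).*Caused by: (?P<errorSummaryV2>\S+)`),
}

// trace is a shortened, hypothetical stack trace.
const trace = "SerializerRuntimeException: null\n" +
	"Caused by: java.io.IOException: boom\n" +
	"Caused by: java.lang.ClassNotFoundException: Foo\n" +
	"\tat d.E.f(E.java:2)\n"

// firstMatch tries each pattern in order and returns the field name
// and value of the first pattern that matches.
func firstMatch(patterns []*regexp.Regexp, s string) (field, value string) {
	for _, re := range patterns {
		if m := re.FindStringSubmatch(s); m != nil {
			return re.SubexpNames()[1], m[1]
		}
	}
	return "", ""
}

func main() {
	// With both patterns available, V1 wins and the colon is excluded.
	fmt.Println(firstMatch(matches, trace))
	// If V1 is skipped, V2 keeps the trailing colon.
	fmt.Println(firstMatch(matches[1:], trace))
}
```

The character class `[A-Za-z0-9.]+` stops at the colon, so a working errorSummaryV1 match would give the colon-free value.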

childe commented 3 years ago

I suspect your ERROR_SUMMARY_EXCEPTION did not make it into the custom patterns. Check the log that gohangout prints at startup: run gohangout -v 5 to set the log level to 5.

LittleCadet commented 3 years ago

Below is the gohangout startup log. It shows that ERROR_SUMMARY_EXCEPTION has in fact been loaded and the pattern was read correctly. Did I get something wrong somewhere?

I0622 16:07:15.011635 3122 grok.go:65] patterns:map[BASE10NUM:([+-]?(?:[0-9]+(?:\.[0-9]+)?)|\.[0-9]+) BASE16NUM:(0[xX]?[0-9a-fA-F]+) CISCOMAC:(?:(?:[A-Fa-f0-9]{4}\. ){2}[A-Fa-f0-9]{4}) COMBINEDAPACHELOG:%{COMMONAPACHELOG} %{QS:referrer} %{QS:agent} COMMONAPACHELOG:%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timest amp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) COMMONMAC:(?:(?:[A-Fa-f0-9]{2 }:){5}[A-Fa-f0-9]{2}) DATA:.*? DATE:%{DATE_US}|%{DATE_EU} DATESTAMP:%{DATE}[- ]%{TIME} DATESTAMP_OTHER:%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR} DATESTAMP_RFC8 22:%{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ} DATE_EU:%{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR} DATE_US:%{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR} DAY:(?:Mon(?:day)?|T ue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?) ERROR_CN_POSITION:\s*cn.estudy.* ERROR_LA_POSITION:\s*la.kaike.* ERROR_SUMMARY_CODE:err orCode=.* ERROR_SUMMARY_EXCEPTION: [A-Za-z0-9.]+ GREEDYDATA:.* GREEDYMULTILINE:(.|\r|\n)* HOST:%{HOSTNAME} HOSTNAME:\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za -z][0-9A-Za-z-]{0,62}))*(\.?|\b) HOSTPORT:%{IPORHOST}:%{POSINT} HOUR:(?:2[0123]|[01]?[0-9]) HTTPDATE:%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT} INT:(?:[+-]?(?:[0-9]+) ) IP:(?:%{IPV6}|%{IPV4}) IPORHOST:(?:%{HOSTNAME}|%{IP}) IPV4:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) 
IPV6:((([0-9A-Fa -f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa -f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){ 1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9 A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1 ,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4} :((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9 ]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)? ISO8601_SECOND:(?:%{SECOND}|60) ISO8601_TIMEZONE:(?:Z|[+-]%{HOUR}(?::?%{MINUTE})) LOGLEVEL:([A-a]lert|ALERT| [T|t]race|TRACE|[D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal |FATAL|[S|s]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?) 
MAC:(?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}) MINUTE:(?:[0-5][0-9]) MONTH:\b(?:Jan(?:uary)?|Feb(?:ruary)?|M ar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b MONTHDAY:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9 ]) MONTHNUM:(?:0?[1-9]|1[0-2]) NONNEGINT:\b(?:[0-9]+)\b NOTSPACE:\S+ NUMBER:(?:%{BASE10NUM}) PATH:(?:%{UNIXPATH}|%{WINPATH}) POSINT:\b(?:[1-9][0-9]*)\b PROG:(?:[\w._/% -]+) QS:%{QUOTEDSTRING} QUOTEDSTRING:"([^"\\]*(\\.[^"\\]*)*)"|\'([^\'\\]*(\\.[^\'\\]*)*)\' SECOND:(?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?) SPACE:\s* SYSLOGBASE:%{SYSLOGTI MESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: SYSLOGFACILITY:<%{NONNEGINT:facility}.%{NONNEGINT:priority}> SYSLOGHOST:%{IPORHOST} SY SLOGPROG:%{PROG:program}(?:\[%{POSINT:pid}\])? SYSLOGTIMESTAMP:%{MONTH} +%{MONTHDAY} %{TIME} TIME:([^0-9]?)%{HOUR}:%{MINUTE}(?::%{SECOND})([^0-9]?) TIMESTAMP_ISO8601:% {YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}? TTY:(?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+)) TZ:(?:[PMCE][SD]T|UTC|GMT) UNI XPATH:(/[\w_%!$@:.,-]?/?)(\S+)? URI:%{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})? URIHOST:%{IPORHOST}(?::%{POSINT:port})? URIPARAM:\?[A-Za- z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]* URIPATH:(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+ URIPATHPARAM:%{URIPATH}(?:%{URIPARAM})? URIPROTO:[A-Za-z]+(\+[A-Za-z+]+)? USER:%{US ERNAME} USERNAME:[a-zA-Z0-9._-]+ USER_IP_REQUEST:\[(?:%{NOTSPACE:userId})?\/(?:%{NOTSPACE:remoteAddr})?(?:\s-\s%{GREEDYDATA:requestURIWithQueryString})?] UUID:[A-Fa-f0 -9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12} WINDOWSMAC:(?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}) WINPATH:([A-Za-z]:|\\)(?:\\[^\\?*]*)+ WORD:\b\w+\b YEAR:(\d\d){1,2}]

childe commented 3 years ago

There is an extra space after your ERROR_SUMMARY_EXCEPTION.
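
The effect of that extra space can be sketched in Go. Everything here is an assumption made for illustration: `splitPatternLine` is a hypothetical loader that splits a pattern-file line at the first space (it is not gohangout's actual code), and the stack trace is a shortened, hypothetical one. The loaded value ` [A-Za-z0-9.]+`, with its leading space, is what the startup log above shows.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitPatternLine is a hypothetical loader step: it splits a
// pattern-file line into name and expression at the first space.
func splitPatternLine(line string) (name, expr string) {
	parts := strings.SplitN(line, " ", 2)
	if len(parts) < 2 {
		return parts[0], ""
	}
	return parts[0], parts[1]
}

// buildMatch substitutes the expression into the match from the issue.
func buildMatch(expr string) *regexp.Regexp {
	return regexp.MustCompile(`(?ms).*Caused by: (` + expr + `)`)
}

// trace is a shortened, hypothetical stack trace.
const trace = "SerializerRuntimeException: null\n" +
	"Caused by: java.lang.ClassNotFoundException: Foo\n" +
	"\tat d.E.f(E.java:2)\n"

func main() {
	// Two spaces after the name: the second one ends up in the regex.
	_, withSpace := splitPatternLine("ERROR_SUMMARY_EXCEPTION  [A-Za-z0-9.]+")
	// One space after the name: the regex is clean.
	_, clean := splitPatternLine("ERROR_SUMMARY_EXCEPTION [A-Za-z0-9.]+")

	fmt.Printf("%q matches: %v\n", withSpace, buildMatch(withSpace).MatchString(trace))
	fmt.Printf("%q matches: %v\n", clean, buildMatch(clean).MatchString(trace))
}
```

With the leading space, the expanded regex demands two spaces after `Caused by:`, which the log line never contains, so the first match fails and the NOTSPACE fallback is used instead.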

LittleCadet commented 3 years ago

Yes, you're right, there was an extra space...