GoogleCloudPlatform / fluent-plugin-detect-exceptions

A fluentd plugin that scans line-oriented log streams and combines exceptions stacks into a single log entry.
Apache License 2.0
192 stars 69 forks source link

python Traceback rule separates the Traceback from the rest of the log message #56

Closed dave00galloway closed 5 years ago

dave00galloway commented 5 years ago

problem

the python multiline rule in https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/blob/master/lib/fluent/plugin/exception_detector.rb means that the Traceback is logged separately from the log message it relates to e.g.

[2019-03-14 09:20:44,381] ERROR: 'NoneType' object has no attribute 'options

Traceback (most recent call last):

File "redacted", line 72, in <module>

  my_var = parent.none_object.options.option 

AttributeError: 'NoneType' object has no attribute 'options'`

is treated as 2 separate entries:-

[2019-03-14 09:20:44,381] ERROR: 'NoneType' object has no attribute 'options'

Traceback (most recent call last):

File "redacted", line 72, in <module>

  my_var = parent.none_object.options.option

AttributeError: 'NoneType' object has no attribute 'options'

we have thousands of log entries per second so the Traceback can end up being very separate from the context.

the log entry is generated from the standard python logging library in python 2.7.12

    try:

        my_var = parent.none_object.options.option

    except Exception as e:

        logger.exception(e.message)

proposed solution The rule should be amended to either look ahead from the context log entry for any Tracebacks on the same or next line (preferred), or for Traceback entries to look backwards (yuck)

regex is not my strong point but something like:-

PYTHON_RULES = [
      rule(:start_state, /(?=Traceback \(most recent call last\):)/, :python),
      rule(:python, /^[\t ]+File /, :python_code),
      rule(:python_code, /[^\t ]/, :python),
      rule(:python, /^(?:[^\s.():]+\.)*[^\s.():]+:/, :start_state)
    ].freeze

should work https://regex101.com/r/U5rfWT/1

qingling128 commented 5 years ago

This plugin is designed to be consistent with what Error Reporting backend supports. Typically only standard library exception format is supported. I just tested the snippet above. The result looks as expected though.

$ cat test.py
try:
    my_var = parent.none_object.options.option
except Exception as e:
    logger.exception(e.message)

$ python test.py
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    logger.exception(e.message)
NameError: name 'logger' is not defined

$ python --version
Python 2.7.16

Treating the log lines above as two separate entries is by design. The proposed regex or anything similar would be too loose. e.g.

image

or

image

Closing this as it's working as intended.

christidis commented 5 years ago

@qingling128 I have noticed the same thing with Java stacktraces (2 events one with the first line and a second with the actual stacktrace)? Is this also expected? I am on 0.0.11.

qingling128 commented 5 years ago

@christidis What's your example like?

christidis commented 5 years ago

Sure. There is one event with the first line of a WARN message:

|   | 2019-10-02 07:54:56.180  WARN 1 --- [ask-scheduler-3] o.s.i.mail.ImapIdleChannelAdapter        : Failed to execute IDLE task. Will attempt to resubmit in 10000 milliseconds
-- | -- | --

and another one with the actual stacktrace:

java.lang.IllegalStateException: Failure in 'idle' task. Will resubmit.
    at org.springframework.integration.mail.ImapIdleChannelAdapter$IdleTask.run(ImapIdleChannelAdapter.java:305) ~[spring-integration-mail-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.integration.mail.ImapIdleChannelAdapter$ReceivingTask.run(ImapIdleChannelAdapter.java:254) ~[spring-integration-mail-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.1.5.RELEASE.jar:5.1.5.RELEASE]
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93) ~[spring-context-5.1.5.RELEASE.jar:5.1.5.RELEASE]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: javax.mail.AuthenticationFailedException: [AUTHENTICATIONFAILED] Invalid credentials (Failure)
    at com.sun.mail.imap.IMAPStore.protocolConnect(IMAPStore.java:661) ~[mail-1.5.0-b01.jar:1.5.0-b01]
    at javax.mail.Service.connect(Service.java:295) ~[mail-1.5.0-b01.jar:1.5.0-b01]
    at javax.mail.Service.connect(Service.java:176) ~[mail-1.5.0-b01.jar:1.5.0-b01]
    at javax.mail.Service.connect(Service.java:125) ~[mail-1.5.0-b01.jar:1.5.0-b01]
    at org.springframework.integration.mail.AbstractMailReceiver.connectStoreIfNecessary(AbstractMailReceiver.java:311) ~[spring-integration-mail-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.integration.mail.AbstractMailReceiver.openFolder(AbstractMailReceiver.java:318) ~[spring-integration-mail-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.integration.mail.ImapMailReceiver.waitForNewMessages(ImapMailReceiver.java:170) ~[spring-integration-mail-5.1.3.RELEASE.jar:5.1.3.RELEASE]
    at org.springframework.integration.mail.ImapIdleChannelAdapter$IdleTask.run(ImapIdleChannelAdapter.java:289) ~[spring-integration-mail-5.1.3.RELEASE.jar:5.1.3.RELEASE]

I was expecting to have them both in the same event.

qingling128 commented 5 years ago

Yes, this is the current expected behavior.

What's the impact of this issue? We might be able to accept a feature request. Yet it needs to be prioritized against other tasks.

christidis commented 5 years ago

I see, thanks for the clarification.

In my case, and I am pretty sure I am not alone, I am applying a few additional extra fluentd parser grok-like filters to the log field, for example, loglevel, class, thread, message and so on.

When one filters messages with loglevel:ERROR for example in Kibana the stack trace is being missed. In addition queries like loglevel:ERROR AND message:*OutOfMemory* do not work since the exception is not there.

For me, it would be better to have the whole Stack Trace together with the Error log or at least provide an option to do so, because the user experience is affected. I can open a feature request if you want.