GoogleCloudPlatform / fluent-plugin-detect-exceptions

A fluentd plugin that scans line-oriented log streams and combines exceptions stacks into a single log entry.
Apache License 2.0
192 stars 69 forks source link

Cannot figure out configuration issue (go stacktrace) #60

Closed ddides closed 4 years ago

ddides commented 5 years ago

I'm hoping this is an issue with something I'm doing, but I cannot figure out for the life of me what's wrong with my configuration. I've tried a few different fluentd base images and have had no luck.

fluent.conf:

    <source>
        @type tail
        @id in_tail_container_logs
        path "/var/log/containers/*.log"
        pos_file /var/log/es-containers.log.pos
        # pos_file "/var/log/fluentd-containers.log.pos"
        # tag "raw.kubernetes.*"
        tag "raw.kubernetes.*"
        read_from_head true
        <parse>
          @type "json"
          time_format "%Y-%m-%dT%H:%M:%S.%NZ"
          time_type string
        </parse>
      </source>

      # Detect exceptions in the log output and forward them as one log entry.
      <match raw.kubernetes.**>
        @id raw.kubernetes
        @type detect_exceptions
        languages all
        remove_tag_prefix raw
        message log
      </match>

Container logs after starting up:

2019-09-13 21:39:22 +0000 [info]: starting fluentd-1.6.3 pid=1 ruby="2.3.3"
2019-09-13 21:39:22 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/usr/local/bin/fluentd", "--under-supervisor"]
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-concat' version '2.4.0'
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.12'
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '3.5.5'
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.2.0'
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.4.0'
2019-09-13 21:39:23 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2019-09-13 21:39:23 +0000 [info]: gem 'fluentd' version '1.6.3'
2019-09-13 21:39:23 +0000 [info]: adding match pattern="fluent.**" type="null"
2019-09-13 21:39:23 +0000 [info]: adding match pattern="raw.kubernetes.**" type="detect_exceptions"
2019-09-13 21:39:23 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2019-09-13 21:39:24 +0000 [info]: adding match pattern="**" type="elasticsearch"

Log file (Go debug.PrintStack()):

{"log":"2019/09/13 21:56:43 Dumping the stack.\n","stream":"stderr","time":"2019-09-13T21:56:43.607320623Z"}
{"log":"goroutine 1836 [running]:\n","stream":"stderr","time":"2019-09-13T21:56:43.607344068Z"}
{"log":"runtime/debug.Stack(0x5db2ac, 0xc0000ac050, 0x2)\n","stream":"stderr","time":"2019-09-13T21:56:43.607348875Z"}
{"log":"\u0009/usr/local/go/src/runtime/debug/stack.go:24 +0x9d\n","stream":"stderr","time":"2019-09-13T21:56:43.607378653Z"}
{"log":"runtime/debug.PrintStack()\n","stream":"stderr","time":"2019-09-13T21:56:43.607384027Z"}
{"log":"\u0009/usr/local/go/src/runtime/debug/stack.go:16 +0x22\n","stream":"stderr","time":"2019-09-13T21:56:43.607388312Z"}
{"log":"main.debugHandler(0x78f320, 0xc00013e000, 0xc0003f4200)\n","stream":"stderr","time":"2019-09-13T21:56:43.607392715Z"}
{"log":"\u0009/app/main.go:95 +0x117\n","stream":"stderr","time":"2019-09-13T21:56:43.607397018Z"}
{"log":"net/http.HandlerFunc.ServeHTTP(0x73bf68, 0x78f320, 0xc00013e000, 0xc0003f4200)\n","stream":"stderr","time":"2019-09-13T21:56:43.607401473Z"}
{"log":"\u0009/usr/local/go/src/net/http/server.go:1995 +0x44\n","stream":"stderr","time":"2019-09-13T21:56:43.607405934Z"}
{"log":"github.com/gorilla/mux.(*Router).ServeHTTP(0xc0000fc000, 0x78f320, 0xc00013e000, 0xc0003f4000)\n","stream":"stderr","time":"2019-09-13T21:56:43.60741038Z"}
{"log":"\u0009/go/pkg/mod/github.com/gorilla/mux@v1.7.3/mux.go:212 +0xe3\n","stream":"stderr","time":"2019-09-13T21:56:43.607414887Z"}
{"log":"net/http.(*ServeMux).ServeHTTP(0x98bb60, 0x78f320, 0xc00013e000, 0xc0003f4000)\n","stream":"stderr","time":"2019-09-13T21:56:43.607435093Z"}
{"log":"\u0009/usr/local/go/src/net/http/server.go:2375 +0x1d6\n","stream":"stderr","time":"2019-09-13T21:56:43.607453782Z"}
{"log":"net/http.serverHandler.ServeHTTP(0xc000094b60, 0x78f320, 0xc00013e000, 0xc0003f4000)\n","stream":"stderr","time":"2019-09-13T21:56:43.60745936Z"}
{"log":"\u0009/usr/local/go/src/net/http/server.go:2774 +0xa8\n","stream":"stderr","time":"2019-09-13T21:56:43.607462925Z"}
{"log":"net/http.(*conn).serve(0xc0000a6280, 0x78f9a0, 0xc000324000)\n","stream":"stderr","time":"2019-09-13T21:56:43.607466488Z"}
{"log":"\u0009/usr/local/go/src/net/http/server.go:1878 +0x851\n","stream":"stderr","time":"2019-09-13T21:56:43.607469955Z"}
{"log":"created by net/http.(*Server).Serve\n","stream":"stderr","time":"2019-09-13T21:56:43.607473435Z"}
{"log":"\u0009/usr/local/go/src/net/http/server.go:2884 +0x2f4\n","stream":"stderr","time":"2019-09-13T21:56:43.607476898Z"}
{"log":"2019/09/13 21:56:43 Request received: favicon.ico\n","stream":"stderr","time":"2019-09-13T21:56:43.94833136Z"}

For good measure, I also tried starting up a container running a Python container and generating an error with that to see if it was an issue with the Go patterns. I had the same results:

{"log":"Traceback (most recent call last):\r\n","stream":"stdout","time":"2019-09-13T21:53:25.983143026Z"}
{"log":"  File \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\r\n","stream":"stdout","time":"2019-09-13T21:53:25.983167688Z"}
{"log":"TypeError: object of type 'NoneType' has no len()\r\n","stream":"stdout","time":"2019-09-13T21:53:25.983561083Z"}
{"log":"\u003e\u003e\u003e with open('test')\u0008\u001b[K\r\n","stream":"stdout","time":"2019-09-13T21:53:35.53158531Z"}
{"log":"KeyboardInterrupt\r\n","stream":"stdout","time":"2019-09-13T21:53:35.531647904Z"}
{"log":"\u003e\u003e\u003e exit9)\r\n","stream":"stdout","time":"2019-09-13T21:53:37.767546882Z"}
{"log":"  File \"\u003cstdin\u003e\", line 1\r\n","stream":"stdout","time":"2019-09-13T21:53:37.767601662Z"}
{"log":"    exit9)\r\n","stream":"stdout","time":"2019-09-13T21:53:37.767612274Z"}
{"log":"         ^\r\n","stream":"stdout","time":"2019-09-13T21:53:37.76762005Z"}
{"log":"SyntaxError: invalid syntax\r\n","stream":"stdout","time":"2019-09-13T21:53:37.767630672Z"}

I'm not sure what I'm doing wrong here, so any guidance (configuration or even just how to debug a fluentd plugin) would be super appreciated! It looks like the plugin is being called though as it removes the raw tag successfully.

ddides commented 5 years ago

Was thinking about this a little more and thought it was worth mentioning -- the logs are being forwarded to Elasticsearch ultimately and viewed on Kibana, and that's where I'm seeing that they are individual lines as well.

taraspos commented 4 years ago

@ddides the plugin expects proper Golang panic (exception/crash), which looks like:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0xffffffff addr=0x0 pc=0xe01da]

goroutine 1 [running]:
main.main()
    /tmp/sandbox382288397/prog.go:10 +0x1a

playground example

you can find the defined pattern here: https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/blob/master/lib/fluent/plugin/exception_detector.rb#L97-L98

In your case, this is programmatically generated StackTrace debug.PrintStack(), which is not and panic.

igorpeshansky commented 4 years ago

Thanks, @Trane9991! That is absolutely correct. Sounds like WAI.

ddides commented 4 years ago

Thanks for the information! Sorry I missed this response back in March.