SumoLogic / fluentd-output-sumologic

Fluentd output plugin to deliver logs or metrics to Sumo Logic.
https://rubygems.org/gems/fluent-plugin-sumologic_output
Apache License 2.0
29 stars 28 forks source link

Fluentd stops sending logs when log_format is set to 'text' #85

Open danhale-git opened 1 year ago

danhale-git commented 1 year ago

Edit: Found the issue: I had failed to set log_key after the key changed from the default message to a non default value.

I'm going to leave this open because I think there are some improvements which could have clarified this issue. Unfortunately I have neither the time nor the Ruby experience to open a PR right now.

I appreciate my suggestions may not be valid for this code. Feel free to close this as my issue related to misconfiguration and has been resolved.


I'm using the kube-logging operator: https://kube-logging.dev/docs/configuration/plugins/outputs/sumologic/ This issue arose as we updated from version 3.17 to 4.20 of the operator. There are various sumologic related changes in that release, visible by searching in on the page of this very large diff. https://github.com/kube-logging/logging-operator/compare/release-3.17...4.2.0

However I've come here because the behaviour I'm getting from fluentd based on a single change to the Sumologic output configuration seems unexpected. I hope to get advice on how to move forward with this issue as it seems like a silent failure.

Feel free to close this issue or advise accordingly, I have not confirmed a bug although it does seem like there might be one.

The Sumologic Output is working as expected with log_format: json however if I set log_format: text fluentd stops sending logs.

<source>
  @type forward
  @id main_forward
  bind 0.0.0.0
  port 24240
</source>
<match **>
  @type label_router
  @id main
  metrics true
  <route>
    @label @c1157f02c8c13fd3ea66f8419567c357
    metrics_labels {"id":"flow:mynamespace:my-sumo-flow"}
    <match>
      labels my-sumo-label:enabled
      namespaces mynamespace
      negate false
    </match>
  </route>
  ... ...
</match>
<label @c1157f02c8c13fd3ea66f8419567c357>
  <match kubernetes.**>
    @type tag_normaliser
    @id flow:mynamespace:my-sumo-flow:0
    format ${namespace_name}.${labels.environment}.${pod_name}.${container_name}
  </match>
  <match **>
    @type sumologic
    @id flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output
    endpoint COLLECTOR_URL_REMOVED
    log_format json
    source_name my-sumo-source
    <buffer tag,time>
      @type file
      chunk_limit_size 32m
      flush_interval 60s
      flush_mode interval
      path /buffers/flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output.*.buffer
      retry_forever true
      timekey 10m
      timekey_wait 1m
      total_limit_size 1024m
    </buffer>
  </match>
</label>
... ...

With log_format: json I see fluend logging sends and I see logs in Sumologic.

$ tail -f -n 100000 /fluentd/log/out | grep "mynamespace:my-sumo"
2023-06-05 14:53:43 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Sending 2; logs records with source category '', source host '', source name 'my-sumo-source', chunk #5fd630dfe92e8520c65b0b2aedb3524c, try 0, batch 0
2023-06-05 14:58:21 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd6322428e95cb3b7fbd07ad24428b0" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:59:22 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Sending 13; logs records with source category '', source host '', source name 'my-sumo-source', chunk #5fd6322428e95cb3b7fbd07ad24428b0, try 0, batch 0
2023-06-05 14:59:26 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd632622621c3c2585f88717d2c406e" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>

With log_format: text I no longer see fluentd logging sends and I no longer see logs in Sumologic.

$ tail -f -n 100000 /fluentd/log/out | grep "mynamespace:my-sumo"
2023-06-05 14:41:11 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd62e4de0a29e3c6c8d40ce47a99521" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976000, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:42:16 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd62e8bdd2eb0db408da82be8213a70" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976000, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:43:21 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd62ec9da55d6d9601586c71c835446" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976000, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:50:00 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd63046447c860381363c4d438ee78d" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.container-namer-6fd6b8b55c-gmdsf.container-namer", variables=nil, seq=0>
2023-06-05 14:50:51 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd6307703ab3dd92bbfb4c8b3a7dfc4" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>

The only change happening in the configuration is the value of log_format.

I do see buffers under /buffers.

The logging operator deploys three fluentd containers, I am monitoring and inspecting all of them at once when troubleshooting.

I am using the fluentd debug container and everything seems to be functioning as expected except this one puzzling issue. Any advice is very much appreciated.

danhale-git commented 1 year ago

With log level set to trace:

log_format: json

2023-06-07 06:05:01 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:01 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:06 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:06 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:07 +0000 [trace]: #0 [... ...] writing events into buffer instance=1960 metadata_size=1
2023-06-07 06:05:07 +0000 [trace]: #0 [... ...] chunk /buffers/... ....b5fd83e76014f1e19b59dfa9d48cb2ffc.buffer size_added: 3137 new_size: 21959
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] enqueueing chunk instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] dequeueing a chunk instance=1960
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] chunk dequeued instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] trying flush for a chunk chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] adding write count instance=3180
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] executing sync write chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [debug]: #0 [... ...] Sending 7; logs records with source category '', source host '', source name 'my-source-category, chunk #5fd83e76014f1e19b59dfa9d48cb2ffc, try 0, batch 0
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] write operation done, committing chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] committing write operation to a chunk chunk="5fd83e76014f1e19b59dfa9d48cb2ffc" delayed=false
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] purging a chunk instance=1960 chunk_id="5fd83e76014f1e19b59dfa9d48cb2ffc" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] chunk purged instance=1960 chunk_id="5fd83e76014f1e19b59dfa9d48cb2ffc" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] done to commit a chunk chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] writing events into buffer instance=1960 metadata_size=1
2023-06-07 06:05:12 +0000 [debug]: #0 [... ...] Created new chunk chunk_id="5fd83eb3fe6c1581711a208220c1df91" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] chunk /buffers/... ....b5fd83eb3fe6c1581711a208220c1df91.buffer size_added: 3137 new_size: 3137
2023-06-07 06:05:17 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:17 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:22 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:22 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:28 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:28 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960

log_format: text

2023-06-07 05:49:09 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:14 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:14 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:20 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:20 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:25 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:25 +0000 [trace]: #0 [... ...] enqueueing chunk instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:25 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] dequeueing a chunk instance=1960
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] chunk dequeued instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] trying flush for a chunk chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] adding write count instance=4920
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] executing sync write chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] write operation done, committing chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] committing write operation to a chunk chunk="5fd83aefd47052657cf608a961bbfcd8" delayed=false
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] purging a chunk instance=1960 chunk_id="5fd83aefd47052657cf608a961bbfcd8" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] chunk purged instance=1960 chunk_id="5fd83aefd47052657cf608a961bbfcd8" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] done to commit a chunk chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:30 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:30 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:36 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:36 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:41 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:41 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:47 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
sumo-drosiek commented 1 year ago

@danhale-git Thank you for the issue. I prepared PR which addresses non existing log key. I decided to add warning message as raising exception may lead to re-ingest logs in current architecture

https://github.com/SumoLogic/fluentd-output-sumologic/pull/86