The plugin breaks when trying to process a malformed json

marcosfparreiras commented 11 months ago

Describe the bug

We are using the fluent-plugin-dynatrace along with the one to send the logs to Logz.io (fluent-plugin-logzio). One of our applications, some times generates logs that include malformed json, that we have to see in our logs analysis tool. These logs are successfully processed by the plugin fluent-plugin-logzio and sent to Logz.io, but when being processed by fluent-plugin-dynatrace, an exception is raised and that makes these logs to never be sent to Dynatrace.

To Reproduce

Attempt to send, from the fluent-dynatrace-plugin, the following records to Dynatrace:

records = [
{
"host"=>"10.0.0.X",
"msg"=>"A regular message",
},
{
"host"=>"10.0.0.X",
"msg"=>"V\x92\xA1\u000F\xF0ܱ\u0013,\x82",
} 
]

The second record is a malformed json. That leads to the following error message:

2023-12-14 08:20:32 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2023-12-14 08:20:33 +0000 chunk="<chunk_id>" error_class=JSON::GeneratorError error="source sequence is illegal/malformed utf-8"
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:158:in `to_json'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:158:in `serialize'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:149:in `send_records'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:140:in `block in synchronized_send_records'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:139:in `synchronize'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:139:in `synchronized_send_records'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluent-plugin-dynatrace-0.2.0/lib/fluent/plugin/out_dynatrace.rb:109:in `write'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1225:in `try_flush'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2023-12-14 08:20:32 +0000 [warn]: #0 /opt/rh/rh-ruby30/root/usr/share/gems/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Expected behaviour

We would expect that also malformed JSON messages are serialised and sent to Dynatrace from the plugin fluent-plugin-dynatrace. These logs are also relevant for us to analyse. Some times, a malformed json could be exactly the root cause of a problem that we are investigation. This behaviour is the one we have with our current logs analysis tool (Logz.io), and we also need that on Dynatrace, given that it will be our primary logs analysis tool in the near future.

Desktop (please complete the following information): OS:

The issue happens on a container (based on image centos:7), running on the following host:

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"

The issue is reproducible via unit tests also in a macOS 13.5.1

The container has the following versions running:

ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
gem fluent-plugin-beats (1.1.0)
gem fluent-plugin-dynatrace (0.2.0)
gem fluentd (1.16.3)

Additional context The idea to use Yajl to serialize the JSON messages came from the implementation of other fluent plugins, that use that approach, such as:

fluent-plugin-logzio: https://github.com/logzio/fluent-plugin-logzio/blob/master/lib/fluent/plugin/out_logzio_buffered.rb#L109
fluent-plugin-cloudwatch-logs: https://github.com/fluent-plugins-nursery/fluent-plugin-cloudwatch-logs/blob/master/lib/fluent/plugin/out_cloudwatch_logs.rb#L171

marcosfparreiras commented 11 months ago

I opened the PR https://github.com/dynatrace-oss/fluent-plugin-dynatrace/pull/46 as a proposal to fix the bug. The PR contains the unit tests to reproduce the issue in the current latest version (0.2.0), and to show that using Yajl, the problem is solved.

dyladan commented 11 months ago

I looked into your proposed fix and actually the log is still rejected with a 400 by the Dynatrace API. The malformed UTF-8 can only be ingested if you also do Content-Type: text/plain; charset=utf-8, but that causes the JSON not to be parsed at all by Dynatrace.

marcosfparreiras commented 11 months ago

Thank you very much for your comment @dyladan ! Hm, that's odd 🤔 . We are currently using the solution that I suggested above and, with that, we were actually able to flush a buffer that, with the current stable version, could not be flushed due to this malformed json error.

This is the structure of the fluentd config file that we are using currently:

# Input
<source>
  @type forward
  port 24224
</source>

# Input for filebeats
# https://github.com/repeatedly/fluent-plugin-beats
<source>
  @type beats
  metadata_as_tag
  port 25224
</source>

# log targets
<match **>
  @type copy

  # Another destination
  <store ignore_error>
    # Omitted as it's not relevant for the discussed problem
  </store>

  # DynaTrace
  <store ignore_error>
    @type dynatrace
    api_token "#{ENV['LOG_INGEST_TOKEN']}"
    active_gate_url "https://<my_environment>.live.dynatrace.com/api/v2/logs/ingest"
    inject_timestamp   false
    <buffer>
        @type file
        path                /var/log/fluent/dynatrace_buffer
        flush_interval      10s
        chunk_limit_size    1m
    </buffer>
  </store>
</match>

We don't set anything specific for the content type. We really just use the plugin with its default configurations.

Also, from the test case that I added reproducing the error (https://github.com/dynatrace-oss/fluent-plugin-dynatrace/pull/46/files#diff-a4ccca5a58a6e366de8b8589846631701d8f66a1c67ea5b1f98dec49ae27f7a6R234), I could see that with the records.to_json, we got the malformed issue also in the tests, but with Yajl.dump that is solved and we then actually get the parsed record.

Could you share with me how you tested it on your side? I can try to reproduce it and see what's different from our current case 😊

dyladan commented 11 months ago

Sure. This is a simple test case that is rejected with a 400 error. It is exactly the case in the test in your PR

require 'uri'
require 'net/http'
require 'json'

active_gate_url = ENV["ACTIVE_GATE_URL"]
token = ENV["API_TOKEN"]

record = "{\"message\":\"This is a malformed json message V\x92\xA1\u000F\xF0ܱ\u0013,\x82\"}"
res = Net::HTTP.post(
    URI(active_gate_url),
    record,
    {
        "Content-Type" => "application/json; charset=utf-8",
        :Authorization => "Api-Token #{token}"
    }
)

puts res.body
puts res

dyladan commented 5 months ago

@marcosfparreiras did you ever discover what is different from your setup?

dynatrace-oss / fluent-plugin-dynatrace

The plugin breaks when trying to process a malformed json #45