awslabs / aws-fluent-plugin-kinesis

Amazon Kinesis output plugin for Fluentd
Apache License 2.0

Secondary formatting #167

Open dtmistry opened 5 years ago

dtmistry commented 5 years ago

I'm trying to use a file/S3 output as a <secondary> for when sending events to Kinesis fails. But the way the plugin formats incoming events makes it harder to read them back from a secondary file or S3.

Original message (json) -

{"test":"this is a test message", "message_id":"1234"}
{"test":"this is a test message", "message_id":"123456"}
{"test":"this is a test message", "message_id":"456789"}

Message in secondary output (file or S3) -

36cbd167849700a041b3bd691b014937íƒì{"test":"this is a test message", "message_id":"1234"}Ÿ c762173994c445b6927e569fa0821e6fíƒî{"test":"this is a test message", "message_id":"123456"}Ÿ 3cb59c12b2209ec3728795c6c58af6abíƒì{"test":"this is a test message", "message_id":"456789"}Ÿ 

This happens because the format method implementation adds a hex string for each event as the partition key:

https://github.com/awslabs/aws-fluent-plugin-kinesis/blob/master/lib/fluent/plugin/out_kinesis_streams.rb#L39

Would it make sense to format the message with only the configured formatter in the format method, and to calculate the hex partition key in the write method instead? That way secondary outputs would keep the desired formatting.
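A rough sketch of that idea, written as plain Ruby rather than against the real Fluentd plugin API. The class name SketchKinesisOutput, the newline-delimited chunk layout, and the use of SecureRandom.hex for the key are all assumptions made for illustration; the plugin's actual code may differ.

```ruby
require "json"
require "securerandom"

# Hypothetical stand-in for the output plugin; NOT the plugin's real class.
class SketchKinesisOutput
  # format: apply only the configured formatter (JSON here), one event per line.
  # Whatever format returns is what a <secondary> output would end up storing.
  def format(_tag, _time, record)
    JSON.generate(record) + "\n"
  end

  # write: attach a partition key to each event just before sending.
  # `chunk` is assumed to be the concatenated output of `format`.
  # SecureRandom.hex(16) is an assumed key scheme (32 hex characters).
  def write(chunk)
    chunk.each_line.map do |line|
      { data: line.chomp, partition_key: SecureRandom.hex(16) }
    end
  end
end

out = SketchKinesisOutput.new
records = [
  { "test" => "this is a test message", "message_id" => "1234" },
  { "test" => "this is a test message", "message_id" => "123456" }
]

# What the buffer (and therefore a secondary output) would contain: plain JSON lines.
chunk = records.map { |r| out.format("carting.test", Time.now.to_i, r) }.join
puts chunk

# What would be handed to the Kinesis client: data plus a key added at write time.
out.write(chunk).each { |entry| puts entry.inspect }
```

With this split, the buffered chunk contains only formatter output, so a file or S3 secondary would get readable JSON lines. The sketch relies on each formatted event occupying exactly one line, which holds for JSON but may not hold for other formatters.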

Environment -

td-agent 3 running in an Ubuntu 14.04 container, with fluent-plugin-kinesis 2.1.1

config -

<match carting.*>
    # plugin type
    @log_level debug
    @type kinesis_streams

    # your kinesis stream name
    stream_name test_stream
    # aws region
    region us-east-1
    <buffer>
      retry_max_times 3
      flush_interval 10s
      flush_thread_interval 0.1
      flush_thread_burst_interval 0.01
      flush_thread_count 4
    </buffer>
    <format>
        @type json
    </format>
    <secondary>
        @type file
        path /fluentd/log/failed_events
        <format>
            @type json
        </format>
    </secondary>
</match>

yang-wei commented 3 years ago

I'm facing the same issue with secondary output (https://docs.fluentd.org/output#secondary-output). My logs look like this:

�={"key":"value"}�$3544c5eb-6536-11eb-8db1-0ea64c53eca3

Here 3544c5eb-6536-11eb-8db1-0ea64c53eca3 is the partition key.