cosmo0920 / fluent-bit-go-s3

[Deprecated] The predessor of fluent-bit output plugin for Amazon S3. https://aws.amazon.com/s3/
Apache License 2.0
33 stars 13 forks source link

Fixed to put an existing S3 object. #27

Closed k-kinzal closed 4 years ago

k-kinzal commented 4 years ago

I added a trace log and checked it, and then put it to the same object key is happening.

time="2020-03-28T01:05:24+09:00" level=trace msg="[s3operator] objectKey = fluent-bit/dt=2020-03-28/20200328010524.log, rows = 17, byte = 26436"
time="2020-03-28T01:05:24+09:00" level=trace msg="[s3operator] objectKey = fluent-bit/dt=2020-03-28/20200328010524.log, rows = 61, byte = 118943"
time="2020-03-28T01:05:24+09:00" level=trace msg="[s3operator] objectKey = fluent-bit/dt=2020-03-28/20200328010524.log, rows = 30, byte = 55433"

This problem causes some logs to be lost.

So I added a UUID to the file name suffix. I decided to use UUID because I thought it was the most secure of the following options.

  1. Existence check-in S3
    • This doesn't cause duplication, but it does slow down throughput
  2. Add a suffix of count number
    • It works in one process but duplicates in multiple processes
  3. Add a suffix of sha256
    • Duplicates are rare but the character length is very long
  4. Add a suffix of UUID
    • Duplicates are rare and do not slow down throughput as much as S3 existence checks

Update: Allow configuration parameters to select sha256 or no suffix. This will prevent object key collisions.

[OUTPUT]
    Name            s3
    Match           kube.*
    Bucket          mybuckets
    S3Prefix        path/to
    SuffixAlgorithm sha256
    Region          ap-northeast-1
    Compress        gzip

If SuffixAlgorithm is empty or omitted, no suffix will be given. The default is empty.

Test cases

I have checked to work with EKS v1.14 and fluent-bit v1.3.11.

time="2020-03-28T19:14:18+09:00" level=trace msg="[s3operator] objectKey = fluent-bit/dt=2020-03-28/20200328191418-e97b0092433e9da22e1631e332696e12b367f86416e97fdbbfc2518274c87a06.log, rows = 38, byte = 41308"
time="2020-03-28T19:14:18+09:00" level=trace msg="[s3operator] objectKey = fluent-bit/dt=2020-03-28/20200328191418-be30d54b62a32bf08562c4e6d2fd0505a9a8a242d142c18776b6811ce0e1d559.log, rows = 2, byte = 930"
time="2020-03-28T19:14:18+09:00" level=trace msg="[s3operator] objectKey = fluent-bit/dt=2020-03-28/20200328191418-d786a9e7fb87bb7c819d1c45e8a9ff5b1d8c02de565062da6d315f6b7d061fa5.log, rows = 2, byte = 937"
cosmo0920 commented 4 years ago
  1. Existence check-in S3
    • This doesn't cause duplication, but it does slow down throughput

This configuration pattern will be needed in the future. But it is not needed now.

Like as fluent-plugin-s3: https://github.com/fluent/fluent-plugin-s3/blob/master/lib/fluent/plugin/out_s3.rb#L263-L293

k-kinzal commented 4 years ago

@cosmo0920 Thank you for checking the PR.

Can we provide configuration parameter which provides sha256 suffix or nothing suffix (previous behavior)?

Oh! That's a very good idea! I will change this PR to use configuration.

This configuration pattern will be needed in the future. But it is not needed now.

Yes, I think this feature is necessary too. However, it's hard to implement this feature safely, so for now I'd like to implement it in a way that adds a suffix to the object.

cosmo0920 commented 4 years ago

This configuration pattern will be needed in the future. But it is not needed now.

Yes, I think this feature is necessary too. However, it's hard to implement this feature safely, so for now I'd like to implement it in a way that adds a suffix to the object.

I see. It's reasonable for me.

cosmo0920 commented 4 years ago

Thanks for the update. SuffixAlgorithm seems to be able to provide uuid case....

k-kinzal commented 4 years ago

@cosmo0920

SuffixAlgorithm seems to be able to provide uuid case....

Sorry, I don't understand the intent of this comment. I thought SuffixAlgorithm only supported sha256 or no suffix, but the uuid code was still there?

cosmo0920 commented 4 years ago

SuffixAlgorithm seems to be able to provide uuid case....

Sorry, I don't understand the intent of this comment. I thought SuffixAlgorithm only supported sha256 or no suffix, but the uuid code was still there?

I mean that adding uuid suffix was also preventing way to filename collisions as of replied to your comment. But now, adding suffixes is selectable. I'm got rethinking about it. As a result, adding uuid mechanism will be redundant for me....

k-kinzal commented 4 years ago

@cosmo0920 OK! I understood.

adding uuid mechanism will be redundant for me....

Yes, I also think sha256 prevents collisions, so no uuid mechanism is a good thing.

k-kinzal commented 4 years ago

Updated the code and PR description to support sha256 in the SuffixAlgorithm parameter.