fluent / fluent-plugin-s3

Amazon S3 input and output plugin for Fluentd
https://docs.fluentd.org/output/s3

s3_object_key_format without index overwrites (not append) log files #435

Open arepusb opened 9 months ago

arepusb commented 9 months ago

Describe the bug

Hello there! I use fluentd v1.16 with fluent-plugin-s3, and according to my configuration it should send logs from a few files to MinIO once per day. Here is the configuration:

<match projectname>
    @type s3
    aws_key_id "#{ENV['MINIO_ROOT_USER']}"
    aws_sec_key  "#{ENV['MINIO_ROOT_PASSWORD']}"
    s3_bucket tenants
    format json
    force_path_style true
    s3_endpoint "http://#{ENV['MINIO_HOST']}:#{ENV['MINIO_PORT']}/"
    path "#{ENV['TENANT_ID']}/logs/projectname-"     # This prefix is added to each file
    time_slice_format %Y%m%d%H%M  # This timestamp is added to each file name
    #s3_object_key_format %{path}%{time_slice}.%{file_extension} # Must stay commented out: with this format the target log file is overwritten several times and logs are lost.

    <buffer tag,time>
        @type file
        path /fluentd/logs/
        timekey 1440m  
        timekey_wait 10m
        flush_mode lazy
        timekey_use_utc true
        chunk_limit_size 256m
    </buffer>
</match>

I noticed that fluent-plugin-s3 often creates more than one file per day on MinIO.

For example:

projectname-202401240532_0.gz
projectname-202401240532_1.gz
projectname-202401240532_2.gz
projectname-202401240532_3.gz

I would like to have a single log file on MinIO per day. To achieve this I tried playing with the s3_object_key_format property. Its default value is %{path}%{time_slice}_%{index}.%{file_extension}, and I changed it to %{path}%{time_slice}.%{file_extension}. As a result I lost part of the logs: it looks like the target log file was overwritten several times, and I only saw the data from the latest iteration.
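To make the failure mode concrete, here is how the two key formats expand for chunks of the same time slice (a sketch; the object names are illustrative):

# default, with %{index}: each chunk of a time slice gets a unique key
%{path}%{time_slice}_%{index}.%{file_extension}
    chunk 0 -> projectname-202401240532_0.gz
    chunk 1 -> projectname-202401240532_1.gz   (nothing is lost)

# without %{index}: every chunk of a time slice maps to the same key
%{path}%{time_slice}.%{file_extension}
    chunk 0 -> projectname-202401240532.gz
    chunk 1 -> projectname-202401240532.gz     (overwrites chunk 0)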

How can I force fluent-plugin-s3 to create only a single file on MinIO when the timekey condition is met, without losing data?

To Reproduce

Use the provided configuration and check the logs on MinIO.

Expected behavior

It would be great if the information in the log file were appended instead of overwritten.

Your Environment

- Fluentd version: v1.16
- TD Agent version:
- Operating system:
- Kernel version:

Your Configuration

<match projectname>
    @type s3
    aws_key_id "#{ENV['MINIO_ROOT_USER']}"
    aws_sec_key  "#{ENV['MINIO_ROOT_PASSWORD']}"
    s3_bucket tenants
    format json
    force_path_style true
    s3_endpoint "http://#{ENV['MINIO_HOST']}:#{ENV['MINIO_PORT']}/"
    path "#{ENV['TENANT_ID']}/logs/projectname-"     # This prefix is added to each file
    time_slice_format %Y%m%d%H%M  # This timestamp is added to each file name
    #s3_object_key_format %{path}%{time_slice}.%{file_extension} # Must stay commented out: with this format the target log file is overwritten several times and logs are lost.

    <buffer tag,time>
        @type file
        path /fluentd/logs/
        timekey 1440m  
        timekey_wait 10m
        flush_mode lazy
        timekey_use_utc true
        chunk_limit_size 256m
    </buffer>
</match>

Your Error Log

I have no error log.

Additional context

No response

daipom commented 9 months ago

I have moved the issue here because it is about out_s3.

daipom commented 9 months ago

The out_s3 plugin does not support an append feature, so we need to upload files without duplicating file names.
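If the goal is only to guarantee unique object names, the plugin README also documents the %{uuid_flush} and %{hex_random} placeholders; a minimal sketch (note that %{uuid_flush} requires the uuidtools gem to be installed):

s3_object_key_format %{path}%{time_slice}_%{uuid_flush}.%{file_extension}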

I would like to have a single log file on MinIO per day.

You can make out_s3 upload files once a day (if you can tolerate a very slow upload frequency...).
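For reference, timekey is a time-type parameter, so a once-a-day flush can be written in any of these equivalent ways:

timekey 86400   # seconds
timekey 1440m   # minutes (as in the configuration above)
timekey 1d      # days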

arepusb commented 9 months ago

You can make out_s3 upload files once a day (if you can tolerate a very slow upload frequency...).

Fluentd is already configured to upload logs to MinIO once per day; I attached the config in the description. Here is the relevant part:

timekey 1440m  
timekey_wait 10m

Anyway, I often get more than one file per day, and the time difference between them is no more than two minutes (much less than 10m).

daipom commented 9 months ago

Hmm, the tag chunk key or chunk_limit_size could be the cause:

<buffer tag,time>
chunk_limit_size 256m

arepusb commented 9 months ago

The log files are ~200-300 bytes in size, which is much less than 256m.

daipom commented 9 months ago

Then, please try removing the tag key:

- <buffer tag,time>
+ <buffer time>
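
Applied to the configuration above, the buffer section would then look like this (a sketch; all other parameters unchanged):

<buffer time>
    @type file
    path /fluentd/logs/
    timekey 1440m
    timekey_wait 10m
    flush_mode lazy
    timekey_use_utc true
    chunk_limit_size 256m
</buffer>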

arepusb commented 9 months ago

Then, please try removing the tag key:

- <buffer tag,time>
+ <buffer time>

To be honest, I tested both variants before creating this issue. Thank you for trying to help!

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days

daipom commented 7 months ago

@arepusb Sorry for the delay. Did you find out anything?