aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image

s3_key_format $INDEX increments per output definition, not per tag/stream of logs #675

Open SoamA opened 1 year ago

SoamA commented 1 year ago
### Describe the question/issue

When the S3 output plugin in Fluent Bit is configured to use the INDEX feature and it is uploading two files to S3 at the same time from the same host, this is what can happen (roughly paraphrasing the Fluent Bit log output):

```
Successfully uploaded object file1_1
Successfully uploaded object file1_2
Successfully uploaded object file2_3
Successfully uploaded object file1_4
Successfully uploaded object file2_5
```

As a result, `file1` is uploaded to S3 as `file1_1, file1_2, file1_4` and `file2` is uploaded as `file2_3, file2_5`; in other words, the index embedded in each file's S3 fragments does not increase by one. We cannot even assume it starts at 1 for each file.

Is there a way of modifying this behavior so that the index is guaranteed to increase by one for each file being uploaded? I ask because we have services consuming these files from S3 (such as the Spark History Server) that expect such an incremental sequence (increasing by 1) embedded in the filenames they consume. If this sequence is broken, they throw an error on the assumption that a fragment is missing.

### Configuration

Using AWS for Fluent Bit 2.31.11.

```
[OUTPUT]
    Name                          s3
    Match                         sel.*
    region                        us-east-1
    bucket                        mybucket
    total_file_size               10M
    s3_key_format                 /spark-event-logs/mycluster/eventlog_v2_$TAG[1]/events_$INDEX_$TAG[1]_$UUID
    s3_key_format_tag_delimiters  ..
    store_dir                     /home/ec2-user/buffer
    upload_timeout                7m
    log_key                       log
```

* Fluent Bit Configuration File: the full config file is contained in the `fluent-bit-crash-repro.tar` provided as part of our discussions in https://github.com/aws/aws-for-fluent-bit/issues/661
* Full Config Map and pod configuration: can provide if necessary, but again, same as what was described in https://github.com/aws/aws-for-fluent-bit/issues/661

### Fluent Bit Log Output

### Fluent Bit Version Info

### Cluster Details

### Application Details

### Steps to reproduce issue

Use the S3 output plugin. The upload pattern for the OUTPUT has to include $INDEX. Use Fluent Bit to upload two files simultaneously that match this pattern. A minimal configuration sketch follows below.
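For reference, a minimal configuration along these lines should exercise the same code path. This is only a sketch: the dummy inputs, bucket name, and region are hypothetical placeholders, not taken from the original report.

```
[INPUT]
    # Hypothetical stream 1: emits records tagged sel.file1
    Name    dummy
    Tag     sel.file1

[INPUT]
    # Hypothetical stream 2: emits records tagged sel.file2
    Name    dummy
    Tag     sel.file2

[OUTPUT]
    # Both tags are matched by this single output, so they share one $INDEX counter
    Name                          s3
    Match                         sel.*
    region                        us-east-1
    bucket                        my-test-bucket
    upload_timeout                1m
    s3_key_format                 /repro/$TAG[1]/part_$INDEX_$TAG[1]
    s3_key_format_tag_delimiters  ..
```

With a short `upload_timeout`, the two tags flush interleaved in time, and the uploaded keys under `/repro/file1/` and `/repro/file2/` show one global index split across the two prefixes rather than two gapless sequences.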

### Related Issues

PettitWesley commented 1 year ago

@SoamA Sorry, I'm not completely understanding this... can you explain it with an example s3_key_format that has $INDEX, along with the full S3 key names and tag values? And can you explain the file fragment part some more? Thanks!

Is the issue that for each value of $TAG, you want sequential indexes?

I should note this bug: https://github.com/aws/aws-for-fluent-bit/issues/653

SoamA commented 1 year ago

Yes, here's the Fluent Bit config:

      s3_key_format                   /spark-event-logs/mycluster/eventlog_v2_$TAG[1]/events_$INDEX_$TAG[1]_$UUID

Note that we're relying on the INDEX feature to embed a numerically increasing sequence in the filenames. Here's what the Fluent Bit upload log looks like:

```
2023-06-07T18:03:44.102855211Z stderr F [2023/06/07 18:03:44] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/events_137_spark-3cc4886822bd405c80b6a16718547ad4_ys1Iws3p
2023-06-07T18:10:46.119749737Z stderr F [2023/06/07 18:10:46] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/events_138_spark-3cc4886822bd405c80b6a16718547ad4_69gehrN4
2023-06-07T18:14:04.134622779Z stderr F [2023/06/07 18:14:04] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/events_139_spark-3cc4886822bd405c80b6a16718547ad4_q2UV8ypa
2023-06-07T18:15:02.838638712Z stderr F [2023/06/07 18:15:02] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-efd980675cd84f99814cd5ce20c9f17b/events_140_spark-efd980675cd84f99814cd5ce20c9f17b_Xdzd0OL5
2023-06-07T18:15:19.086118196Z stderr F [2023/06/07 18:15:19] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/events_141_spark-3cc4886822bd405c80b6a16718547ad4_f8W0jfZO
2023-06-07T18:23:02.915445149Z stderr F [2023/06/07 18:23:02] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/events_142_spark-3cc4886822bd405c80b6a16718547ad4_9cgENQzg
2023-06-07T18:32:02.955507888Z stderr F [2023/06/07 18:32:02] [ info] [output:s3:s3.3] Successfully uploaded object /spark-event-logs/adhoc/eventlog_v2_spark-efd980675cd84f99814cd5ce20c9f17b/events_143_spark-efd980675cd84f99814cd5ce20c9f17b_5RdoPafS
```

In the target S3 bucket, this produces the following:

```
s3://mybucket/spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/:
   events_137_spark-3cc4886822bd405c80b6a16718547ad4_ys1Iws3p
   events_138_spark-3cc4886822bd405c80b6a16718547ad4_69gehrN4
   events_139_spark-3cc4886822bd405c80b6a16718547ad4_q2UV8ypa
   events_141_spark-3cc4886822bd405c80b6a16718547ad4_f8W0jfZO
   events_142_spark-3cc4886822bd405c80b6a16718547ad4_9cgENQzg
```

and

```
s3://mybucket/spark-event-logs/adhoc/eventlog_v2_spark-efd980675cd84f99814cd5ce20c9f17b/:
   events_140_spark-efd980675cd84f99814cd5ce20c9f17b_Xdzd0OL5
   events_143_spark-efd980675cd84f99814cd5ce20c9f17b_5RdoPafS
```

This is not desirable: in the first case there's a jump from 139 to 141, and in the second a jump from 140 to 143. What we really want is:

```
s3://mybucket/spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/:
   events_001_spark-3cc4886822bd405c80b6a16718547ad4_ys1Iws3p
   events_002_spark-3cc4886822bd405c80b6a16718547ad4_69gehrN4
   events_003_spark-3cc4886822bd405c80b6a16718547ad4_q2UV8ypa
   events_004_spark-3cc4886822bd405c80b6a16718547ad4_f8W0jfZO
   events_005_spark-3cc4886822bd405c80b6a16718547ad4_9cgENQzg
```

and

```
s3://mybucket/spark-event-logs/adhoc/eventlog_v2_spark-efd980675cd84f99814cd5ce20c9f17b/:
   events_001_spark-efd980675cd84f99814cd5ce20c9f17b_Xdzd0OL5
   events_002_spark-efd980675cd84f99814cd5ce20c9f17b_5RdoPafS
```

i.e. each file upload has its own $INDEX counter, as opposed to a single counter shared amongst multiple file uploads. It actually doesn't even have to start at 001; it can be any number, as long as it increases sequentially. So

```
s3://mybucket/spark-event-logs/adhoc/eventlog_v2_spark-3cc4886822bd405c80b6a16718547ad4/:
   events_137_spark-3cc4886822bd405c80b6a16718547ad4_ys1Iws3p
   events_138_spark-3cc4886822bd405c80b6a16718547ad4_69gehrN4
   events_139_spark-3cc4886822bd405c80b6a16718547ad4_q2UV8ypa
   events_140_spark-3cc4886822bd405c80b6a16718547ad4_f8W0jfZO
   events_141_spark-3cc4886822bd405c80b6a16718547ad4_9cgENQzg
```

and

```
s3://mybucket/spark-event-logs/adhoc/eventlog_v2_spark-efd980675cd84f99814cd5ce20c9f17b/:
   events_145_spark-efd980675cd84f99814cd5ce20c9f17b_Xdzd0OL5
   events_146_spark-efd980675cd84f99814cd5ce20c9f17b_5RdoPafS
```

would also work. Let me know if that helps in clarifying the problem.

PettitWesley commented 1 year ago

I think I get it. Are you running on k8s?

You have multiple tags processed by a single S3 output, and the $INDEX numbers should be sequential within each tag/stream of logs. Currently the counter just increments over time across the whole S3 output.
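To restate that in code terms, here is an illustrative sketch in Python of the current counter semantics versus the requested ones (the plugin itself is written in C; this is only a model of the behavior, not its actual source):

```python
from collections import defaultdict

# Current behavior (as described above): one counter shared by every
# tag that flows through a single S3 output.
shared_index = 0

def next_index_shared(tag: str) -> int:
    global shared_index
    shared_index += 1
    return shared_index

# Requested behavior: an independent counter per tag, so each
# tag/stream gets its own gapless 1, 2, 3, ... sequence.
per_tag_index = defaultdict(int)

def next_index_per_tag(tag: str) -> int:
    per_tag_index[tag] += 1
    return per_tag_index[tag]

uploads = ["file1", "file1", "file2", "file1", "file2"]
print([next_index_shared(t) for t in uploads])   # [1, 2, 3, 4, 5]: gaps appear within each tag
print([next_index_per_tag(t) for t in uploads])  # [1, 2, 1, 3, 2]: gapless within each tag
```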

I'll have to take this as a feature request, which I probably won't be able to prioritize soon, sorry. @SoamA you can help by submitting a feature request via AWS Support.

For a short term workaround, I wonder if there's some way you could have multiple S3 outputs, one for each tag, so each one has its own $INDEX. Is that possible? How many tags do you have?

Could you do some sort of metadata rewrite_tag scheme to change the tags to be a small set of meaningful values? (I can help with this if you explain your architecture more).
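For illustration, a sketch of what such a scheme could look like. Everything here is hypothetical (the stream name, match patterns, and key layout are invented for the example), and it only works if the tags can be folded into a small set that is known when the config is written:

```
[FILTER]
    # Hypothetical: re-tag one known stream to a fixed tag so that a
    # dedicated output can pick it up. Rule format: $KEY REGEX NEW_TAG KEEP
    Name    rewrite_tag
    Match   sel.streamA*
    Rule    $log .* routed.streamA false

[OUTPUT]
    # One output per rewritten tag; each S3 output instance keeps its
    # own $INDEX counter, which restores a per-stream sequence.
    Name           s3
    Match          routed.streamA
    region         us-east-1
    bucket         mybucket
    s3_key_format  /spark-event-logs/streamA/events_$INDEX_$UUID
```

Each [OUTPUT] instance maintains its own counter, which is why splitting tags across outputs restores per-stream sequencing; the catch, as discussed below, is that it requires a bounded set of tags known in advance.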

SoamA commented 1 year ago

Hey @PettitWesley,

> I think I get it. Are you running on k8s?

Yes, we're on EKS.

> You have multiple tags processed by a single S3 output, and the $INDEX numbers should be sequential within each tag/stream of logs. Currently the counter just increments over time across the whole S3 output.
>
> I'll have to take this as a feature request, which I probably won't be able to prioritize soon, sorry. @SoamA you can help by submitting a feature request via AWS Support.

Yes, will do. Stay tuned!

> For a short term workaround, I wonder if there's some way you could have multiple S3 outputs, one for each tag, so each one has its own $INDEX. Is that possible? How many tags do you have?
>
> Could you do some sort of metadata rewrite_tag scheme to change the tags to be a small set of meaningful values? (I can help with this if you explain your architecture more.)

Here's the relevant INPUT part of the fluent-bit conf:

```
[INPUT]
    Name                tail
    Tag                 sel.<spark_internal_app_id>
    Path                /var/log/containers/eventlogs/*\.inprogress
    DB                  /var/log/sel_spark.db
    multiline.parser    docker, cri
    Mem_Buf_Limit       10MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Tag_Regex           (?<spark_internal_app_id>spark-[a-z0-9]+)
    Buffer_Chunk_Size   1MB
    Buffer_Max_Size     5MB
```

Spark driver processes on an EKS host are configured to output their event logs to /var/log/containers/eventlogs/. Each log is a single file. They look like:

```
$ ls -al /var/log/containers/eventlogs/
total 106464
drwxrwxrwx 2 root  root     4096 Jun  8 21:18 .
drwxr-xr-x 3 root  root     8192 Jun  8 21:14 ..
-rw-rw---- 1 spark root 41258305 Jun  7 18:30 spark-3cc4886822bd405c80b6a16718547ad4
-rw-rw---- 1 spark root 26285083 Jun  8 21:18 spark-5d6e74c026e44ae594311dd03d2da5bc
-rw-rw---- 1 spark root 41316254 Jun  7 23:16 spark-c1b075c4bf3b491d85e8d2159b141731
-rw-rw---- 1 spark root   135360 Jun  7 18:08 spark-efd980675cd84f99814cd5ce20c9f17b
```

When the Spark driver process is actively generating a log, the file has an `.inprogress` suffix; once the job has completed running, the suffix is removed. So in Fluent Bit, TAG[1] matches the `spark-<32-hex-char>` identifier in the event log file name (e.g. `spark-c1b075c4bf3b491d85e8d2159b141731` or `spark-efd980675cd84f99814cd5ce20c9f17b` from the directory listed above). Because this alphanumeric string is randomly generated by Spark, I don't think we could meaningfully fold the tags into anything smaller, sadly, but I'm open to suggestions!

SoamA commented 1 year ago

Submitted feature request in AWS support ticket https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=12990900451.