fluent / fluent-plugin-s3

Amazon S3 input and output plugin for Fluentd
https://docs.fluentd.org/output/s3

Allow ignoring events that point to a zero-size file on S3 #409

Closed pieterjanpintens closed 1 year ago

pieterjanpintens commented 1 year ago

Is your feature request related to a problem? Please describe.

We are trying to parse ALB logs from AWS. They are pushed to S3. This is set up by AWS, and we don't have any control over it. AWS gzips the logs before putting them on S3.

We noticed that fluentd sometimes raises errors for these files, complaining that they are not valid gzipped files. These events eventually end up in the SQS DLQ. When investigating further, we saw that they point to files of size 0. We are not sure why AWS writes these empty files, but that is outside our control.

Since the size is included in the event (see below), would it be possible to add a check that skips these events instead of sending them to the gzip extractor, or can the gzip extractor be improved to handle this scenario?

{
    "Records": [
        {
            "eventVersion": "2.1",
            "eventSource": "aws:s3",
            "awsRegion": "eu-west-1",
            "eventTime": "2022-09-23T08:39:05.952Z",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {
                "principalId": "AWS:AIDAIC3Q6OY7XTEX2MMHK"
            },
            "requestParameters": {
                "sourceIPAddress": "2a05:d018:22c:4402:86d0:6622:7122:b94"
            },
            "responseElements": {
                "x-amz-request-id": "296FESB5BDKE2SPZ",
                "x-amz-id-2": "Wi2Bn4ZhWKud+Mw5ust1WtJBQps4gEwls7I7GKkR/O00ZmDW8bY+jDSfnX2Jlh+QDFgFXmQZVuQ37DLo2Sc5SaliTZMYGCnG"
            },
            "s3": {
                "s3SchemaVersion": "1.0",
                "configurationId": "tf-s3-queue-20220524144943449100000001",
                "bucket": {
                    "name": "inventivelogs-eu-west-1-alb-production",
                    "ownerIdentity": {
                        "principalId": "A1U51RDWZIBX10"
                    },
                    "arn": "arn:aws:s3:::inventivelogs-eu-west-1-alb-production"
                },
                "object": {
                    "key": "secp/AWSLogs/460634120503/elasticloadbalancing/eu-west-1/2022/09/23/460634120503_elasticloadbalancing_eu-west-1_app.secp-production.6c278d874a2797c6_20220923T0840Z_54.229.236.72_pihbpegx.log.gz",
                    "size": 0,
                    "eTag": "d41d8cd98f00b204e9800998ecf8427e",
                    "sequencer": "00632D70A9EA2062EA"
                }
            }
        }
    ]
}
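
For illustration, a check like this seems feasible since the size is right there in the notification. Below is a minimal Ruby sketch, not the plugin's actual code: it assumes SQS message bodies shaped like the example above, and the zero_size_object_event? helper is hypothetical.

require "json"

# Hypothetical helper (not part of fluent-plugin-s3): true when every
# S3 record in an event notification points to a zero-byte object,
# using the "size" field shown in the sample event above.
def zero_size_object_event?(body)
  records = body["Records"] || []
  !records.empty? && records.all? { |r| r.dig("s3", "object", "size").to_i.zero? }
end

# Minimal stand-in for an SQS message body like the one above.
raw = <<~JSON
  { "Records": [ { "s3": { "object": { "key": "x.log.gz", "size": 0 } } } ] }
JSON

body = JSON.parse(raw)
if zero_size_object_event?(body)
  warn "skipping event for zero-byte S3 object"  # ack the message so it never reaches the DLQ
else
  # hand off to the normal gzip-extraction path
end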

Describe the solution you'd like

Either gzip should cope with files of size 0, or we should be able to prevent these files from reaching the extractor phase via some config option like skip_zero_bytes_files: bool, or something like that.
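
If the option route were taken, it might look something like this in the s3 input configuration. To be clear, skip_zero_bytes_files is hypothetical (it does not exist in the plugin today), and the bucket and queue names are placeholders:

<source>
  @type s3
  s3_bucket your-alb-log-bucket
  s3_region eu-west-1
  store_as gzip
  skip_zero_bytes_files true   # hypothetical option proposed above
  <sqs>
    queue_name your-alb-log-queue
  </sqs>
</source>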

Describe alternatives you've considered

Live with the errors in fluentd

Additional context

No response

pieterjanpintens commented 1 year ago

We also tried the gzip command extractor, but that fails too; fixing that one might be easier.
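
For reference, the failure is easy to reproduce outside Fluentd: a zero-byte object yields empty input with no gzip magic header, so Ruby's Zlib (which a gzip extractor would typically rely on) rejects it. A minimal sketch:

require "zlib"
require "stringio"

begin
  # A zero-byte S3 object yields empty input: no gzip magic bytes at all,
  # so the header check fails before any decompression happens.
  Zlib::GzipReader.new(StringIO.new("")).read
rescue Zlib::GzipFile::Error => e
  warn "gzip extraction failed: #{e.message}"
end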

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.

github-actions[bot] commented 1 year ago

This issue was automatically closed because it remained stale for 30 days.