aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

[Bug] Cannot register file error #796

Open axot opened 8 months ago

axot commented 8 months ago

Describe the question/issue

When the log ingestion rate is high, Fluent Bit reports a "cannot register file" error.

Configuration

The customer's application sends more than 3,000 log records per second, but Fluent Bit collects only about 3,000 per second and cannot retrieve the rest.

Fluent Bit was deployed using the built-in logging feature of EKS on Fargate (the aws-logging ConfigMap).

Logs are sent to both Kinesis Data Firehose and CloudWatch Logs, and the record counts in the two destinations match, so records appear to be lost before they reach the outputs.
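The routing in filters.conf and output.conf decides which destinations each record reaches. The Python sketch below models that routing so the per-destination counts can be reasoned about. It is a simplified model, not Fluent Bit's implementation: it assumes the parser and kubernetes filters have already run (so `log_data` and `kubernetes` are present), uses `fnmatch` to approximate Fluent Bit's `Match` wildcards, and the `route` and `excluded` helpers and the output labels are hypothetical.

```python
import fnmatch
import re

# Outputs from output.conf as (label, Match pattern). Labels are hypothetical.
OUTPUTS = [
    ("cloudwatch", "kube.*"),
    ("kinesis_firehose #1", "kube.*"),
    ("kinesis_firehose #2", "search"),
]

# grep filters (Match *): Exclude $kubernetes['container_name'] envoy / xray-daemon.
# The patterns are regexes, so this uses an unanchored search.
EXCLUDE_PATTERNS = [re.compile("envoy"), re.compile("xray-daemon")]

# rewrite_tag (Match kube.*): Rule $log_data['logger'] ^(search)$ search true
RULE_REGEX = re.compile(r"^(search)$")
NEW_TAG = "search"
KEEP_ORIGINAL = True  # fourth field of the Rule


def excluded(record):
    """True if either grep filter would drop the record."""
    name = record.get("kubernetes", {}).get("container_name")
    if not isinstance(name, str):
        return False  # key missing: assume Exclude does not match
    return any(p.search(name) for p in EXCLUDE_PATTERNS)


def route(tag, record):
    """Return the (tag, output) pairs a parsed record would be delivered to."""
    streams = [(tag, record)]
    if fnmatch.fnmatchcase(tag, "kube.*"):
        logger = record.get("log_data", {}).get("logger")
        if isinstance(logger, str) and RULE_REGEX.search(logger):
            streams.append((NEW_TAG, record))  # re-emitted copy with the new tag
            if not KEEP_ORIGINAL:
                streams.pop(0)  # Keep=false would drop the original record
    deliveries = []
    for stream_tag, rec in streams:
        if excluded(rec):
            continue
        for label, pattern in OUTPUTS:
            if fnmatch.fnmatchcase(stream_tag, pattern):
                deliveries.append((stream_tag, label))
    return deliveries


if __name__ == "__main__":
    rec = {"log_data": {"logger": "search"}, "kubernetes": {"container_name": "app"}}
    for delivery in route("kube.var.log.containers.app.log", rec):
        print(delivery)
```

For the sample record the sketch prints three deliveries: the original `kube.*` record goes to CloudWatch Logs and the first Firehose stream, and the re-emitted `search` copy goes to the second Firehose stream.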

filters.conf:
----
[FILTER]
    Name     parser
    Match    kube.*
    Key_Name log
    Parser   crio
    Reserve_Data On
[FILTER]
    Name        kubernetes
    Match       kube.*
    Merge_Log   On
    Merge_Log_Key       log_data
    Buffer_Size 0
    Kube_Meta_Cache_TTL 300s
[FILTER]
    Name  rewrite_tag
    Match kube.*
    Rule  $log_data['logger'] ^(search)$ search true
[FILTER]
    Name    grep
    Match   *
    Exclude $kubernetes['container_name'] envoy
[FILTER]
    Name    grep
    Match   *
    Exclude $kubernetes['container_name'] xray-daemon

flb_log_cw:
----
true

output.conf:
----
[OUTPUT]
    Name      cloudwatch
    Match     kube.*
    region    ap-northeast-1
    log_group_name    *****
    log_stream_prefix from-fluent-bit-
    auto_create_group true
[OUTPUT]
    Name    kinesis_firehose
    Match   kube.*
    region  ap-northeast-1
    delivery_stream *****
[OUTPUT]
    Name    kinesis_firehose
    Match   search
    region  ap-northeast-1
    delivery_stream *****

parsers.conf:
----
[PARSER]
    Name        crio
    Format      Regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>P|F) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%LZ
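To see what the crio parser extracts, the sketch below applies the same regex in Python, assuming the raw container log line arrives in the `log` key as `Key_Name log` implies. Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, so the pattern is translated; `parse_crio` is a hypothetical helper. In the CRI log format, `logtag` `P` marks a partial line and `F` a full one.

```python
import re
from datetime import datetime, timezone

# The crio parser's Regex, rewritten with Python's (?P<name>...) group syntax.
CRIO_REGEX = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>P|F) (?P<log>.*)$"
)


def parse_crio(line):
    """Split one CRI log line into the fields the crio parser extracts."""
    match = CRIO_REGEX.match(line)
    if match is None:
        return None  # no match: the parser filter leaves the record unchanged
    fields = match.groupdict()
    # Time_Format %Y-%m-%dT%H:%M:%S.%LZ; CRI runtimes write up to nanosecond
    # precision, while Python's %f takes at most 6 digits, so truncate.
    whole, _, frac = fields["time"].rstrip("Z").partition(".")
    ts = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    fields["timestamp"] = ts.replace(microsecond=int(frac[:6].ljust(6, "0")))
    return fields


if __name__ == "__main__":
    sample = "2024-03-14T08:40:51.627123456Z stdout F hello world"
    print(parse_crio(sample))
```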

Fluent Bit Log Output

{
    "@timestamp": "2024-03-14 08:40:51.627",
    "@message": {
        "log": "[2024/03/14 08:40:51] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory"
    },
    "@logStream": "from-fluent-bit-**********",
    "@log": "-**********"
},
{
    "@timestamp": "2024-03-14 08:40:51.627",
    "@message": {
        "log": "[2024/03/14 08:40:51] [error] [input:tail:tail.0] inode=1836081 cannot register file /var/log/containers/-**********.log"
    },
    "@logStream": "from-fluent-bit--**********",
    "@log": "-**********"
}

Fluent Bit Version Info

Fluent Bit v1.9.10 (the built-in Fluent Bit of EKS on Fargate)
EKS versions 1.27, 1.28, and 1.29

Cluster Details

The VPC allows all outbound traffic; inbound traffic is restricted to specific IP addresses and security groups.

Uses App Mesh on EKS with Fargate, with Fluent Bit integrated into Fargate.

Application Details

Logs beyond roughly 3,000 records per second (about 6 MB per second) are not recovered.

Related Issues

Not sure whether this issue is related to the EKS AMI update involving a 1024 NOFILE limit: https://github.com/awslabs/amazon-eks-ami/pull/1535
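The errno in the first log line gives one way to test the NOFILE hypothesis. On Linux, errno 2 is ENOENT ("No such file or directory"); reaching the open-file limit makes open() fail with EMFILE ("Too many open files"), and exhausting the inotify watch budget makes inotify_add_watch() fail with ENOSPC. So errno=2 by itself points at a missing path rather than an exhausted limit. The Python sketch below decodes an errno value and reads the limits visible to the process; the `CAUSES` table and both helpers are hypothetical summaries, not something Fluent Bit exposes, and the limits are only meaningful when run inside the Fluent Bit container.

```python
import errno
import os
import resource

# Likely meaning of errno values that file-tailing code can report on Linux.
CAUSES = {
    errno.ENOENT: "the path no longer exists (file removed or rotated away)",
    errno.EMFILE: "the per-process open-file limit (NOFILE) was reached",
    errno.ENOSPC: "the inotify watch limit (fs.inotify.max_user_watches) was reached",
}


def explain(code):
    """Describe an errno number taken from a Fluent Bit log line."""
    name = errno.errorcode.get(code, "UNKNOWN")
    cause = CAUSES.get(code, "see errno(3)")
    return f"errno={code} ({name}): {cause}"


def current_limits():
    """Read the NOFILE and inotify-watch limits visible to this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    limits = {"nofile_soft": soft, "nofile_hard": hard}
    watches = "/proc/sys/fs/inotify/max_user_watches"
    if os.path.exists(watches):
        with open(watches) as f:
            limits["max_user_watches"] = int(f.read().strip())
    return limits


if __name__ == "__main__":
    print(explain(2))
    print(current_limits())
```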

nooperpudd commented 7 months ago

Same issue:

[2024/04/07 13:14:08] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
[2024/04/07 13:14:08] [error] [input:tail:tail.1] inode=20972144 cannot register file /var/log/pods/amazon-cloudwatch_fluent-bit-4fb7m_256061af-86f8-48eb-b45e-d3a5d2190006/fluent-bit/0.log (deleted)

EKS version: 1.29
Plugin version: amazon-cloudwatch-observability v1.4.0-eksbuild.1
Fluent Bit: aws-for-fluent-bit:2.32.0.20240304

chuanAlloy commented 4 months ago

Same issue:

[2024/07/17 14:03:56] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
[2024/07/17 14:03:56] [error] [input:tail:tail.0] inode=76569335 cannot register file /var/log/pods/devops-ops_fluentbit-devops-ops-aws-for-fluent-bit-t589r_fbaef6e3-eb19-4599-a11a-cf82da3e9be7/aws-for-fluent-bit/0.log (deleted)

EKS: 1.24
Fluent Bit: public.ecr.aws/aws-observability/aws-for-fluent-bit:2.32.2.20240516

joebowbeer commented 4 months ago

Source code for this "No such file or directory" error:

https://github.com/fluent/fluent-bit/blob/574a69af744535b6e016965f02eef9f739a5df1e/plugins/in_tail/tail_fs_inotify.c#L147

Note that the in_tail plugin code included with aws-for-fluent-bit (Fluent Bit v1.9) is two to three years old.

The in_tail code in Fluent Bit v2 and v3 has changed considerably, but even so it may not be free of this issue:

https://github.com/fluent/fluent-bit/issues/2110
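The "(deleted)" suffix in the paths from the later reports is what Linux shows when the name of a file is read back through /proc/self/fd after the file was unlinked while still open. The Python sketch below reproduces that condition on Linux. It assumes, without confirming against the in_tail source, that in_tail ends up registering a name obtained this way, and it uses os.stat() as a stand-in for inotify_add_watch(): both resolve the path, so both fail with ENOENT (errno=2) here. The `reproduce_deleted_path` helper is hypothetical.

```python
import errno
import os
import tempfile


def reproduce_deleted_path():
    """Open a file, unlink it, then try to use the name /proc reports (Linux only)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "0.log")
    with open(path, "w") as f:
        f.write("line\n")
    fd = os.open(path, os.O_RDONLY)  # a tailer holding the file open
    inode = os.fstat(fd).st_ino
    os.unlink(path)  # rotation or pod cleanup removes the file
    try:
        # The kernel appends " (deleted)" to the name of an unlinked open file.
        resolved = os.readlink(f"/proc/self/fd/{fd}")
        try:
            os.stat(resolved)  # stand-in for inotify_add_watch(resolved, ...)
            err = 0
        except OSError as exc:
            err = exc.errno
    finally:
        os.close(fd)
        os.rmdir(tmpdir)
    return inode, resolved, err


if __name__ == "__main__":
    inode, resolved, err = reproduce_deleted_path()
    print(f"inode={inode} name={resolved!r} errno={err} ({errno.errorcode.get(err, 'OK')})")
```

On Linux the sketch reports a name ending in " (deleted)" together with errno 2 (ENOENT), the same pair of symptoms as the error lines in the reports.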