Open axot opened 8 months ago
same issue:
[2024/04/07 13:14:08] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
[2024/04/07 13:14:08] [error] [input:tail:tail.1] inode=20972144 cannot register file /var/log/pods/amazon-cloudwatch_fluent-bit-4fb7m_256061af-86f8-48eb-b45e-d3a5d2190006/fluent-bit/0.log (deleted)
eks version: 1.29
plugin version: amazon-cloudwatch-observability v1.4.0-eksbuild.1
Fluent Bit: aws-for-fluent-bit:2.32.0.20240304
Same issue:
[2024/07/17 14:03:56] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
[2024/07/17 14:03:56] [error] [input:tail:tail.0] inode=76569335 cannot register file /var/log/pods/devops-ops_fluentbit-devops-ops-aws-for-fluent-bit-t589r_fbaef6e3-eb19-4599-a11a-cf82da3e9be7/aws-for-fluent-bit/0.log (deleted)
EKS: 1.24
Fluent Bit: public.ecr.aws/aws-observability/aws-for-fluent-bit:2.32.2.20240516
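The `(deleted)` suffix together with `errno=2` in the reports above is consistent with a watch being requested on a file that was unlinked while it was still open. The following is only a sketch of that situation on Linux, not Fluent Bit's actual code: it assumes the path is resolved from the open descriptor through `/proc/self/fd`, and the helper name `watch_deleted_file` is hypothetical.

```python
import ctypes
import os
import tempfile

IN_MODIFY = 0x00000002  # event mask value from <sys/inotify.h>


def watch_deleted_file():
    """Try to add an inotify watch on an unlinked file.

    Returns (resolved_name, errno). Linux only; assumes /proc is mounted.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.inotify_add_watch.argtypes = [
        ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32,
    ]

    fd, path = tempfile.mkstemp(suffix=".log")
    ino_fd = -1
    try:
        # Simulate the log file disappearing while a descriptor is still open.
        os.unlink(path)
        # The kernel reports the old path with " (deleted)" appended.
        resolved = os.readlink(f"/proc/self/fd/{fd}")

        ino_fd = libc.inotify_init()
        if ino_fd == -1:
            raise OSError(ctypes.get_errno(), "inotify_init failed")

        # Watching by name fails because no file exists under that name.
        wd = libc.inotify_add_watch(ino_fd, os.fsencode(resolved), IN_MODIFY)
        err = ctypes.get_errno() if wd == -1 else 0
    finally:
        if ino_fd != -1:
            os.close(ino_fd)
        os.close(fd)
    return resolved, err


if __name__ == "__main__":
    name, err = watch_deleted_file()
    print(name)
    print(err, os.strerror(err))
```

If this matches what happens in the plugin, the error by itself would mean the file was rotated or removed before the watch was registered, which is more likely when files turn over quickly under high log volume.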
Source code for this "No such file or directory" error:
NOTE that the in_tail plugin code included with aws-for-fluent-bit (Fluent Bit v1.9) is 2 or 3 years old.
The in_tail code in Fluent Bit v2 and v3 has seen a lot of changes, but even so it may not be issue-free:
Describe the question/issue
When the log ingestion rate per second is high, Fluent Bit produces a "cannot register file" error.
Configuration
The customer expected Fluent Bit to collect all of the application's logs, but it collects only about 3000 log records per second. 3000 per second is the rate Fluent Bit actually collected; the application emits more than that.
Fluent Bit was deployed through the built-in logging feature of EKS Fargate (the aws-logging ConfigMap).
The logs were configured to be sent to both Kinesis Firehose and CloudWatch Logs, and the number of records at the two destinations matched.
Fluent Bit Log Output
Fluent Bit Version Info
Fluent Bit v1.9.10 (the Fluent Bit built into EKS on Fargate)
EKS versions 1.27, 1.28, 1.29
Cluster Details
The VPC allows unrestricted outbound traffic; inbound traffic is limited to specific IPs and security groups.
Uses App Mesh
Uses EKS with Fargate
Fluent Bit is incorporated into Fargate
Application Details
Logs beyond about 3000 records per second cannot be retrieved. That rate corresponds to roughly 6 MB per second.
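As a rough sanity check on the figures above, the average record size can be derived from the two rates. This is only a back-of-the-envelope sketch; it assumes 1 MB = 1,000,000 bytes, and the function name `avg_record_bytes` is hypothetical.

```python
def avg_record_bytes(bytes_per_second: float, records_per_second: float) -> float:
    """Average size of one log record, in bytes."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return bytes_per_second / records_per_second


if __name__ == "__main__":
    # Roughly 6 MB/s at about 3000 records/s, as reported in this issue.
    print(avg_record_bytes(6_000_000, 3000))  # 2000.0
```

So each record averages about 2 KB under these assumptions.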
Related Issues
Not sure whether this issue is related to the EKS AMI update involving the 1024 NOFILE limit: https://github.com/awslabs/amazon-eks-ami/pull/1535
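One way to check whether file-descriptor or inotify limits could be involved is to print them from inside the affected environment. The sketch below is a diagnostic aid under stated assumptions, not a confirmed cause: it assumes a Linux host where a Python interpreter can run, which may not be possible in the Fargate-managed Fluent Bit process, and the function name `read_limits` is hypothetical.

```python
import resource
from pathlib import Path


def read_limits() -> dict:
    """Collect the NOFILE rlimit and the inotify sysctls, where available."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    limits = {"nofile_soft": soft, "nofile_hard": hard}

    # These files exist on Linux; missing or unreadable entries become None.
    for name in ("max_user_watches", "max_user_instances"):
        path = Path("/proc/sys/fs/inotify") / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for key, value in read_limits().items():
        print(f"{key}: {value}")
```

A soft NOFILE value of 1024 would suggest the limit from the linked pull request applies in that environment, although the errno=2 in the logs above points at a missing file rather than an exhausted limit.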