Open mukshe01 opened 3 months ago
Please follow the issue template to supply all required information including things like version and target env.
Hi Patrick, apologies. I have followed the issue template and updated the issue. Please let me know if you require any further info.
There is a similar issue on EKS Fargate: https://github.com/aws/aws-for-fluent-bit/issues/796. Could you please check whether it is related to the NOFILE limit with the following command?
```
$ kubectl exec -ti REPLACE_WITH_YOUR_FLUENTBIT_POD -- sh -c 'grep files /proc/*/limits; grep -a '\r' /proc/*/cmdline'
```
Hi, thank you for your response. Below is the output of the command.
```
/proc/1/limits:Max open files 1048576 1048576 files
/proc/32/limits:Max open files 1048576 1048576 files
/proc/self/limits:Max open files 1048576 1048576 files
/proc/thread-self/limits:Max open files 1048576 1048576 files
/proc/1/cmdline:/fluent-bit/bin/fluent-bit-e/fluent-bit/firehose.so-e/fluent-bit/cloudwatch.so-e/fluent-bit/kinesis.so-c/fluent-bit/etc/fluent-bit.conf
/proc/32/cmdline:sh-cgrep files /proc/*/limits; grep -a r /proc/*/cmdline
/proc/self/cmdline:grep-ar/proc/1/cmdline/proc/32/cmdline/proc/self/cmdline/proc/thread-self/cmdline
/proc/thread-self/cmdline:grep-ar/proc/1/cmdline/proc/32/cmdline/proc/self/cmdline/proc/thread-self/cmdline
```
```
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30446
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
sh-4.2# ulimit -n
1048576
```
Should we adjust anything?
Also, we increased Mem_Buf_Limit from 5 MB to 50 MB and saw a significant reduction in missing logs in CloudWatch. Could you suggest any improvements to the Fluent Bit config so the missing-logs issue won't occur in the future?
Regards Shekhar
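[Editor's note] Not an official recommendation from the maintainers, but a commonly suggested mitigation for `mem buf overlimit` drops is to enable Fluent Bit's filesystem buffering, so chunks spill to disk instead of the input pausing and losing data under load. A minimal sketch using standard Fluent Bit options; the storage path and tag are illustrative:

```
[SERVICE]
    # Illustrative path; chunks are spooled here when memory fills up
    storage.path              /var/fluent-bit/state/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5M

[INPUT]
    Name              tail
    Tag               application.*
    Path              /var/log/containers/*.log
    Mem_Buf_Limit     50MB
    # With filesystem storage, records are buffered to disk
    # instead of the input being hard-paused on memory pressure
    storage.type      filesystem
```

The trade-off is disk I/O and the need for a writable host path, so size `storage.path` and `storage.backlog.mem_limit` for your node's capacity.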
Thanks for sharing the output. Initially I suspected it might be related to the NOFILE limits, but your results suggest they are sufficient. I'll try to reproduce the issue in my environment. Thanks.
Hi @axot,
Have you had any luck reproducing this issue? Please let me know if I need to provide any more info.
Regards Shekhar
I'm getting a similar error with Fluent Bit in my environment too.
```
[2024/07/02 10:54:24] [error] [plugins/in_tail/tail_file.c:1432 errno=2] No such file or directory
[2024/07/02 10:54:24] [error] [plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
[2024/07/02 10:54:24] [error] [input:tail:tail.0] inode=97518421 cannot register file /var/log/containers/my-nginx-586cfd5d59-9bgqm_default_my-nginx-d7f6466fe5757cc8ff6183b7a764b35dc509ec923f7bd0eee9d52b1b7680a952.log
```
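[Editor's note] For context, `errno=2` is `ENOENT`: the file at the resolved path disappeared (e.g. rotated or removed by the kubelet) between path discovery and `open()`. A tiny standalone illustration of that failure mode — this is not Fluent Bit code, and `app.log` is a made-up name:

```shell
# Simulate a tail target vanishing before it can be opened.
tmpdir=$(mktemp -d)
echo "a log line" > "$tmpdir/app.log"
rm "$tmpdir/app.log"                  # simulate rotation/cleanup
# Opening now fails with errno=2, the same error tail_file.c reports.
LC_ALL=C cat "$tmpdir/app.log" 2>&1 | grep -o 'No such file or directory'
rmdir "$tmpdir"
```

When caused by normal container log rotation these messages are often transient; they become a problem when they correlate with missing records downstream.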
Information on the environment where the error occurs is as follows.
- Version used: Fluent Bit v1.9.10 (AWS for Fluent Bit container image version 2.32.2.20240516)
- Environment name and version: Kubernetes, version 1.28
- Server type and version: AWS EC2
- Operating System and version: Amazon Linux 2 (EKS-optimized AMI, AMI ID: ami-0e1413630fdbd046e)
The steps to reproduce are as follows.
Use the following configuration file to create a cluster with eksctl.
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: sample-cluster
  region: ap-northeast-1
  version: "1.28"

vpc:
  id: "< vpc id >"
  cidr: "< cidr >"
  subnets:
    private:
      ap-northeast-1a:
        id: "< subnet id >"
        cidr: "< cidr >"
      ap-northeast-1c:
        id: "< subnet id >"
        cidr: "< cidr >"

managedNodeGroups:
  - name: ng-1
    instanceType: m5.xlarge
    desiredCapacity: 2
    privateNetworking: true
```
Install Fluent Bit on the EKS cluster created in step 1 by following "Setting up Fluent Bit" in "Set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs" (Amazon CloudWatch): https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html#Container-Insights-FluentBit-setup
The manifest file I used is shown below.
```
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml
```
An output plugin has been added and the log level has been changed.
```diff
$ diff fluent-bit.yaml backup_fluent-bit.yaml
50c50
<         Log_Level           info
---
>         Log_Level           error
116,123d115
<
<     [OUTPUT]
<         Name                firehose
<         Match               application.*
<         region              ap-northeast-1
<         delivery_stream     < The name of the Kinesis Firehose Delivery stream >
<         retry_limit         5
```
Then install the AWS Load Balancer Controller by following the steps at https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html
Next, create a Docker image based on the following Dockerfile.
```dockerfile
FROM amazonlinux:2023
RUN yum update -y && \
    yum install nginx -y && \
    yum clean all
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log
CMD ["nginx", "-g", "daemon off;"]
```
Deploy the image with the following manifests.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: internal
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer
  selector:
    run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
        - name: my-nginx
          image: < Image URI >
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 1500m
              memory: 1G
            limits:
              cpu: 1500m
              memory: 1G
          volumeMounts:
            - mountPath: /etc/nginx/nginx.conf
              name: nginx-config-vol
              subPath: nginx.conf
      volumes:
        - name: nginx-config-vol
          configMap:
            name: nginx-config
            items:
              - key: nginx.conf
                path: nginx.conf
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
data:
  nginx.conf: |
    worker_rlimit_nofile 30000;
    events {
      worker_connections 10000;
    }
    http {
      server_tokens off;
      client_header_timeout 13s;
      keepalive_timeout 350s;
      upstream s3-vpce {
        server < s3 interface VPC endpoints ipaddress >:80;
        server < s3 interface VPC endpoints ipaddress >:80;
        server < s3 interface VPC endpoints ipaddress >:80;
      }
      map $http_host $s3_backet {
        default "< s3 bucket name >";
      }
      log_format upstreamlog '[$time_local] $http_x_forwarded_for $remote_addr $status $host $upstream_addr $upstream_cache_status $upstream_status $request $http_referer $body_bytes_sent $request_time $http_user_agent';
      access_log /var/log/nginx/access.log upstreamlog;
      error_log /var/log/nginx/error.log notice;
      rewrite_log off;
      server {
        listen 8080;
        location / {
          rewrite ^/$ /index.html break;
          proxy_set_header Host $s3_backet;
          proxy_pass http://s3-vpce;
          proxy_connect_timeout 10s;
          proxy_read_timeout 30s;
        }
      }
    }
```
Finally, generate load with ab:

```
$ ab -n 10000000 -c 100 -p post.txt -q http://< Network Load Balancer DNS Name >/index.html
```
Bug Report
Describe the bug: We are running Fluent Bit to push application logs from our Kubernetes cluster (an EKS cluster with EC2 machines as nodes) to CloudWatch. Recently we observed that some log entries are missing in CloudWatch when the system is under high load.
To Reproduce
We also see many occurrences of the following (our memory buffer is configured via Mem_Buf_Limit) when the system is under high load:
```
2024-03-20T13:29:12.624465969Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:12.915368764Z stderr F [2024/03/20 13:29:12] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-20T13:29:12.923306843Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:12.954591621Z stderr F [2024/03/20 13:29:12] [ info] [input] tail.0 resume (mem buf overlimit)
2024-03-20T13:29:12.956495689Z stderr F [2024/03/20 13:29:12] [ warn] [input] tail.0 paused (mem buf overlimit)
2024-03-20T13:29:13.527593998Z stderr F [2024/03/20 13:29:13] [ info] [input] tail.0 resume (mem buf overlimit)
```
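[Editor's note] One quick way to gauge how much back-pressure the tail input is under is to count pause events in a captured log. An illustrative snippet; the sample text mirrors the warnings above:

```shell
# Count how many times the tail input was paused for back-pressure.
log='[ warn] [input] tail.0 paused (mem buf overlimit)
[ info] [input] tail.0 resume (mem buf overlimit)
[ warn] [input] tail.0 paused (mem buf overlimit)'
printf '%s\n' "$log" | grep -c 'paused (mem buf overlimit)'   # prints 2
```

Frequent pause/resume cycles mean records are being throttled at the source, which is consistent with log entries going missing under load.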