Describe the bug
We have a few applications that produce heavy logging, heavy in both size and frequency: the average size of a JSON log line is around 100 KB. We have noticed that Fluent Bit, which normally uses <5% CPU, shoots up to 100% CPU usage and stays there. Every time this happens, the following line is observed in the Docker logs:
[2020/08/14 07:30:03] [ warn] [input:tail:tail.3] file=/opt/logs/flights-nav-ancillaries/flights-nav-ancillaries.json have long lines. Skipping long lines.
Memory usage during that time is <50 MiB even though 512 MB is allocated per container, and the CPU allotted is 256 CPU units.
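For reference, our understanding from the tail documentation is that the warning above is emitted when a line grows past the per-file buffer ceiling while Skip_Long_Lines is On, in which case the line is dropped instead of the whole file being unmonitored. A minimal sketch of the knobs involved (illustrative values, not our production settings):

[INPUT]
    Name tail
    Path /opt/logs/*.json
    # Initial buffer allocated per monitored file
    Buffer_Chunk_Size 1MB
    # Ceiling the per-file buffer may grow to for a long line; with
    # Skip_Long_Lines On, a line larger than this is skipped with the
    # "have long lines" warning instead of the file being dropped
    Buffer_Max_Size 5MB
    Skip_Long_Lines On

Note that the affected input in the configuration below (the *.json tail) already uses Buffer_Chunk_Size 20MB and Buffer_Max_Size 100MB, far above the ~100 KB line size, yet the warning still appears.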
Expected behavior
If the buffer is full, its contents should be flushed and Fluent Bit should resume pushing data again.
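Our reading of the documented backpressure behavior is that Mem_Buf_Limit only pauses an input until its queued chunks are flushed, after which ingestion resumes; roughly:

[INPUT]
    Name tail
    Path /opt/logs/*.json
    # When the in-memory chunks queued for this input reach the limit the
    # input is paused; once the engine flushes data to the outputs the
    # input should be resumed rather than staying stuck
    Mem_Buf_Limit 200MB

That resume step is what does not seem to happen here.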
Your Environment
Version used: tested 1.5.0 through 1.5.3; the issue occurs with all of these versions.
Configuration:
[SERVICE]
    Flush 4
    Daemon off
    Parsers_File parsers_mmt.conf
[INPUT]
    Name tail
    Path /opt/logs/access.log,/opt/logs/*/access.log,/opt/logs/*/*/access.log
    DB /opt/logs/Offset.db
    Refresh_Interval 5
    Path_Key file
    Mem_Buf_Limit 100MB
    Buffer_Chunk_Size 10MB
    Buffer_Max_Size 50MB
    Skip_Long_Lines On
    Parser mmt
    Tag access
[INPUT]
    Name tail
    Path /opt/logs/nginx-frontend.log,/opt/logs/*/nginx-frontend.log,/opt/logs/*/*/nginx-frontend.log
    DB /opt/logs/Offset.db
    Refresh_Interval 5
    Mem_Buf_Limit 100MB
    Buffer_Chunk_Size 10MB
    Path_Key file
    Buffer_Max_Size 50MB
    Skip_Long_Lines On
    Parser mmt
    Tag access
[INPUT]
    Name tail
    Path /opt/logs/*.log,/opt/logs/*/*.log,/opt/logs/*/*/*.log
    Exclude_Path /opt/logs/access.log,/opt/logs/nginx-frontend.log,/opt/logs/*/nginx-frontend.log,/opt/logs/*/access.log,/opt/logs/*/*/access.log,/opt/logs/*/*/nginx-frontend.log
    DB /opt/logs/Offset-app.db
    Refresh_Interval 5
    Mem_Buf_Limit 200MB
    Buffer_Chunk_Size 10MB
    Buffer_Max_Size 50MB
    Skip_Long_Lines On
    Path_Key file
    Multiline On
    Parser_Firstline mmt
    Tag app
[INPUT]
    Name tail
    Path /opt/logs/*.json,/opt/logs/*/*.json
    DB /opt/logs/Offset-app.db
    Key message
    Refresh_Interval 5
    Mem_Buf_Limit 200MB
    Buffer_Chunk_Size 20MB
    Buffer_Max_Size 100MB
    Skip_Long_Lines On
    Path_Key file
    Tag json_app
[FILTER]
    Name modify
    Match *
    Add app_name ${SERVICE}
    Add DEPLOYMENT_VERSION ${DEPLOYMENT_VERSION}
    Add hostname ${BASE_SERVER_IP}
[OUTPUT]
    Name kafka
    Match access*
    Brokers ${KAFKA_HOST}
    Topics m_${SPACE}_${SERVICE}_mon_access
    Timestamp_Key log_timestamp
    Timestamp_Format iso8601
    rdkafka.compression.codec snappy
[OUTPUT]
    Name kafka
    Match app*
    Brokers ${KAFKA_HOST}
    Topics m_${SPACE}_${SERVICE}_mon_app
    Timestamp_Key log_timestamp
    Timestamp_Format iso8601
    rdkafka.compression.codec snappy
[OUTPUT]
    Name kafka
    Match json_app
    Brokers ${KAFKA_HOST}
    Topics json_m_${SPACE}_${SERVICE}_mon_app
    Timestamp_Key log_timestamp
    Timestamp_Format iso8601
    rdkafka.compression.codec snappy
Environment name and version (e.g. Kubernetes? What version?): AWS ECS
Server type and version: Fluent Bit Docker image running as a sidecar
Additional context
The problem is not the CPU utilization as such; either way, Fluent Bit should not stop forwarding logs.
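One way to confirm whether records are still flowing while the CPU is pinned would be Fluent Bit's built-in HTTP monitoring server. This is not part of the deployed configuration above, just a sketch:

[SERVICE]
    # Enables the monitoring endpoint; GET /api/v1/metrics then exposes
    # per-input record/byte counters, which should stop increasing if the
    # tail inputs really stop ingesting
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020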
We took an internal dump as described in the documentation (triggered by sending SIGCONT to the Fluent Bit process, as seen in the output below). It looks like it has abandoned all of its tasks:
[engine] caught signal (SIGCONT)
[2020/08/14 08:46:52] Fluent Bit Dump
===== Input =====
tail.0 (tail)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 0b (0 bytes)
│     └─ mem limit  : 95.4M (100000000 bytes)
│
├─ tasks
│  ├─ total tasks   : 0
│  ├─ new           : 0
│  ├─ running       : 0
│  └─ size          : 0b (0 bytes)
│
└─ chunks
   └─ total chunks  : 0
      ├─ up chunks  : 0
      ├─ down chunks: 0
      └─ busy chunks: 0
         ├─ size    : 0b (0 bytes)
         └─ size err: 0

tail.1 (tail)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 0b (0 bytes)
│     └─ mem limit  : 95.4M (100000000 bytes)
│
├─ tasks
│  ├─ total tasks   : 0
│  ├─ new           : 0
│  ├─ running       : 0
│  └─ size          : 0b (0 bytes)
│
└─ chunks
   └─ total chunks  : 0
      ├─ up chunks  : 0
      ├─ down chunks: 0
      └─ busy chunks: 0
         ├─ size    : 0b (0 bytes)
         └─ size err: 0

tail.2 (tail)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 765.5K (783873 bytes)
│     └─ mem limit  : 190.7M (200000000 bytes)
│
├─ tasks
│  ├─ total tasks   : 1
│  ├─ new           : 0
│  ├─ running       : 1
│  └─ size          : 765.5K (783873 bytes)
│
└─ chunks
   └─ total chunks  : 1
      ├─ up chunks  : 1
      ├─ down chunks: 0
      └─ busy chunks: 1
         ├─ size    : 765.5K (783873 bytes)
         └─ size err: 0

tail.3 (tail)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 284.5K (291354 bytes)
│     └─ mem limit  : 190.7M (200000000 bytes)
│
├─ tasks
│  ├─ total tasks   : 0
│  ├─ new           : 0
│  ├─ running       : 0
│  └─ size          : 0b (0 bytes)
│
└─ chunks
   └─ total chunks  : 1
      ├─ up chunks  : 1
      ├─ down chunks: 0
      └─ busy chunks: 0
         ├─ size    : 0b (0 bytes)
         └─ size err: 0

===== Storage Layer =====
total chunks     : 2
├─ mem chunks    : 2
└─ fs chunks     : 0
   ├─ up         : 0
   └─ down       : 0