fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Fluent bit gets stuck after "Bad file descriptor" error #3540

Closed MPeli closed 3 years ago

MPeli commented 3 years ago

Bug Report

Describe the bug

I often see an error saying "Bad file descriptor". Once this error shows up, Fluent Bit stops logging completely and its memory consumption increases significantly (up to 1 GB).

To Reproduce

I am not sure how to reproduce it yet. I will add more details once I find out more.

[error] [C:\projects\fluent-bit-2e87g\src\flb_http_client.c:1163 errno=9] Bad file descriptor
[2021/05/24 18:28:24] [ warn] [output:es:es.0] http_do=-1 URI=/_bulk
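
For readers unfamiliar with the code in the log above: errno=9 is EBADF, which the C runtime reports when an operation is attempted on a descriptor that is closed or was never valid. The following is a minimal Python sketch, not Fluent Bit code, that reproduces the same class of error by writing to a pipe descriptor after closing it. The function name write_after_close is hypothetical.

```python
import errno
import os


def write_after_close() -> int:
    """Write to a descriptor that was already closed; return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    os.close(write_fd)  # the descriptor number is no longer valid
    try:
        os.write(write_fd, b"payload")
    except OSError as exc:
        return exc.errno
    return 0  # not expected: the write should have failed


if __name__ == "__main__":
    code = write_after_close()
    print(code == errno.EBADF, os.strerror(code))
```

This only illustrates what the error code means. It does not say which descriptor Fluent Bit was using or why it became invalid.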

Expected behavior

Fluent Bit should not get stuck; it should try to reconnect.
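
The expected behavior can be sketched as a retry loop that never reuses a connection after a failed send. This is an illustrative Python sketch under stated assumptions, not how Fluent Bit's engine is implemented: the Connection protocol, the connect callable, and send_with_reconnect are all hypothetical names.

```python
import time
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    """Hypothetical minimal connection interface used by this sketch."""

    def send(self, payload: bytes) -> None: ...
    def close(self) -> None: ...


def send_with_reconnect(
    connect: Callable[[], Connection],
    payload: bytes,
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> int:
    """Send payload, opening a fresh connection for every attempt.

    Returns the number of attempts used. Re-raises the last OSError
    if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(1, max_attempts + 1):
        conn = None
        try:
            conn = connect()
            conn.send(payload)
            return attempt
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1))
        finally:
            # Always discard the connection so a bad descriptor is never reused.
            if conn is not None:
                try:
                    conn.close()
                except OSError:
                    pass
    assert last_error is not None
    raise last_error
```

Opening a new connection per attempt is the simplest way to guarantee a stale descriptor is dropped. A real client would more likely keep connections alive and only replace the one that failed.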

Your Environment

Fluent Bit Enterprise - SOS Report
==================================
The following report aims to be used by Fluent Bit and Fluentd community users.

[Fluent Bit]
    Version             1.7.6
    Built Flags          FLB_HAVE_PARSER FLB_HAVE_RECORD_ACCESSOR FLB_HAVE_STREAM_PROCESSOR JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_AWS FLB_HAVE_SIGNV4 FLB_HAVE_SQLDB FLB_HAVE_TRACE FLB_HAVE_TIMESPEC_GET FLB_HAVE_PROXY_GO FLB_HAVE_REGEX FLB_HAVE_UTF8_ENCODER FLB_HAVE_LUAJIT

[Operating System]
    Name                Windows
    Version             6.2
    Build               9200

[Hardware]
    Architecture        x64 (AMD or Intel)
    Processors          12

[Built Plugins]
    Inputs              emitter tail dummy statsd storage_backlog stream_processor winlog tcp lib forward random
    Filters             alter_size aws record_modifier throttle throttle_size kubernetes modify nest parser expect grep rewrite_tag lua stdout geoip2
    Outputs             azure azure_blob counter datadog es file forward http influxdb logdna loki nrlogs null slack splunk stackdriver stdout syslog tcp flowcounter gelf websocket cloudwatch_logs kinesis_streams s3

[SERVER] Runtime configuration
    Flush               5.000000
    Daemon              On
    Log_Level           Trace

[INPUT] Instance
    Name                tail.0 (tail, id=0)
    Flags
    Threaded            No
    Tag                 <appname>-<pid>
    Mem_Buf_Limit       95.4M
    Path                C:\path\*_*.Flog3.log,D:\path\*_*.Flog3.x64.log
    Multiline           off
    Parser_Firstline    flog_parser
    Path_Key            path
    Offset_Key          1
    DB                  fluent.db
    Read_from_Head      On
    Tag_Regex           ^.*\\(?<appname>.+)_(?<pid>[0-9]+).Flog.*.log$
    Buffer_Chunk_Size   128k
    Buffer_Max_Size     256k
    Ignore_Older        10d
    Routes              es.0

[INPUT] Instance
    Name                storage_backlog.1 (storage_backlog, id=1)
    Flags
    Threaded            No
    Tag                 storage_backlog.1
    Routes              es.0

[OUTPUT] Instance
    Name                es.0 (es, id=0)
    Match               *
    TLS Active          Yes
    TLS.Verify          Off
    TLS.Ca_File         (not set)
    TLS.Crt_File        (not set)
    TLS.Key_File        (not set)
    TLS.Key_Passwd      (not set)
    Retry Limit         no limit
    Host.TCP_Port       443
    Host.Name           abcd.eu-west-1.es.amazonaws.com
    Index               fluent-bit
    Logstash_Format     true
    HTTP_User           aaa
    HTTP_Passwd         bbb
    Trace_Output        On
    Trace_Error         On
    Buffer_Size         False

Log files from three different machines: bad-file-descriptor-fluent-bit-I.log, bad-file-descriptor-fluent-bit-II.log, bad-file-descriptor-fluent-bit-III.log
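
When comparing logs like these across machines, it can help to count how often each errno appears. The following Python sketch assumes the error lines follow the format quoted earlier in this report ("[source:line errno=N] message"). The regular expression and the function name count_errno_lines are illustrative and are not part of Fluent Bit.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "[C:\path\flb_http_client.c:1163 errno=9] Bad file descriptor"
ERRNO_LINE = re.compile(
    r"\[(?P<source>[^\]]+):(?P<line>\d+) errno=(?P<errno>\d+)\]\s*(?P<message>.*)"
)


def count_errno_lines(lines: Iterable[str]) -> Counter:
    """Count log lines per (errno, message) pair; other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = ERRNO_LINE.search(line)
        if match:
            key = (int(match["errno"]), match["message"].strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        r"[error] [C:\projects\fluent-bit-2e87g\src\flb_http_client.c:1163 errno=9] Bad file descriptor",
        "[2021/05/24 18:28:24] [ warn] [output:es:es.0] http_do=-1 URI=/_bulk",
    ]
    for (code, message), n in count_errno_lines(sample).items():
        print(code, message, n)
```

The function accepts any iterable of strings, so an open file object for one of the attached logs can be passed directly.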

fujimotos commented 3 years ago

I believe this is the same issue as #1022. There is a recovery problem in the core engine (not only on Windows) after some connection errors.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

pierluigilenoci commented 3 years ago

I've no hope of seeing it fixed.

edsiper commented 3 years ago

The "Bad file descriptor" issue has been fixed in recent versions. Please upgrade to the latest v1.7.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.