fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0
5.65k stars 1.54k forks source link

Crash when using workers #3537

Closed MPeli closed 3 years ago

MPeli commented 3 years ago

Bug Report

Describe the bug I have one log file in a folder. When fluent bit is started, it crashes within a minute.

To Reproduce

Expected behavior Fluent bit should not crash. crash

It seems to me that handles should be valid

Your Environment

[Fluent Bit]
    Version             1.7.6
    Built Flags          FLB_HAVE_PARSER FLB_HAVE_RECORD_ACCESSOR FLB_HAVE_STREAM_PROCESSOR JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_AWS FLB_HAVE_SIGNV4 FLB_HAVE_SQLDB FLB_HAVE_TRACE FLB_HAVE_TIMESPEC_GET FLB_HAVE_PROXY_GO FLB_HAVE_REGEX FLB_HAVE_UTF8_ENCODER FLB_HAVE_LUAJIT

[Operating System]
    Name                Windows
    Version             6.2
    Build               9200

[Hardware]
    Architecture        x64 (AMD or Intel)
    Processors          12

[Built Plugins]
    Inputs              emitter tail dummy statsd storage_backlog stream_processor winlog tcp lib forward random
    Filters             alter_size aws record_modifier throttle throttle_size kubernetes modify nest parser expect grep rewrite_tag lua stdout geoip2
    Outputs             azure azure_blob counter datadog es file forward http influxdb logdna loki nrlogs null slack splunk stackdriver stdout syslog tcp flowcounter gelf websocket cloudwatch_logs kinesis_streams s3

[SERVER] Runtime configuration
    Flush               5.000000
    Daemon              On
    Log_Level           Trace

[INPUT] Instance
    Name                tail.0 (tail, id=0)
    Flags
    Threaded            No
    Tag                 <appname>-<pid>
    Mem_Buf_Limit       95.4M
    Path                D:\path\*_*.Xlog.log,D:\path\*_*.Xlog.x64.log
    Multiline           off
    Parser_Firstline    xlog_parser
    Path_Key            path
    Offset_Key          1
    DB                  fluent.db
    Read_from_Head      On
    Tag_Regex           ^.*\\(?<appname>.+)_(?<pid>[0-9]+).Xlog.*.log$
    Buffer_Chunk_Size   128k
    Buffer_Max_Size     256k
    Ignore_Older        10d
    Routes              es.0

[INPUT] Instance
    Name                storage_backlog.1 (storage_backlog, id=1)
    Flags
    Threaded            No
    Tag                 storage_backlog.1
    Routes              es.0

[OUTPUT] Instance
    Name                es.0 (es, id=0)
    Match               *
    TLS Active          Yes
    TLS.Verify          Off
    TLS.Ca_File         (not set)
    TLS.Crt_File        (not set)
    TLS.Key_File        (not set)
    TLS.Key_Passwd      (not set)
    Retry Limit         no limit
    Host.TCP_Port       443
    Host.Name           abcd.eu-west-1.es.amazonaws.com
    Index               fluent-bit
    Logstash_Format     true
    HTTP_User           aaa
    HTTP_Passwd         bbb
    Trace_Output        On
    Trace_Error         On
    Buffer_Size         False

I set workers to 4 in the [OUTPUT] section. But sosreport did not print it. SimpleLoop_101392.Xlog.log fluent-bit-report.log

MPeli commented 3 years ago

I found out that the crash can be prevented when _set_invalid_parameter_handler and _set_thread_local_invalid_parameter_handler are used to set the invalid parameter handler. They must be called before calling write.

// The parameters all have the value NULL unless a debug version of the CRT library is used.
void flb_invalid_parameter_handler(const wchar_t* expression,
    const wchar_t* function,
    const wchar_t* file,
    unsigned int line,
    uintptr_t pReserved)
{
    flb_debug("Invalid parameter detected in function %s. File: %s Line: %d", function, file, line);
    flb_debug("Expression: %s", expression);

    // abort()
}

_set_invalid_parameter_handler(flb_invalid_parameter_handler);
_set_thread_local_invalid_parameter_handler(flb_invalid_parameter_handler);

See https://docs.microsoft.com/en-us/cpp/c-runtime-library/parameter-validation?view=msvc-160

write then returns -1 instead of crashing

n = write(th_ins->ch_parent_events[1], &task, sizeof(struct flb_task *));

There is still a problem with Bad file descriptor

[2021/05/25 14:10:51] [ info] [engine] started (pid=13664)
[2021/05/25 14:10:51] [debug] [engine] coroutine stack size: 98302 bytes (96.0K)
[2021/05/25 14:10:51] [ info] [storage] created root path C:\path\fluent-bit-backlog
[2021/05/25 14:10:51] [debug] [storage] [cio scan] opening path C:\path\fluent-bit-backlog
[2021/05/25 14:10:51] [debug] [storage] created stream path C:\path\fluent-bit-backlog/tail.0
[2021/05/25 14:10:51] [debug] [storage] [cio stream] new stream registered: tail.0
[2021/05/25 14:10:51] [debug] [storage] [cio stream] new stream registered: storage_backlog.1
[2021/05/25 14:10:51] [ info] [storage] version=1.1.1, initializing...
[2021/05/25 14:10:51] [ info] [storage] root path 'C:\path\fluent-bit-backlog'
[2021/05/25 14:10:51] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=32
[2021/05/25 14:10:51] [ info] [storage] backlog input plugin: storage_backlog.1
[2021/05/25 14:10:51] [debug] [input:tail:tail.0] inode=542965230075788146 with offset=0 appended as C:\path\SimpleLoop_101392.Xlog.log
[2021/05/25 14:10:51] [debug] [input:tail:tail.0] 1 new files found on path 'C:\path\*_*.Xlog.log'
[2021/05/25 14:10:51] [debug] [input:tail:tail.0] 0 new files found on path 'D:\path\*_*.Xlog.x64.log'
[2021/05/25 14:10:51] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 286.1M
[2021/05/25 14:10:51] [debug] [es:es.0] created event channels: read=1004 write=740
[2021/05/25 14:10:51] [debug] [tls] cannot parse a certificate. skipping...
[2021/05/25 14:10:51] [debug] [output:es:es.0] host=abcd.eu-west-1.es.amazonaws.com port=443 uri=/_bulk index=fluent-bit type=_doc
[2021/05/25 14:10:51] [trace] [router] input=tail.0 tag=<appname>-<pid>
[2021/05/25 14:10:51] [ info] [output:es:es.0] worker #0 started
[2021/05/25 14:10:51] [debug] [router] match rule tail.0:es.0
[2021/05/25 14:10:51] [trace] [router] input=storage_backlog.1 tag=storage_backlog.1
[2021/05/25 14:10:51] [debug] [router] match rule storage_backlog.1:es.0
[2021/05/25 14:10:51] [ info] [sp] stream processor started
[2021/05/25 14:10:51] [debug] [storage] [cio file] synced at: tail.0/13664-1621944651.884582600.flb
[2021/05/25 14:10:51] [debug] [storage] [cio file] synced at: tail.0/13664-1621944651.916756400.flb
[2021/05/25 14:10:51] [debug] [storage] [cio file] synced at: tail.0/13664-1621944651.946187200.flb
[2021/05/25 14:10:52] [debug] [storage] [cio file] synced at: tail.0/13664-1621944651.976468900.flb
[2021/05/25 14:10:52] [debug] [storage] [cio file] synced at: tail.0/13664-1621944652.5941000.flb
[2021/05/25 14:10:52] [debug] [storage] [cio file] synced at: tail.0/13664-1621944652.45349500.flb
[2021/05/25 14:10:52] [debug] [storage] [cio file] synced at: tail.0/13664-1621944652.75021600.flb
[2021/05/25 14:10:52] [debug] [input:tail:tail.0] inode=542965230075788146 file=C:\path\SimpleLoop_101392.Xlog.log promote to TAIL_EVENT
[2021/05/25 14:10:56] [trace] [task 000002CD6AD425C0] created (id=0)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD425C0 id=0 OK
[2021/05/25 14:10:56] [trace] [task 000002CD6AD40D00] created (id=1)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD40D00 id=1 OK
[2021/05/25 14:10:56] [trace] [task 000002CD6AD42260] created (id=2)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD42260 id=2 OK
[2021/05/25 14:10:56] [trace] [task 000002CD6AD42020] created (id=3)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD42020 id=3 OK
[2021/05/25 14:10:56] [trace] [task 000002CD6AD40E20] created (id=4)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD40E20 id=4 OK
[2021/05/25 14:10:56] [trace] [task 000002CD6AD41210] created (id=5)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD41210 id=5 OK
[2021/05/25 14:10:56] [trace] [task 000002CD6AD412A0] created (id=6)
[2021/05/25 14:10:56] [debug] [task] created task=000002CD6AD412A0 id=6 OK
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=0 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=1 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=2 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=3 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=4 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=5 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:10:56] [debug] [output:es:es.0] task_id=6 assigned to thread #0
[2021/05/25 14:10:56] [debug] Invalid parameter detected in function (null). File: (null) Line: 0
[2021/05/25 14:10:56] [debug] Expression: (null)
[2021/05/25 14:10:56] [error] [C:\git\fluent\fluent-bit\src\flb_output_thread.c:397 errno=9] Bad file descriptor
[2021/05/25 14:11:51] [debug] [input:tail:tail.0] 0 new files found on path 'C:\path\*_*.Xlog.log'
[2021/05/25 14:11:51] [debug] [input:tail:tail.0] 0 new files found on path 'D:\path\*_*.Xlog.x64.log'

fluent-bit.log

edsiper commented 3 years ago

cc: @leonardo-albertovich

agup006 commented 3 years ago

A fix has been pushed to 1.7 branch @MPeli would you be able to try it out from there?

agup006 commented 3 years ago

https://github.com/leonardo-albertovich/fluent-bit/tree/windows-worker-issue-3537

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.