fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

\Device\Afd file handles leak on Windows associated with in-process diagnostic logging #6748

Closed danielodievich closed 1 year ago

danielodievich commented 1 year ago

Bug Report

Describe the bug
Fluent-Bit 2.0.8.0 is leaking \Device\Afd file handles on Windows Server 2016, Windows Server 2019, and Windows 10 when diagnostic logging is sent between threads using local sockets.

The rate of leakage is proportional to the amount of diagnostic logging, including the logging of errors.

We recently decided to test version 2.0.8 of this agent and discovered that it behaves differently from 1.9.6, with continuously increasing memory consumption. Left unchecked, this leads to a fluent-bit crash and/or general system instability.

Our Fluent-Bit Usage
Observe uses Fluent-Bit for log shipping as part of our [Host Monitoring App]. The agent.ps1 install script is currently pinned to pull fluent-bit 1.9.6. It also installs Telegraf and OSQuery, although those are not involved in this issue. During a default install, this fluent-bit.conf is used, with the <<observe_host_name>> and related template variables replaced with the values provided to the script. We use the http output plugin to send data to our ingest endpoints.

To Reproduce
Install Fluent-Bit 2.0.8.0.

Replace C:\Program Files\fluent-bit\conf\fluent-bit.conf with this configuration:

[SERVICE]
    flush        5
    daemon       Off
    log_level    debug  
    log_file    C:\Program Files\fluent-bit\fluent.log
[INPUT]
    Name         winlog
    Channels     Security,Application,System
    Interval_Sec 5
    DB           C:\Program Files\fluent-bit\fluent.pos
# Uncomment the below section if using AWS EC2
#####[FILTER]
#####    Name aws
#####    Match *
#####    imds_version v1
#####    az true
#####    ec2_instance_id true
#####    ec2_instance_type true
#####    account_id true
#####    hostname true
#####    vpc_id true
[FILTER]
    Name record_modifier
    Match *
# if you want to group your servers into an application group
# [e.g. Proxy nodes] so you can have custom alert levels for them
# uncomment this next line
#    Record appgroup ha-proxy
    Record host ${HOSTNAME}
    Record datacenter AWS
    Remove_key _MACHINE_ID

[INPUT]
    name tail
    tag  tail_osquery_results
    Path_Key path
    path C:\Program Files\osquery\log\osqueryd.results.log
    Read_from_Head False
    db      osquery-results.db
[INPUT]
    name tail
    tag  tail_osquery_snapshots
    Path_Key path
    path C:\Program Files\osquery\log\osqueryd.snapshots.log
    Read_from_Head False
    db      osquery-snapshots.db
[OUTPUT]
    name        http
    match       *
    host        badendpoint.collect.observe-eng.com
    port        443
    URI         /v1/http/fluentbit/tail
    Format      msgpack
    Header      X-Observe-Decoder fluent
    Header      Authorization Bearer badtokenbadtokenbadtokenbadtoken
    Compress    gzip
    tls         on

Both the ingest endpoint and the authentication token are intentionally bad. This increases the number of logged failures and, with it, the rate of \Device\Afd file handle leakage.

Start/restart fluent-bit service.

Start monitoring the \Process(fluent-bit)\Handle Count counter and observe it climb steadily over time.
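For reference, the handle count can also be watched programmatically instead of through Perfmon. The sketch below is an illustrative helper of my own (not part of the original repro), using the Win32 GetProcessHandleCount API and assuming the process image name is fluent-bit.exe:

#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>
#include <wchar.h>

/* Sketch only: poll the handle count of fluent-bit.exe every 10 seconds,
 * roughly equivalent to watching \Process(fluent-bit)\Handle Count in Perfmon. */
static DWORD find_pid(const wchar_t *name)
{
    PROCESSENTRY32W pe;
    DWORD pid = 0;
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);

    if (snap == INVALID_HANDLE_VALUE) {
        return 0;
    }
    pe.dwSize = sizeof(pe);
    if (Process32FirstW(snap, &pe)) {
        do {
            if (_wcsicmp(pe.szExeFile, name) == 0) {
                pid = pe.th32ProcessID;
                break;
            }
        } while (Process32NextW(snap, &pe));
    }
    CloseHandle(snap);
    return pid;
}

int main(void)
{
    DWORD pid = find_pid(L"fluent-bit.exe");
    HANDLE proc;
    DWORD count;

    if (pid == 0) {
        fprintf(stderr, "fluent-bit.exe not found\n");
        return 1;
    }
    proc = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (proc == NULL) {
        fprintf(stderr, "OpenProcess failed\n");
        return 1;
    }
    for (;;) {
        if (GetProcessHandleCount(proc, &count)) {
            printf("fluent-bit handle count: %lu\n", (unsigned long) count);
        }
        Sleep(10000);   /* sample every 10 seconds; the count should not grow unbounded */
    }
}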

Expected behavior
Fluent-Bit should not leak handles or memory, nor destabilize the operating system.

Your Environment

Diagnostic Capture Details
I captured fluent-bit starting from scratch and running for 180 seconds, collecting the data sets below:

PerfView command used:

.\PerfView.exe  "/DataFile:C:\fluent-bit-traces\fluent-bit-trace.etl" /BufferSizeMB:256 /StackCompression /CircularMB:50000 /CpuSampleMSec:1000 /MaxCollectSec:180 /KernelEvents:Memory,VirtualAlloc,VAMap,Handle,Default,FileIOInit /ClrEvents:None /Providers:"Microsoft-Windows-Kernel-Memory" /NoGui /NetworkCapture /NetMonCapture collect

Script that I used with Sysinternals Handle to snapshot the detailed list of handles outstanding in the process every 10 seconds:

Remove-Item handles.csv

# Capture 20 snapshots, 10 seconds apart (~200 seconds total)
for ($i = 0; $i -lt 20; $i++)
{
    # C:\Tools\SysinternalsSuite\handle.exe -p fluent-bit -s >> handles.txt
    # -a dumps all handle types, -v produces CSV output
    $handlesOutput = C:\Tools\SysinternalsSuite\handle.exe -p fluent-bit -a -v -nobanner
    $handlesCsv = ConvertFrom-Csv $handlesOutput
    $handlesCsv.Length
    # Tag each row with the capture time so handle growth can be charted later
    $currentTime = Get-Date
    $handlesCSVWithDates = $handlesCsv | Select-Object *,@{Name='Timestamp';Expression={$currentTime}}
    $handlesCSVWithDates | Export-Csv -NoTypeInformation -Append -Path handles.csv

    Sleep -Seconds 10
}

Analysis of Run 1
During Run 1, I had the correct URL and authentication token in the fluent-bit.conf OUTPUT section for sending data to Observe.

Results in GDrive/fluent-bit-handle-leak-Run1.zip

Perfmon counters
The \Process(fluent-bit)\Handle Count counter in \Run1\OBSERVEWS2016_20230127-000007\Performance Counter.blg climbs from 273 to 317 (screenshot attached).

Handle list
\Run1\fluent-bit-handles-run-1.xlsx, based on \Run1\handles.csv, shows the growth of those handles by type (screenshot attached).

Fluent Log
\Run1\fluent.log shows what fluent-bit wrote to its own log.

Wireshark Log
\Run1\fluent-bit-trace.pcapng.gz captures local loopback and external adapter traffic.

Trace Analysis
I loaded \Run1\fluent-bit-trace.etl.zip into Windows Performance Analyzer.

In the Memory\Handles\Outstanding Count by Process report (screenshot attached):

In the Communications\TcpIp Events\Events by Count report (screenshot attached):

You can see the steady growth of outstanding \Device\Afd handles starting at trace time 6.48 (screenshot attached).

I focused on this handle:

Line #  Object              Object Handle  Create Time (s)  Close Time (s)  Count
96      0xFFFFA0084E048D60  0x00000460     11.555046800     180.901461600   1

(The Creating Process, Handle Type, Closing Process and Object Name columns are omitted here.)

Zooming in to trace time 11.55-12.00, I see some sort of TCP conversation; I chose the 211 in the view (screenshot attached).

Correlating this with the timestamps in Wireshark, I can see that this socket carried the following conversation:

No. Time    Source  Destination Protocol    Length  Info
522 2023-01-27 09:00:25.376148  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=147457 Ack=1 Win=525568 Len=1460
523 2023-01-27 09:00:25.376153  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=148917 Ack=1 Win=525568 Len=1460
524 2023-01-27 09:00:25.376156  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=150377 Ack=1 Win=525568 Len=1176
525 2023-01-27 09:00:25.376215  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=151553 Win=525568 Len=0
526 2023-01-27 09:00:25.376282  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=151553 Ack=1 Win=525568 Len=1460
527 2023-01-27 09:00:25.376286  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=153013 Ack=1 Win=525568 Len=1460
528 2023-01-27 09:00:25.376290  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=154473 Ack=1 Win=525568 Len=1176
529 2023-01-27 09:00:25.376311  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=155649 Win=525568 Len=0
530 2023-01-27 09:00:25.376364  127.0.0.1   127.0.0.1   TCP 52  50961 → 50962 [PSH, ACK] Seq=1 Ack=1 Win=525568 Len=8
531 2023-01-27 09:00:25.376385  127.0.0.1   127.0.0.1   TCP 44  50962 → 50961 [ACK] Seq=1 Ack=9 Win=525568 Len=0
532 2023-01-27 09:00:25.381984  127.0.0.1   127.0.0.1   TCP 56  51016 → 51015 [SYN, ECE, CWR] Seq=0 Win=8192 Len=0 MSS=65495 WS=256 SACK_PERM
533 2023-01-27 09:00:25.382016  127.0.0.1   127.0.0.1   TCP 56  51015 → 51016 [SYN, ACK, ECE] Seq=0 Ack=1 Win=8192 Len=0 MSS=65495 WS=256 SACK_PERM
534 2023-01-27 09:00:25.382087  127.0.0.1   127.0.0.1   TCP 44  51016 → 51015 [ACK] Seq=1 Ack=1 Win=525568 Len=0
535 2023-01-27 09:00:25.475627  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=155649 Ack=1 Win=525568 Len=1460
536 2023-01-27 09:00:25.475634  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=157109 Ack=1 Win=525568 Len=1460
537 2023-01-27 09:00:25.475638  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=158569 Ack=1 Win=525568 Len=1176
538 2023-01-27 09:00:25.475686  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=159745 Win=525568 Len=0
539 2023-01-27 09:00:25.488014  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=159745 Ack=1 Win=525568 Len=1460
540 2023-01-27 09:00:25.488029  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=161205 Ack=1 Win=525568 Len=1460
541 2023-01-27 09:00:25.488033  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=162665 Ack=1 Win=525568 Len=1176
542 2023-01-27 09:00:25.488072  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=163841 Win=525568 Len=0
543 2023-01-27 09:00:25.498279  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=163841 Ack=1 Win=525568 Len=1460
544 2023-01-27 09:00:25.498283  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=165301 Ack=1 Win=525568 Len=1460
545 2023-01-27 09:00:25.498285  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=166761 Ack=1 Win=525568 Len=1176
546 2023-01-27 09:00:25.498324  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=167937 Win=525568 Len=0
547 2023-01-27 09:00:25.506141  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=167937 Ack=1 Win=525568 Len=1460
548 2023-01-27 09:00:25.506145  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=169397 Ack=1 Win=525568 Len=1460
549 2023-01-27 09:00:25.506147  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=170857 Ack=1 Win=525568 Len=1176
550 2023-01-27 09:00:25.506207  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=172033 Win=525568 Len=0
551 2023-01-27 09:00:25.507183  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=172033 Ack=1 Win=525568 Len=1460
552 2023-01-27 09:00:25.507188  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=173493 Ack=1 Win=525568 Len=1460
553 2023-01-27 09:00:25.507191  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=174953 Ack=1 Win=525568 Len=1176
554 2023-01-27 09:00:25.507227  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=176129 Win=525568 Len=0
555 2023-01-27 09:00:25.511611  127.0.0.1   127.0.0.1   TCP 1504    50967 → 50968 [ACK] Seq=4097 Ack=1 Win=525568 Len=1460
556 2023-01-27 09:00:25.511616  127.0.0.1   127.0.0.1   TCP 1504    50967 → 50968 [ACK] Seq=5557 Ack=1 Win=525568 Len=1460
557 2023-01-27 09:00:25.511630  127.0.0.1   127.0.0.1   TCP 1220    50967 → 50968 [PSH, ACK] Seq=7017 Ack=1 Win=525568 Len=1176
558 2023-01-27 09:00:25.511693  127.0.0.1   127.0.0.1   TCP 44  50968 → 50967 [ACK] Seq=1 Ack=8193 Win=525568 Len=0
559 2023-01-27 09:00:25.518863  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=176129 Ack=1 Win=525568 Len=1460
560 2023-01-27 09:00:25.518874  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=177589 Ack=1 Win=525568 Len=1460
561 2023-01-27 09:00:25.518881  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=179049 Ack=1 Win=525568 Len=1176
562 2023-01-27 09:00:25.519088  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=180225 Win=525568 Len=0
563 2023-01-27 09:00:25.536222  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=180225 Ack=1 Win=525568 Len=1460
564 2023-01-27 09:00:25.536226  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=181685 Ack=1 Win=525568 Len=1460
565 2023-01-27 09:00:25.536230  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=183145 Ack=1 Win=525568 Len=1176
566 2023-01-27 09:00:25.536268  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=184321 Win=525568 Len=0
567 2023-01-27 09:00:25.547902  127.0.0.1   127.0.0.1   TCP 1504    50967 → 50968 [ACK] Seq=8193 Ack=1 Win=525568 Len=1460
568 2023-01-27 09:00:25.547908  127.0.0.1   127.0.0.1   TCP 1504    50967 → 50968 [ACK] Seq=9653 Ack=1 Win=525568 Len=1460
569 2023-01-27 09:00:25.547911  127.0.0.1   127.0.0.1   TCP 1220    50967 → 50968 [PSH, ACK] Seq=11113 Ack=1 Win=525568 Len=1176
570 2023-01-27 09:00:25.547950  127.0.0.1   127.0.0.1   TCP 44  50968 → 50967 [ACK] Seq=1 Ack=12289 Win=525568 Len=0
571 2023-01-27 09:00:25.548182  127.0.0.1   127.0.0.1   TCP 52  50979 → 50980 [PSH, ACK] Seq=1 Ack=1 Win=525568 Len=8
572 2023-01-27 09:00:25.548227  127.0.0.1   127.0.0.1   TCP 44  50980 → 50979 [ACK] Seq=1 Ack=9 Win=525568 Len=0
573 2023-01-27 09:00:25.548316  127.0.0.1   127.0.0.1   TCP 1504    50967 → 50968 [ACK] Seq=12289 Ack=1 Win=525568 Len=1460
574 2023-01-27 09:00:25.548319  127.0.0.1   127.0.0.1   TCP 1504    50967 → 50968 [ACK] Seq=13749 Ack=1 Win=525568 Len=1460
575 2023-01-27 09:00:25.548323  127.0.0.1   127.0.0.1   TCP 1220    50967 → 50968 [PSH, ACK] Seq=15209 Ack=1 Win=525568 Len=1176
576 2023-01-27 09:00:25.548345  127.0.0.1   127.0.0.1   TCP 44  50968 → 50967 [ACK] Seq=1 Ack=16385 Win=525568 Len=0
577 2023-01-27 09:00:25.548423  127.0.0.1   127.0.0.1   TCP 52  50957 → 50958 [PSH, ACK] Seq=1 Ack=1 Win=525568 Len=8
578 2023-01-27 09:00:25.548444  127.0.0.1   127.0.0.1   TCP 44  50958 → 50957 [ACK] Seq=1 Ack=9 Win=525568 Len=0
579 2023-01-27 09:00:25.555461  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=184321 Ack=1 Win=525568 Len=1460
580 2023-01-27 09:00:25.555463  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=185781 Ack=1 Win=525568 Len=1460
581 2023-01-27 09:00:25.555465  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=187241 Ack=1 Win=525568 Len=1176
582 2023-01-27 09:00:25.555505  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=188417 Win=525568 Len=0
583 2023-01-27 09:00:25.567343  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=188417 Ack=1 Win=525568 Len=1460
584 2023-01-27 09:00:25.567347  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=189877 Ack=1 Win=525568 Len=1460
585 2023-01-27 09:00:25.567350  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=191337 Ack=1 Win=525568 Len=1176
586 2023-01-27 09:00:25.567384  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=192513 Win=525568 Len=0
587 2023-01-27 09:00:25.568720  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=192513 Ack=1 Win=525568 Len=1460
588 2023-01-27 09:00:25.568731  127.0.0.1   127.0.0.1   TCP 1504    50935 → 50936 [ACK] Seq=193973 Ack=1 Win=525568 Len=1460
589 2023-01-27 09:00:25.568738  127.0.0.1   127.0.0.1   TCP 1220    50935 → 50936 [PSH, ACK] Seq=195433 Ack=1 Win=525568 Len=1176
590 2023-01-27 09:00:25.568877  127.0.0.1   127.0.0.1   TCP 44  50936 → 50935 [ACK] Seq=1 Ack=196609 Win=525568 Len=0

The payload of packet 522 shows that this is clearly a diagnostic log line:

0000   02 00 00 00 45 02 05 dc 64 b7 40 00 80 06 00 00   ....E...d.@.....
0010   7f 00 00 01 7f 00 00 01 c6 f7 c6 f8 95 18 ce de   ................
0020   f3 91 64 13 50 10 08 05 d8 70 00 00 4b 00 00 00   ..d.P....p..K...
0030   00 00 00 00 5b 32 30 32 33 2f 30 31 2f 32 37 20   ....[2023/01/27 
0040   30 39 3a 30 30 3a 32 35 5d 20 5b 64 65 62 75 67   09:00:25] [debug
0050   5d 20 5b 74 61 73 6b 5d 20 63 72 65 61 74 65 64   ] [task] created
0060   20 74 61 73 6b 3d 30 30 30 30 30 31 46 34 36 34    task=000001F464
0070   31 30 35 44 34 30 20 69 64 3d 30 20 4f 4b 0a 00   105D40 id=0 OK..

These correspond to lines 39-54 in fluent.log, although entries from some other plugins are interleaved:

[2023/01/27 09:00:25] [debug] [task] created task=000001F464105D40 id=0 OK
[2023/01/27 09:00:25] [debug] [output:http:http.0] task_id=0 assigned to thread #0
[2023/01/27 09:00:25] [debug] [input:winlog:winlog.0] read 13328 bytes from 'Security'
[2023/01/27 09:00:25] [debug] [input:winlog:winlog.0] save channel<Security record=24548 time=1674838812>
[2023/01/27 09:00:25] [debug] [input chunk] update output instances with new chunk size diff=34525
[2023/01/27 09:00:25] [debug] [input:winlog:winlog.0] read 952 bytes from 'Application'
[2023/01/27 09:00:25] [debug] [input:winlog:winlog.0] save channel<Application record=2695 time=1674838459>
[2023/01/27 09:00:25] [debug] [http_client] not using http_proxy for header
[2023/01/27 09:00:25] [debug] [input chunk] update output instances with new chunk size diff=1831
[2023/01/27 09:00:25] [debug] [input:winlog:winlog.0] read 36200 bytes from 'System'
[2023/01/27 09:00:25] [ info] [output:http:http.0] 111775605936.collect.observe-eng.com:443, HTTP status=200
{"ok":true}
[2023/01/27 09:00:25] [debug] [out flush] cb_destroy coro_id=0
[2023/01/27 09:00:25] [debug] [input:winlog:winlog.0] save channel<System record=7730 time=1674838824>
[2023/01/27 09:00:25] [debug] [input chunk] update output instances with new chunk size diff=60001
[2023/01/27 09:00:25] [debug] [task] destroy task=000001F464105D40 (task_id=0)

Analysis of Run 2
For Run 2, I intentionally broke the URL and provided a bad authentication token in the fluent-bit.conf OUTPUT section for sending data to Observe. This made fluent-bit error out more and produce more diagnostic logging of errors and retries.

Results in GDrive/fluent-bit-handle-leak-Run2.zip

Perfmon counters
The \Process(fluent-bit)\Handle Count counter in \Run2\OBSERVEWS2016_20230127-000008\Performance Counter.blg climbs from 273 to 428. This is a steeper climb than in Run 1 (screenshot attached).

Handle list
\Run2\fluent-bit-handles-run-1.xlsx, based on \Run2\handles.csv, shows the growth of those handles by type, again at a higher rate (screenshot attached).

Fluent Log
\Run2\fluent.log shows what fluent-bit wrote to its own log.

Wireshark Log
\Run2\fluent-bit-trace.pcapng.gz captures local loopback and external adapter traffic.

Trace Analysis
I loaded \Run2\fluent-bit-trace.etl.zip into Windows Performance Analyzer.

The same reports as in Run 1 show a high rate of \Device\Afd handle creation without disposal. There are also some counts of TcpIp errors, which I think are related (screenshot attached).

Speculation on Root Cause
I think something in the diagnostic logging that cross-posts messages between threads and components via the local loopback socket is allocating handles but forgetting to dispose of them, and it only does this for debug-style and/or error messages. The higher observed rate of leakage when more errors are logged supports this hypothesis.

MrTomasz commented 1 year ago

I can confirm the same scenario with v2.0.8, and also with a build of the latest HEAD - it leaks handles.

Running on 2 machines: on the first, after 4 hours, I can see 7350+ handles open and growing; on another machine, after 17 hours, 26k and growing.

MrTomasz commented 1 year ago

There are indeed plenty of fluent-bit local TCP sockets open... this is just a snippet from my netstat:

TCP    127.0.0.1:6010    127.0.0.1:6011    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6011    127.0.0.1:6010    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6013    127.0.0.1:6014    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6014    127.0.0.1:6013    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6028    127.0.0.1:6029    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6029    127.0.0.1:6028    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6030    127.0.0.1:6031    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6030    127.0.0.1:6032    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6031    127.0.0.1:6030    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6032    127.0.0.1:6030    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6045    127.0.0.1:6046    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6046    127.0.0.1:6045    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6052    127.0.0.1:6053    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6053    127.0.0.1:6052    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6056    127.0.0.1:6057    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6057    127.0.0.1:6056    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6067    127.0.0.1:6068    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6068    127.0.0.1:6067    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6081    127.0.0.1:6082    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6082    127.0.0.1:6081    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6087    127.0.0.1:6088    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6088    127.0.0.1:6087    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6091    127.0.0.1:6092    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6092    127.0.0.1:6091    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6100    127.0.0.1:6101    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6101    127.0.0.1:6100    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6110    127.0.0.1:6111    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6111    127.0.0.1:6110    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6117    127.0.0.1:6118    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6118    127.0.0.1:6117    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6135    127.0.0.1:6136    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6136    127.0.0.1:6135    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6139    127.0.0.1:6140    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6140    127.0.0.1:6139    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6169    127.0.0.1:6170    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6170    127.0.0.1:6169    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6177    127.0.0.1:6178    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6178    127.0.0.1:6177    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6188    127.0.0.1:6189    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6189    127.0.0.1:6188    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6191    127.0.0.1:6192    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6192    127.0.0.1:6191    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6193    127.0.0.1:6194    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6194    127.0.0.1:6193    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6197    127.0.0.1:6198    ESTABLISHED    [fluent-bit.exe]
TCP    127.0.0.1:6198    127.0.0.1:6197    ESTABLISHED

It goes up to port 64322. That is a huge leak.

PS. In my scenario, I use tail input with loki output.

MrTomasz commented 1 year ago

Quick script to get all relevant info:

while ($true) {
$svcName="fluent-bit";
$svcProc=(Get-Process -Name $svcName);
$svcUptime=((get-date).Subtract($svcProc.starttime));
$svcSockets=(Get-NetTCPConnection -OwningProcess $svcProc.Id).Count;
Write-Host "[$($svcName)] PID: $($svcProc.Id), Handles: $($svcProc.Handles), Sockets: $($svcSockets), Uptime: $($svcUptime.Days)d $($svcUptime.hours)h $($svcUptime.minutes)m $($svcUptime.seconds)s";
Sleep -Seconds 5;
}

At the moment I am debugging, trying to track down exactly what is leaking these localhost sockets.

My current stats:

Server_TypeA_01: [fluent-bit] PID: 15776, Handles: 34687, Sockets: 51627, Uptime: 0d 12h 43m 54s
Server_TypeA_02: [fluent-bit] PID: 20520, Handles: 30973, Sockets: 46053, Uptime: 0d 12h 43m 50s
Server_TypeA_03: [fluent-bit] PID: 27512, Handles: 31496, Sockets: 46839, Uptime: 0d 12h 43m 49s
Server_TypeA_04: [fluent-bit] PID: 28928, Handles: 31981, Sockets: 47565, Uptime: 0d 12h 43m 49s
Server_TypeA_05: [fluent-bit] PID: 2892, Handles: 25014, Sockets: 37125, Uptime: 0d 12h 57m 8s
Server_TypeB_01: [fluent-bit] PID: 3256, Handles: 1691, Sockets: 2220, Uptime: 0d 12h 43m 50s
Server_TypeB_02: [fluent-bit] PID: 6420, Handles: 1673, Sockets: 2193, Uptime: 0d 12h 43m 49s
Server_TypeC_01: [fluent-bit] PID: 1348, Handles: 393, Sockets: 276, Uptime: 0d 0h 32m 30s

A short run of samples on Server_TypeA_05 shows how quickly things are happening:

[fluent-bit] PID: 2892, Handles: 25363, Sockets: 37635, Uptime: 0d 13h 8m 0s
[fluent-bit] PID: 2892, Handles: 25371, Sockets: 37650, Uptime: 0d 13h 8m 27s
[fluent-bit] PID: 2892, Handles: 25381, Sockets: 37665, Uptime: 0d 13h 8m 54s
[fluent-bit] PID: 2892, Handles: 25397, Sockets: 37686, Uptime: 0d 13h 9m 22s
[fluent-bit] PID: 2892, Handles: 25403, Sockets: 37695, Uptime: 0d 13h 9m 49s
[fluent-bit] PID: 2892, Handles: 25413, Sockets: 37710, Uptime: 0d 13h 10m 17s

Some graphs of the last 36 hours:

(screenshots attached)

jlebaugh commented 1 year ago

Thanks for looking into this!

MrTomasz commented 1 year ago

The candidate is the monkey backend for the flb_log module -- I suspect that somewhere resources are not correctly freed and it just opens another socket pair.

/** Create two new sockets that are connected to each other.

    On Unix, this simply calls socketpair().  On Windows, it uses the
    loopback network interface on 127.0.0.1, and only
    AF_INET,SOCK_STREAM are supported.

    (This may fail on some Windows hosts where firewall software has cleverly
    decided to keep 127.0.0.1 from talking to itself.)

    Parameters and return values are as for socketpair()
*/
EVENT2_EXPORT_SYMBOL
int evutil_socketpair(int d, int type, int protocol, evutil_socket_t sv[2]);
/** Do platform-specific operations as needed to make a socket nonblocking.

    @param sock The socket to make nonblocking
    @return 0 on success, -1 on failure
 */
EVENT2_EXPORT_SYMBOL
int evutil_make_socket_nonblocking(evutil_socket_t sock);

Used as a "backend" for flb_pipe on Windows:

/*
 * Building on Windows means that Monkey library (lib/monkey) and it
 * core runtime have been build with 'libevent' backend support, that
 * library provide an abstraction to create a socketpairs.
 *
 * Creating a pipe on Fluent Bit @Windows, means create a socket pair.
 */
int flb_pipe_create(flb_pipefd_t pipefd[2])
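
To make that concrete, here is a minimal standalone sketch (my own illustration, not the actual fluent-bit or monkey source) of what a socketpair-backed pipe boils down to on Windows; it relies only on the evutil_socketpair/evutil_closesocket helpers from libevent:

#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#endif
#include <event2/util.h>

/* Illustration only: on Windows a "pipe" is really a pair of connected
 * loopback AF_INET sockets, i.e. two \Device\Afd handles per pipe. */
int pipe_pair_create(evutil_socket_t fd[2])
{
    /* Only AF_INET + SOCK_STREAM is supported by the Windows emulation. */
    return evutil_socketpair(AF_INET, SOCK_STREAM, 0, fd);
}

void pipe_pair_destroy(evutil_socket_t fd[2])
{
    /* Both ends must be closed; leaving either one open leaks a
     * \Device\Afd handle, which is exactly what this issue observes. */
    evutil_closesocket(fd[0]);
    evutil_closesocket(fd[1]);
}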

On my end I use the "info" log level, and yes, lots of handle++ and socket++ activity happens only when log lines are produced.

This is sample log output and counters before and after:

02/02/2023 02:00:46 [fluent-bit] PID: 18112, Handles: 1858, Sockets: 2388, Uptime: 0d 0h 43m 33s
-> Handles += 4, Sockets += 6 (3 pairs?)
02/02/2023 02:00:48 [fluent-bit] PID: 18112, Handles: 1862, Sockets: 2394, Uptime: 0d 0h 43m 35s

[2023/02/02 02:00:45] [ info] flb_dns_ares_socket: socket(af=2, type=2, protocol=0) = fd=8060
[2023/02/02 02:00:45] [ info] flb_dns_ares_connect: connect(fd=8060, addr=1.1.1.1:13568)
[2023/02/02 02:00:45] [ info] flb_dns_ares_close: (fd=8060)
[2023/02/02 02:00:45] [ info] flb_dns_ares_close: socket close(fd=8060)
[2023/02/02 02:00:45] [ info] flb_dns_ares_socket: socket(af=2, type=2, protocol=0) = fd=8060
[2023/02/02 02:00:45] [ info] flb_dns_ares_connect: connect(fd=8060, addr=1.1.1.1:13568)
[2023/02/02 02:00:45] [ info] flb_net_tcp_connect: socket(af=2, type=1, protocol=0) = fd=8064
[2023/02/02 02:00:45] [ info] net_connect_async: connect(async, fd=8064, addr=3.3.3.3:7180)
[2023/02/02 02:00:45] [ info] [net] connection #8064 in process to 2.2.2.2:3100
[2023/02/02 02:00:45] [ info] flb_dns_ares_close: (fd=8060)
[2023/02/02 02:00:45] [ info] flb_dns_ares_close: socket close(fd=8060)
[2023/02/02 02:00:45] [ info] flb_net_tcp_connect: socket(af=2, type=1, protocol=0) = fd=8060
[2023/02/02 02:00:45] [ info] net_connect_async: connect(async, fd=8060, addr=3.3.3.3:7180)
[2023/02/02 02:00:45] [ info] [net] connection #8060 in process to 2.2.2.2:3100
[2023/02/02 02:00:45] [ info] [io] connection OK
[2023/02/02 02:00:45] [ info] [http_client] not using http_proxy for header
[2023/02/02 02:00:45] [ info] [io] connection OK
[2023/02/02 02:00:45] [ info] [http_client] not using http_proxy for header
[2023/02/02 02:00:45] [ info] prepare_destroy_conn: socket close(fd=8060)
[2023/02/02 02:00:45] [ info] prepare_destroy_conn: socket close(fd=8064)

danielodievich commented 1 year ago

Congratulations on the discovery so far. The Wireshark trace shows two connections - bidirectional - appearing every time something happens, and the leakage always occurs in increments of two. This is quite a likely candidate indeed.

MrTomasz commented 1 year ago

Ah, flb_pipe is also used for in_tail... that would explain multiple socket pairs being created. I need to track down whether they are properly closed everywhere they are used.

MrTomasz commented 1 year ago

(screenshots attached)

MrTomasz commented 1 year ago

It seems an mk_event used by flb_upstream_conn / flb_net_tcp_connect is allocating a socketpair but not releasing it, because the fds are 0... so there is corruption or misassignment in transit after the job is done...

_mk_event_del: evutil_closesocketpair?? (mk_event_ctx=0000021F22AFB2F0, ev_map->pipe_fds={0,0}
_mk_event_del: evutil_closesocketpair?? (mk_event_ctx=0000021F22AFB2F0, ev_map->pipe_fds={0,0}
_mk_event_del: evutil_closesocketpair?? (mk_event_ctx=0000021F22AFB2F0, ev_map->pipe_fds={0,0}

MrTomasz commented 1 year ago

I have a breakthrough, but I am not fully convinced by the change, as there is not much documentation of the libevent and monkey mk_event internals. The downside is that I am seeing "write: no error" messages on timed-out event handlers (cb_timeout fired), which is also strange, because these events were supposed to be one-shot (c-ares async DNS queries).

After running for 1 hour, the number of sockets (and handles) stays at the same stable level, and I can see that all communication works fine and logs are delivered as they should be. When there is a new request to create a socketpair, it is created and then destroyed when done, so the socket count goes up and down.

02/03/2023 01:01:22 [fluent-bit] PID: 17544, Handles: 142, Sockets: 0, Uptime: 0d 0h 0m 13s
02/03/2023 01:01:22 [fluent-bit] PID: 17544, Handles: 245, Sockets: 51, Uptime: 0d 0h 0m 13s
02/03/2023 01:01:23 [fluent-bit] PID: 17544, Handles: 292, Sockets: 87, Uptime: 0d 0h 0m 14s
02/03/2023 01:01:24 [fluent-bit] PID: 17544, Handles: 323, Sockets: 111, Uptime: 0d 0h 0m 14s
02/03/2023 01:01:24 [fluent-bit] PID: 17544, Handles: 348, Sockets: 129, Uptime: 0d 0h 0m 15s
02/03/2023 01:01:25 [fluent-bit] PID: 17544, Handles: 388, Sockets: 183, Uptime: 0d 0h 0m 16s
02/03/2023 01:01:25 [fluent-bit] PID: 17544, Handles: 417, Sockets: 225, Uptime: 0d 0h 0m 16s
02/03/2023 01:01:26 [fluent-bit] PID: 17544, Handles: 444, Sockets: 267, Uptime: 0d 0h 0m 17s
02/03/2023 01:01:27 [fluent-bit] PID: 17544, Handles: 474, Sockets: 312, Uptime: 0d 0h 0m 18s
02/03/2023 01:01:28 [fluent-bit] PID: 17544, Handles: 492, Sockets: 339, Uptime: 0d 0h 0m 18s
02/03/2023 01:01:28 [fluent-bit] PID: 17544, Handles: 492, Sockets: 339, Uptime: 0d 0h 0m 19s
02/03/2023 01:01:29 [fluent-bit] PID: 17544, Handles: 492, Sockets: 339, Uptime: 0d 0h 0m 20s
02/03/2023 01:01:30 [fluent-bit] PID: 17544, Handles: 492, Sockets: 339, Uptime: 0d 0h 0m 21s
(...)
02/03/2023 02:01:57 [fluent-bit] PID: 17544, Handles: 501, Sockets: 339, Uptime: 0d 1h 0m 47s
02/03/2023 02:01:57 [fluent-bit] PID: 17544, Handles: 501, Sockets: 339, Uptime: 0d 1h 0m 48s
02/03/2023 02:01:58 [fluent-bit] PID: 17544, Handles: 501, Sockets: 339, Uptime: 0d 1h 0m 49s
02/03/2023 02:01:59 [fluent-bit] PID: 17544, Handles: 501, Sockets: 339, Uptime: 0d 1h 0m 50s

The change:

diff --git a/lib/monkey/mk_core/mk_event_libevent.c b/lib/monkey/mk_core/mk_event_libevent.c
index dc47f4371..530d40dd8 100644
--- a/lib/monkey/mk_core/mk_event_libevent.c
+++ b/lib/monkey/mk_core/mk_event_libevent.c
@@ -188,6 +188,13 @@ static inline int _mk_event_add(struct mk_event_ctx *ctx, evutil_socket_t fd,
         flags |= EV_WRITE;
     }

+    if (event->data != NULL) {
+        struct ev_map *in_ev_map = event->data;
+
+        ev_map->pipe[0] = in_ev_map->pipe[0];
+        ev_map->pipe[1] = in_ev_map->pipe[1];
+    }
+
     /* Compose context */
     event->fd   = fd;
     event->type = type;
@@ -323,8 +323,9 @@ static inline int _mk_event_timeout_create(struct mk_event_ctx *ctx,
     event->fd = fd[0];
     event->type = MK_EVENT_NOTIFICATION;
     event->mask = MK_EVENT_EMPTY;
+    event->data = ev_map;

-    _mk_event_add(ctx, fd[0], MK_EVENT_NOTIFICATION, MK_EVENT_READ, data);
+    _mk_event_add(ctx, fd[0], MK_EVENT_NOTIFICATION, MK_EVENT_READ, event);
     event->mask = MK_EVENT_READ;

     return fd[0];

I also have a bunch of other buffer/handle/memory leak fixes that I will probably add as a PR anyway.

Could somebody from project owners comment on this one?

It seems the c-ares async DNS query (flb_net_getaddrinfo) never destroys the event completely, and thus never destroys the pipe or the event object itself, unless I am missing something... I can provide some extra debug logs added in useful places with object pointers, fds, etc.

jlebaugh commented 1 year ago

Good news, indeed! @danielodievich and I will be happy to do some additional testing with the original scenarios in which we encountered this, whenever it's ready.

MrTomasz commented 1 year ago

Actually, I have another, much easier workaround - disable the async DNS query entirely... it also works well and keeps the socket/handle count stable...

diff --git a/src/flb_network.c b/src/flb_network.c
index f3a0fd2a4..afac72171 100644
--- a/src/flb_network.c
+++ b/src/flb_network.c
@@ -1193,7 +1193,7 @@ flb_sockfd_t flb_net_tcp_connect(const char *host, unsigned long port,
                                  struct flb_connection *u_conn)
 {
     int ret;
-    int use_async_dns;
+    int use_async_dns = 0;
     char resolver_initial;
     flb_sockfd_t fd = -1;
     char _port[6];
@@ -1216,7 +1216,9 @@ flb_sockfd_t flb_net_tcp_connect(const char *host, unsigned long port,
     /* fomart the TCP port */
     snprintf(_port, sizeof(_port), "%lu", port);

+#if 0
     use_async_dns = is_async;
+#endif

     if (u_conn->net->dns_resolver != NULL) {
         resolver_initial = toupper(u_conn->net->dns_resolver[0]);

I think that's a fair workaround until the async DNS / event internals are fixed.

MrTomasz commented 1 year ago

If somebody wants to play with experimental changes -- please find them on my tmp branch https://github.com/MrTomasz/fluent-bit/commits/mr.t/tmp-2.0.9rc

Please note that PR #6782 does not fix the handle/socket leak.

MrTomasz commented 1 year ago

Using the sync-DNS workaround:

(screenshot attached)

02/03/2023 11:28:52 [fluent-bit] PID: 22916, Handles: 499, Sockets: 339, Uptime: 0d 8h 43m 3s

There are no more leaks, and memory usage is at a stable level.

danielodievich commented 1 year ago

I cloned the https://github.com/MrTomasz/fluent-bit/commits/mr.t/tmp-2.0.9rc branch, compiled it, and ran it in my Windows Server 2016 environment. I confirm that the handles no longer leak.

MrTomasz commented 1 year ago

Happy to hear that. Maybe, as an interim solution, a config option to select async/sync mode for DNS queries could be integrated before the proper fix for async mode is made, @edsiper?

On my end it still works very well -- memory consumption and the socket/handle count remain at a stable level:

(screenshot attached)

leonardo-albertovich commented 1 year ago

That option exists; it's not intended to be widely used, but if you set net.dns.resolver to legacy, fluent-bit will not issue asynchronous DNS lookups.

Another way to verify whether the asynchronous DNS client is part of the issue would be to set net.dns.mode to TCP and confirm that fluent-bit is indeed contacting the DNS server over TCP, because in that case the timers are not used.
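
For illustration, applied to the OUTPUT section from the repro config, that would look roughly like the sketch below (untested; option names as described above, placement per the output's network settings, other keys from the repro config omitted):

[OUTPUT]
    name              http
    match             *
    host              badendpoint.collect.observe-eng.com
    port              443
    tls               on
    # workaround: use the legacy (synchronous) DNS resolver, avoiding the timer socketpairs
    net.dns.resolver  LEGACY
    # diagnostic alternative: force DNS over TCP, which also skips the UDP timeout timer
    # net.dns.mode    TCP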

I haven't been able to look into this issue, but what you discovered sounds very interesting. The async DNS client uses a timer when the lookup is performed over UDP in order to enforce timeouts (you can check that in flb_net_getaddrinfo). Those timers (provided by flb_sched_timer_cb_create) use mk_event_timeout_create, which creates a socket pair (the read end is added to the event loop) and a libevent timer that invokes a callback which writes to the socketpair, causing the event loop to detect the activity. As I was checking the source for reference, I noticed something I didn't see before:

When _mk_event_timeout_create creates a timer, it allocates an ev_map structure and saves both sockets in the pipe entries. It also sets the read side of the socket pair as the event's "file descriptor".

When the timer callback is invoked it closes its side of the socketpair and releases the ev_map instance.

When _mk_event_del is called, it tries to check the ev_map pipe values to determine if it needs to close them, but that is a different ev_map instance and it doesn't have those values.

And there is no code in the scheduler to close the file descriptor (because it should not care about it, since that should be abstracted by mk_event_timeout_destroy).

So I'm thinking that if the timer ticks (timeout detected), the write end gets closed but the read end doesn't, and if the timer does not tick, neither of them gets closed.

So, since _mk_event_add adds the ev_map instance to the event here, what we might want to do in _mk_event_timeout_create, right after the event is added, is fill in both pipe values. I think this will probably also mean removing the code that closes the socket in cb_timeout, but I could be wrong, since this is what I just realized while reading the code with a fresh pair of eyes and not something I verified through debugging.
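
As a self-contained illustration of that failure mode (my own sketch, not the monkey code itself): create socketpairs the way the timer path does, close only the end that cb_timeout would close, and the other \Device\Afd handle stays outstanding:

#include <stdio.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#endif
#include <event2/util.h>

int main(void)
{
#ifdef _WIN32
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);   /* winsock must be initialized first */
#endif
    /* Mimic many timer creations: each one allocates a socketpair. */
    for (int i = 0; i < 1000; i++) {
        evutil_socket_t fd[2];

        if (evutil_socketpair(AF_INET, SOCK_STREAM, 0, fd) != 0) {
            perror("evutil_socketpair");
            break;
        }
        /* The write end is what cb_timeout would close when it fires... */
        evutil_closesocket(fd[1]);
        /* ...but nothing closes the read end handed to the event loop,
         * so one \Device\Afd handle per timer stays outstanding: */
        /* evutil_closesocket(fd[0]); */
    }
    puts("now inspect the process handle count (Perfmon / Process Explorer)");
    getchar();
    return 0;
}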

Please take this with a grain of salt. I hope this helps and please feel free to ping me here or on slack, I think I'll be able to make time for this rather soon.

Thanks a lot @MrTomasz, your insight was really helpful.

danielodievich commented 1 year ago

Any updates on this issue's status? Will it be fixed in future versions of fluent-bit, @leonardo-albertovich?

leonardo-albertovich commented 1 year ago

Oh I'm so glad you brought this up, I've tried to look for this issue but couldn't find it.

There are no updates at the moment; the only workaround is setting the DNS resolver to legacy mode. It will be addressed, but I don't have an ETA, since I am deep into something else and can't afford to switch contexts without risking messing up either of them, and that's the last thing I'd like to do.

Rest assured that this is high on my personal priority list but as I said, sadly I can't get to it until I finish what I'm focusing on.

braydonk commented 1 year ago

I am working on submitting a fix for this issue. If any maintainers would like to assign this issue to me feel free.

edsiper commented 1 year ago

@braydonk assigned

braydonk commented 1 year ago

The fix provided in Monkey should be available in the master and 2.0 branches now, meaning I assume it will be part of 2.1 and 2.0.12 (a maintainer should verify that). I verified my fix using the setup suggested in the first comment on this issue, and it stopped leaking handles.

For those interested in the technical details, I provided a step-by-step walkthrough of the handle leak in the mentioned monkey issue.

leonardo-albertovich commented 1 year ago

The fix will be released in the versions @braydonk mentioned.

hpernu commented 1 year ago

Is this part of a released distribution yet?

braydonk commented 1 year ago

It's in 2.1, which was released a few hours ago, and it will be in 2.0.12 whenever that gets released.