fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

[Windows Server 2016] Memory Leak when using TLS #6497

Open cbi-at-varian opened 1 year ago

cbi-at-varian commented 1 year ago

Bug Report

Describe the bug Our Windows servers become unresponsive after a while due to high memory consumption in the Non-Paged memory pool. To narrow down the issue, only a dummy input and a forward output using TLS are needed. I ran one instance without TLS and one with tls=on to compare their memory consumption.
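To make the comparison between the two instances less dependent on eyeballing a graph, the sampled pool sizes can be reduced to a growth rate. The following is only a rough sketch and not part of the original report: the function names, the threshold, and the sample values are hypothetical, and it assumes the samples were collected separately as (seconds, bytes) pairs, for example from the Windows `\Memory\Pool Nonpaged Bytes` performance counter.

```python
from statistics import mean


def growth_rate(samples):
    """Return the least-squares slope of (seconds, bytes) samples in bytes/second."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_mean = mean(t for t, _ in samples)
    v_mean = mean(v for _, v in samples)
    # Ordinary least squares: slope = cov(t, v) / var(t)
    num = sum((t - t_mean) * (v - v_mean) for t, v in samples)
    den = sum((t - t_mean) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def looks_like_leak(samples, threshold_bytes_per_hour):
    """True if the fitted growth exceeds the (hypothetical) hourly threshold."""
    return growth_rate(samples) * 3600 > threshold_bytes_per_hour


if __name__ == "__main__":
    # Made-up sample data for illustration only, not measurements from this issue.
    without_tls = [(0, 100_000_000), (600, 100_000_000), (1200, 100_000_000)]
    with_tls = [(0, 100_000_000), (600, 106_000_000), (1200, 112_000_000)]
    one_mib = 1024 * 1024
    print("no TLS leaking:", looks_like_leak(without_tls, one_mib))
    print("TLS leaking:   ", looks_like_leak(with_tls, one_mib))
```

A fitted slope is less sensitive to a single noisy sample than comparing only the first and last readings, which helps when the pool size fluctuates for unrelated reasons.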

To Reproduce

Expected behavior Memory consumption should not grow when TLS is enabled.

Screenshots image

Your Environment

Additional context The leak might have been introduced in version 2.x; it is not present in 1.9.10. We initially found the leak in 2.0.5; the TLS improvements in 2.0.6 actually made it a bit better.

Let me know if you need more information.

leonardo-albertovich commented 1 year ago

Hi @cbi-at-varian, I am trying to reproduce your issue but don't see a climb in NP Pool memory usage. Let me know if there is anything you think could help. I'll leave the service running for a while just in case I'm missing something.

cbi-at-varian commented 1 year ago

Hi @leonardo-albertovich, thanks for checking this out! What environment are you running it on? For me it only happened when connecting to a TLS-enabled endpoint. We're using TLSv1.3; not sure if that makes a difference.

leonardo-albertovich commented 1 year ago

I'm running fluent-bit 2.0.6 on a Windows Server 2019 virtual machine using the forward output plugin with TLS enabled, without ACKs, and sending data to a remote endpoint running the openssl s_server tool (i.e. openssl s_server -cert self_signed.crt -key self_signed.key -port 9999) with a self-signed certificate.

cbi-at-varian commented 1 year ago

I have now run the example config from my own box (Windows 10) against the same endpoint and can't see a climb in NP Pool. So it looks to me like this is a Windows Server 2016-only issue.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.