aydosman opened 3 months ago
Just curious: what's the use case where one payload might expand to over 100 MB?
Today that's a hard limit; we would need to extend it per component. Besides that, is in_forward being used in other areas of your use case?
Could it be down to back pressure on the collector side? Let me try to prove that; I'll run some simulations and provide all the related metrics.
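As background for the simulations: repetitive log data compresses extremely well, so even a small compressed chunk can inflate past a 100 MB decompression cap. A minimal standalone sketch (plain Python, not tied to Fluent Bit internals; the sample log line is made up):

```python
import gzip

# A highly repetitive "log line", typical of health checks or retries.
line = b'{"level":"info","msg":"upstream retry, backing off"}\n'
payload = line * 2_000_000  # well over 100 MB uncompressed

compressed = gzip.compress(payload)

print(f"uncompressed: {len(payload) / 1e6:.1f} MB")
print(f"compressed:   {len(compressed) / 1e6:.2f} MB")
print(f"ratio:        {len(payload) / len(compressed):.0f}x")
```

The point is only that a compressed forward chunk of a few hundred kilobytes can legitimately decompress to more than 100 MB, so the cap can be hit without any corruption involved.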
Is in_forward being used in other areas of your use case?
Not at this time
@edsiper I experience the same issue with Fluent Bit version 3.0.4; however, using the same configuration with Fluent Bit version 2.2.2 we don't encounter this error. I believe, although I might be wrong, that the error was introduced with this change: https://github.com/fluent/fluent-bit/pull/8665
FYI: @cosmo0920
@edsiper has this been investigated?
Hi, I'm trying to add full validation of concatenated gzip streams in forwarded payloads in https://github.com/fluent/fluent-bit/pull/9139. Would you mind testing that patch?
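For anyone testing this locally: a gzip stream may contain several concatenated members, and a decoder has to walk all of them rather than stop at the first. An illustrative sketch in plain Python (unrelated to Fluent Bit's internal decoder):

```python
import gzip
import zlib

# Two independently gzipped records concatenated into one stream,
# as a forwarder may produce when it batches compressed chunks.
stream = gzip.compress(b"first record\n") + gzip.compress(b"second record\n")

# gzip.decompress walks every member in the stream.
print(gzip.decompress(stream))  # b'first record\nsecond record\n'

# A raw zlib decompressor (wbits=31 selects gzip framing) stops after
# the first member; the remainder ends up in unused_data.
d = zlib.decompressobj(wbits=31)
first = d.decompress(stream)
print(first)                   # b'first record\n'
print(len(d.unused_data) > 0)  # True: second member left unconsumed
```

A decoder that only inflates the first member silently drops the rest of the batch, which is why full handling of concatenated members matters here.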
@cosmo0920 is there an OCI image built as part of the PR?
No. I tried to generate PR-specific images, but no luck.
Has this been fixed in v3.1.5?
fb version 3.1.5 – bug still exists
fb version 3.1.6 – bug still exists
To add, and to prove a theory we had that the data we send/persist (DBs) from the collector and aggregator might somehow have been corrupted: these were also tested on fresh new cloud nodes.
Do you have reproducible steps?
The configuration shown above has not changed; only the Fluent Bit container image version has been updated. Let me know if you need anything else.
Any update on this issue?
Bug Report
I'm encountering an issue with Fluent Bit where the gzip decompression fails due to exceeding the maximum decompression size of 100MB. Below are the relevant error logs and configurations for both the collector and aggregator.
To Reproduce
Example log message
Steps to reproduce the problem
Set up Fluent Bit with the provided collector and aggregator configurations.
Monitor the logs for gzip decompression errors.
Expected behavior
Fluent Bit should handle the gzip decompression without exceeding the maximum decompression size limit.
Screenshots
N/A
Your Environment
Version used: Fluent Bit 3.0.7
Configuration:
Collector Configuration:
Aggregator Configuration:
Environment name and version (e.g. Kubernetes? What version?)
Kubernetes 1.30, 1.29, 1.28
Server type and version
AKS/EKS
Operating System and version
Ubuntu, AL2, AL2023 and BottlerocketOS
Filters and plugins
See above
Additional context
This issue persists across all Fluent Bit instances with the same configuration. Both collector and aggregator are using the same Fluent Bit version (3.0.7). The rate of records processed per second is consistently around 800, so not excessive. Any guidance or solution to resolve this issue would be greatly appreciated.
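A back-of-envelope check on that rate (the 800 records/s is from this report; the ~1 KB average record size is an assumption for illustration): even at a steady, modest throughput, a chunk only needs to accumulate for a couple of minutes, for example under back pressure, before its uncompressed size crosses the 100 MB cap.

```python
RECORDS_PER_SEC = 800            # observed rate from this report
AVG_RECORD_BYTES = 1024          # assumption: ~1 KB per record
LIMIT_BYTES = 100 * 1024 * 1024  # the 100 MB decompression cap

bytes_per_sec = RECORDS_PER_SEC * AVG_RECORD_BYTES
seconds_to_limit = LIMIT_BYTES / bytes_per_sec

print(f"{bytes_per_sec / 1e6:.2f} MB/s uncompressed")
print(f"~{seconds_to_limit:.0f} s of buffered data to exceed the cap")
```

Under these assumptions that is about two minutes of buffering, which is plausible during a temporary outage or slowdown on the aggregator side.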