Open bradley-carrion opened 3 months ago
@swapneils Apologies for the delayed response.
Two questions here to clarify the specific code-segments that are involved:
- So upgrading to build aws-for-fluent-bit with 3.1.4 prevented this issue from occurring?
No, we completely dropped the aws-for-fluent-bit image and are purely using the standard fluent-bit 3.1.4 image.
- Which output plugin are you using here?
We are using the Azure Blob plugin
Since ~2 hours, this is broken on latest too.
@guidoiaquinti Are you saying you tested this case ~2 hours ago, or that this case was previously working for you and is now failing with the latest tag?
In the latter case, is the public.ecr.aws/aws-observability/aws-for-fluent-bit:init-debug-2.32.2.20240820 image working without issues? The latest release shouldn't be exhibiting different behavior from stable since we didn't change any fluent-bit code.
Maybe this is completely unrelated, and to be honest, I'm not sure what has changed (I'm currently on mobile with limited connectivity), but all our deployments started failing approximately two hours ago with the following errors:
[2024/10/07 20:17:15] [error] [plugins/out_datadog/datadog.c:184 errno=25] Inappropriate ioctl for device
[2024/10/07 20:17:15] [error] [src/flb_sds.c:109 errno=12] Cannot allocate memory
The timeframe aligns with the update of the latest tag. Reverting to stable fixes it. While not strictly related to this GitHub issue, I arrived here because the bug above seems to be occurring in the same Fluent Bit version as the one in the report.
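For anyone triaging similar output, here is a minimal sketch of pulling the source file, line number, and errno out of error lines shaped like the two quoted above. The regular expression is an assumption based only on those two samples, not on a documented Fluent Bit log format, and `parse_error_line` is a hypothetical helper name.

```python
import errno
import re
from typing import NamedTuple, Optional

# Pattern inferred from the two sample lines in this thread (an assumption,
# not an official Fluent Bit log grammar).
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<level>\w+)\] "
    r"\[(?P<file>[^:\]\s]+):(?P<line>\d+) errno=(?P<errno>\d+)\] "
    r"(?P<msg>.*)$"
)


class ErrorLine(NamedTuple):
    timestamp: str
    level: str
    source_file: str
    source_line: int
    errno_value: int
    errno_name: str  # symbolic name on the machine running this script
    message: str


def parse_error_line(line: str) -> Optional[ErrorLine]:
    """Return the parsed fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    code = int(m.group("errno"))
    return ErrorLine(
        timestamp=m.group("ts"),
        level=m.group("level"),
        source_file=m.group("file"),
        source_line=int(m.group("line")),
        errno_value=code,
        errno_name=errno.errorcode.get(code, "UNKNOWN"),
        message=m.group("msg"),
    )


SAMPLES = [
    "[2024/10/07 20:17:15] [error] [plugins/out_datadog/datadog.c:184 errno=25] Inappropriate ioctl for device",
    "[2024/10/07 20:17:15] [error] [src/flb_sds.c:109 errno=12] Cannot allocate memory",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(parse_error_line(sample))
```

Grouping the parsed records by `source_file` and `errno_value` can make it easier to see whether a failing deployment hits the same code path every time.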
This seems unrelated: my issue is not exclusive to the new latest tag, I did not see the error message you're referring to, and they haven't upgraded the underlying Fluent Bit version from 1.9.10, which is the compatibility issue I'm calling out here. I'd recommend always using the stable version and creating a new issue for what you're seeing, @guidoiaquinti.
Thanks Bradley (and sorry for this additional ping :) )
@guidoiaquinti After making the new Issue, could you pin to 2.32.2.20240820 for the moment and email me an AWS Account ID at swapneis@amazon.com?
The first point is because we plan to update our stable image later this week unless we see issues in stability testing (which I don't expect). Delaying the update further without a clear availability risk would harm other customers' workflows (e.g. security scanning), but I also don't want to break yours.
The account ID is so I can share test aws-for-fluent-bit images with you to facilitate investigation.
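As a rough illustration of the pinning request above, the sketch below rewrites an image reference so that it uses an explicit version tag instead of a floating one such as latest or stable. `pin_image_tag` is a hypothetical helper, and the example assumes the pinned tag is published under the same repository path as the image quoted earlier in this thread.

```python
def pin_image_tag(image: str, tag: str) -> str:
    """Return `image` with its tag (or digest) replaced by `tag`.

    Only the final path component is inspected, so a registry host that
    carries a port (for example localhost:5000) is left untouched.
    """
    repo, sep, last = image.rpartition("/")
    # Drop any existing digest ("@sha256:...") or tag (":latest").
    name = last.split("@", 1)[0].split(":", 1)[0]
    return f"{repo}{sep}{name}:{tag}"


if __name__ == "__main__":
    floating = "public.ecr.aws/aws-observability/aws-for-fluent-bit:latest"
    # Assumption: this tag exists in the repository above.
    print(pin_image_tag(floating, "2.32.2.20240820"))
```

The resulting string would go wherever the sidecar image is declared, for example the image field of the container definition in an ECS task definition.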
Fluent Bit Log Output
We have enabled debug logs, and nothing in the logs indicates that the CPU should be having issues.
Fluent Bit Version Info
amazon/aws-for-fluent-bit:2.32.2, which uses v1.9.10 of Fluent Bit under the hood.
Cluster Details
We're running ECS Fargate w/ sidecar deployment of aws-for-fluent-bit.
(This repros locally btw)
Application Details
I was able to repro this locally with the following throughput:
Steps to reproduce issue
Related Issues
No related issues, but a suspected fix is in https://github.com/fluent/fluent-bit/pull/5918
My suggestion would be to consider upgrading to the latest Fluent Bit version.