fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

fluent-bit pod having difficulty connecting to Splunk HEC endpoint #9398

Open lifayt opened 1 month ago

lifayt commented 1 month ago

Bug Report

Describe the bug

We are attempting to add a Splunk output to the fluent-bit pods that run as part of the EKS Amazon CloudWatch add-on. We can connect to the HEC endpoint manually with a curl command like this:

curl --request POST \
  --url https://example.splunkcloud.com/services/collector \
  --header 'Authorization: Splunk <hec-token>' \
  --header 'Content-Type: application/json' \
  --data '{"index": "airflow", "event": "from-fluent-bit-pod"}'

This produces the expected response:

{"text":"Success","code":0}

Similarly, querying the HEC health endpoint works:

curl --request GET \
  --url https://example.splunkcloud.com/services/collector/health 

This also produces the expected response:

{"text":"HEC is healthy","code":17}

However, when we try the same thing with the fluent-bit CLI or a config file, we get an error saying the domain name was not found:

[net] getaddrinfo(host='https://example.splunkcloud.com/services/collector', err=4): Domain name not found

Here is an example of how I'm starting fluent-bit:

/fluent-bit/bin/fluent-bit -i cpu -t cpu -o splunk -p host=https://example.splunkcloud.com/services/collector -p splunk_token=<token> \
  -p tls=on -p tls.verify=off -m '*'
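
One possible reading of the error above: the `host` value that reaches `getaddrinfo` is the full URL (scheme and path included), whereas a resolver expects a bare host name. The sketch below is only an illustration under that assumption. It splits the URL from the curl example into a host and a port and prints an adjusted command for review. The variable names are hypothetical, port 443 is assumed because the curl URL is plain `https://` with no explicit port, and it is assumed that the splunk output accepts `host` and `port` as separate properties and supplies the HEC path itself.

```shell
# Hypothetical helper: split an HTTPS URL into a bare host name and a port.
# The variable names (url, rest, hostport, host, port) are illustrative only.
url='https://example.splunkcloud.com/services/collector'

rest="${url#*://}"        # strip the scheme
hostport="${rest%%/*}"    # strip the path
host="${hostport%%:*}"    # drop an explicit port, if one is present
case "$hostport" in
  *:*) port="${hostport##*:}" ;;   # port was given in the URL
  *)   port=443 ;;                 # assumption: default HTTPS port
esac

# Print (rather than run) the adjusted command so it can be checked first.
echo "/fluent-bit/bin/fluent-bit -i cpu -t cpu -o splunk" \
     "-p host=$host -p port=$port -p splunk_token=<token>" \
     "-p tls=on -p tls.verify=off -m '*'"
```

If the printed command still fails to resolve the host, the problem is more likely in the pod's DNS or network path than in the argument format.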

Expected behavior

Since I can connect to the Splunk ingestion endpoint using curl, I would expect fluent-bit to also be able to connect.

Your Environment

It would be particularly helpful if I could get some feedback on how to better diagnose the issue here. I work in a corporate environment, so there are always plenty of networking/firewall issues to contend with, but I'm not sure how to get at the guts of what fluent-bit is running into (since my attempts at debugging it by posting events manually to Splunk all work).

patrick-stephens commented 1 month ago

1.9 is a very old version. Can you retry with the latest version? There have been a lot of improvements and changes since.

lifayt commented 1 month ago

Hey Patrick, thank you for the suggestion. Unfortunately we seem to be stuck in a slightly awkward position here because (at the moment) we're limited to the fluent-bit version that's shipped with the amazon-cloudwatch-observability EKS add-on. That's currently on 2.32.2, which ships the following:

2.32.1
This release includes:
Fluent Bit [1.9.10](https://github.com/fluent/fluent-bit/tree/v1.9.10)
Amazon CloudWatch Logs for Fluent Bit 1.9.4
Amazon Kinesis Streams for Fluent Bit 1.10.2
Amazon Kinesis Firehose for Fluent Bit 1.7.2

We're likely going to investigate adding our own fluent-bit pods in this case, but it would be nice if we could get some guidelines on debugging this issue with 1.9.10 in the meantime, if possible.

Thank you!
Linus

patrick-stephens commented 1 month ago

In that case, I think you probably want to ask in the actual AWS repo. There's also an open issue there about upgrading: https://github.com/aws/aws-for-fluent-bit/issues/494