Closed SamuelDudley closed 2 years ago
This exact same issue is affecting us with our Fluent Bit Kubernetes DaemonSet. We are using IMDSv2 on our EKS nodes and Fluent Bit is unable to communicate with our Elasticsearch cluster. As a result, we have to turn off the AWS_Auth parameter.
This should be a high priority as this is a security risk for many users.
Check if you are affected by the hop limit by increasing it to 2:
aws ec2 modify-instance-metadata-options --instance-id i-00000000000 --http-put-response-hop-limit 2
IMDSv2 requires a PUT request to initiate a session to the instance metadata service and retrieve a token. By default, the response to PUT requests has a response hop limit (time to live) of 1 at the IP protocol level. However, this limit is incompatible with containerized applications on Kubernetes that run in a separate network namespace from the instance.
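To check whether an instance is affected before changing anything, the current metadata options can be inspected with the AWS CLI (the instance ID below is a placeholder):

```shell
# Show the instance's metadata options, including the hop limit and whether
# IMDSv2 tokens are required. i-00000000000 is a placeholder instance ID.
aws ec2 describe-instances --instance-ids i-00000000000 \
  --query 'Reservations[].Instances[].MetadataOptions'
```

A HttpPutResponseHopLimit of 1 together with HttpTokens set to required is the combination that breaks containerized workloads on the instance.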
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Still an issue, nothing to do with the hop limit. The code to handle IMDSv2 simply is not used for obtaining credentials.
Commenting to keep this issue alive as I can't edit or remove labels.
Hi, I'm trying to run Fluent Bit on Windows Server 2016, and the CloudWatch plugin seems unable to authenticate using the instance profile.
Can we try this with 1.7.x and see if it reproduces?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Sorry folks. This is a feature gap which I had meant to address late last year but then lost it with too many other higher priority feature requests and bugs.
We will get someone to work on this soon.
This issue prevents us from using the S3 output plugin. The current workaround of falling back to IMDSv1 is a security risk. Do you have an ETA for a fix?
@shalevutnik Yes, I understand this is very important, but I am stretched very thin lately. Unfortunately I can't promise an exact ETA yet, but I have gotten someone on my team assigned to start work on this soon.
Hi :wave: I am currently working on adding IMDSv2 support to AWS Fluent Bit plugins. Thank you for your patience. I will update you on the progress of this feature.
Just an update on the progress of IMDSv2 support. The unit tests and code have been written, and the tests are passing. We're going through a couple code reviews. Feel free to take a look at the PR https://github.com/fluent/fluent-bit/pull/4086. Releasing the code may take ~2 weeks. Thank you for your patience and input on the importance of this feature.
Thank you for your help. Will the new IMDSv2 support come in a new Fluent Bit image release, or can we upgrade from IMDSv1 to IMDSv2 in place?
@Aqubo, you're welcome, and thanks for checking back. We are expecting to include IMDSv2 support in the next Fluent Bit release, 1.8.8. We will keep you updated.
I have the same error here.
Hi, try using the version quoted in this comment: https://github.com/aws/aws-for-fluent-bit/issues/207#issuecomment-943694457
Thank you @SamuelDudley . IMDSv2 support is added in Fluent Bit version 1.8.8 and aws-for-fluent-bit v2.21.0. Please see the issue link Samuel copied: https://github.com/aws/aws-for-fluent-bit/issues/207#issuecomment-943694457
@PettitWesley Hi, I'm currently running into this same issue with 1.8.8. I've read through the various threads, but haven't had luck getting IMDS authentication to work. Does anything pop out with the below configuration that may be an issue?
{
"State": "applied",
"HttpTokens": "optional",
"HttpPutResponseHopLimit": 2,
"HttpEndpoint": "enabled",
"HttpProtocolIpv6": "disabled"
}
(I've tried with "HttpTokens": "required" as well.)
[OUTPUT]
Name s3
Match *
bucket xxxxxxxxx
region us-gov-west-1
use_put_object true
total_file_size 1M
upload_timeout 1m
[2021/11/10 17:02:10] [debug] [upstream] KA connection #180 to s3.us-gov-west-1.amazonaws.com:443 has been assigned (recycled)
[2021/11/10 17:02:10] [debug] [http_client] not using http_proxy for header
[2021/11/10 17:02:10] [debug] [aws_credentials] Requesting credentials from the env provider..
[2021/11/10 17:02:10] [debug] [aws_credentials] Retrieving credentials for AWS Profile default
[2021/11/10 17:02:10] [debug] [aws_credentials] Reading shared config file.
[2021/11/10 17:02:10] [debug] [aws_credentials] Shared config file /fluent-bit/.aws/config does not exist
[2021/11/10 17:02:10] [debug] [aws_credentials] Reading shared credentials file.
[2021/11/10 17:02:10] [error] [aws_credentials] Shared credentials file /fluent-bit/.aws/credentials does not exist
[2021/11/10 17:02:10] [error] [aws_credentials] Failed to retrieve credentials for AWS Profile default
[2021/11/10 17:02:10] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
[2021/11/10 17:02:10] [debug] [aws_credentials] requesting credentials from EC2 IMDS
[2021/11/10 17:02:10] [debug] [upstream] KA connection #178 to 169.254.169.254:80 has been assigned (recycled)
[2021/11/10 17:02:10] [debug] [http_client] not using http_proxy for header
[2021/11/10 17:02:20] [debug] [aws_client] (null): http_do=0, HTTP Status: 503
[2021/11/10 17:02:20] [debug] [upstream] KA connection #178 to 169.254.169.254:80 is now available
[2021/11/10 17:02:20] [ warn] [imds] unable to evaluate IMDS version
[2021/11/10 17:02:20] [ warn] [aws_credentials] No cached credentials are available and a credential refresh is already in progress. The current co-routine will retry.
[2021/11/10 17:02:20] [error] [signv4] Provider returned no credentials, service=s3
[2021/11/10 17:02:20] [error] [aws_client] could not sign request
[2021/11/10 17:02:20] [debug] [upstream] KA connection #180 to s3.us-gov-west-1.amazonaws.com:443 is now available
[2021/11/10 17:02:20] [error] [output:s3:s3.2] PutObject request failed
Are you using kube2iam?
> Are you using kube2iam?

No, currently I'm trying to use the IAM role attached to the instance it's deployed on. I recall seeing your post in another thread; have you gotten kube2iam to work with the 1.8.8 image?
Yes. I had to fiddle around with the available versions and ended up with the following config to pin the deployed Docker image:
repositories:
...
- name: kube2iam
url: https://jtblin.github.io/kube2iam/
releases:
- name: kube2iam
namespace: kube-system
chart: kube2iam/kube2iam
version: 2.6.0
values:
- image:
tag: 0.10.11
...
@matthewfala Can you help here
Hi @kdalporto. This is no longer the hop limit issue: you have the hop limit correctly set to 2, and from your error logs it looks like Fluent Bit is not hitting that problem. It seems like IMDS may be unreachable. Is it possible for you to curl 169.254.169.254 on your instance?
curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
This should return a token.
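For reference, the full two-step IMDSv2 handshake can be exercised from the instance like this (this only works on an EC2 instance, and the credentials path assumes an instance profile is attached):

```shell
# Step 1: PUT request to open a session and obtain a token (TTL in seconds).
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Step 2: present the token to read metadata, e.g. the attached IAM role name.
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
```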
@matthewfala yes that returns a ~56 character token when running on the node instance where fluent-bit is running. I'm also able to manually upload objects to the destination bucket via the CLI. I currently have HttpTokens set to required.
That's strange. Your error message, [imds] unable to evaluate IMDS version, should only come up if the following request does not complete. This curl should return a status code of 401, which indicates IMDSv2 availability:
curl -H "X-aws-ec2-metadata-token: INVALID" -v http://169.254.169.254/
It's not clear why this request is failing (returning nothing) in your environment, since a 401 is expected.
That curl does indeed lead to a 401:
* About to connect() to 169.254.169.254 port 80 (#0)
* Trying 169.254.169.254...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 169.254.169.254
> Accept: */*
> X-aws-ec2-metadata-token: INVALID
>
< HTTP/1.1 401 Unauthorized
< Content-Length: 0
< Date: Wed, 10 Nov 2021 23:38:57 GMT
< Server: EC2ws
< Connection: close
< Content-Type: text/plain
<
* Closing connection 0
@matthewfala, I have a bit of an update. I've realized on two separate occasions that logs have gotten sent to S3, but I wasn't sure why. This morning I realized it had occurred again: as a result of deleting my Kubernetes deployment, the logs were sent to S3. This is consistent with the documentation snippet:
"If Fluent Bit is stopped suddenly it will try to send all data and complete all uploads before it shuts down."
At the moment, I don't understand why it seems to be able to send to S3 on shutdown, but fails during normal operations.
Update: I tried to reproduce the above scenario, however no logs were sent on shutdown this time.
I'm not sure what the issue could be. The process of obtaining credentials during shutdown is the same as during normal operations, provided the inputs (some of which have network activity) are not interfering with our requests. One thing that might be happening is that the input collectors are shut down while the output plugins are still sending out logs. If an input plugin that is interfering with our network requests is stopped, that might explain why we can reach IMDS on shutdown but not during normal operations. What input plugins are you using? Anything that might require networking, such as Prometheus?
I have a custom image which adds IMDSv1 fallback support (if IMDSv2 fails, IMDSv1 will be tried) and also some extra debug statements for IMDS problems. If you want to test it out and send the resulting logs, they could help us figure out what the problem is.
Here's the image repo and tag -
826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit:1.8.8-imds-fallback-patch
Yes, Prometheus is running in our deployment. I'll try to utilize that image and grab the logs.
Circling back on this... The issue was that the overall Kubernetes deployment repo we use specifically blocks pods from accessing IMDS in the namespace fluent-bit is deployed in, while access is still available at the instance level. I've confirmed that running fluent-bit in its own separate namespace allows it to send logs to S3 with IMDS.
@kdalporto Thanks for this post. I had forgotten about that; I believe it's recommended in EKS and ECS to block containers from accessing IMDS.
Awesome @kdalporto. I'm glad to hear that this is no longer an issue for you. Thank you for letting us know.
Hi, I'm using Fluent Bit v1.8.15 / aws-for-fluent-bit 2.23.4 on AWS EKS and I'm still getting this in the logs
[2022/04/29 11:16:43] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
I'm using IMDSv2 with the correct hop limit:
{
  "State": "applied",
  "HttpTokens": "required",
  "HttpPutResponseHopLimit": 2,
  "HttpEndpoint": "enabled",
  "HttpProtocolIpv6": "disabled",
  "InstanceMetadataTags": "disabled"
}
curl -H "X-aws-ec2-metadata-token: INVALID" -v http://169.254.169.254/ reports a 401, and
curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" returns a token.
Sending logs to CloudWatch does work, though (at least for now). So I'm not sure whether this error message refers to IMDSv1 while IMDSv2 is working fine.
@Ahlaee Is there more log output than that?
CC @matthewfala
@PettitWesley Everything else looks ok:
Fluent Bit v1.8.15
[2022/04/29 10:33:43] [ info] [engine] started (pid=1)
[2022/04/29 10:33:43] [ info] [storage] version=1.1.6, initializing...
[2022/04/29 10:33:43] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/29 10:33:43] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/29 10:33:43] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/29 10:33:43] [ info] [cmetrics] version=0.2.2
[2022/04/29 10:33:43] [ info] [input:systemd:systemd.3] seek_cursor=s=bfc76bb2c6464c94b13827824290ea6a;i=14f... OK
[2022/04/29 10:33:43] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/29 10:33:43] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2022/04/29 10:33:43] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS on initialization
[2022/04/29 10:33:43] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS on initialization
[2022/04/29 10:33:43] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/04/29 10:33:43] [ info] [sp] stream processor started
After that it creates the Log Streams. And then it repeats indefinitely:
[2022/04/29 20:16:46] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
Logs are forwarded to cloudwatch nonetheless.
@Ahlaee Ah, this is the EC2 filter... and I think I might know the problem: you might have IMDS blocked for containers, which is a common best practice. Does your setup include any of this? https://aws.amazon.com/premiumsupport/knowledge-center/ecs-container-ec2-metadata/
@PettitWesley No, our setup runs on EKS not ECS. I never configured anything related to networking modes when spinning up the cluster using the console. As far as I understand from the linked article, having IMDS blocked is an intentional setting that must be included in the user data of the Amazon EC2 instance. I didn't include anything related to this. It might be implicitly included by AWS in the cluster creation process.
@Ahlaee Hmm you're right, this looks like the right link for EKS IMDS related things: https://github.com/aws/containers-roadmap/issues/1109
> After that it creates the Log Streams. And then it repeats indefinitely:
> [2022/04/29 20:16:46] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
> Logs are forwarded to cloudwatch nonetheless.
Yea so the filter is failing, creds must be succeeding. Can you please share your full config?
Also since you have IMDSv2 required (tokens required), then you need to set the config in the AWS filter: https://docs.fluentbit.io/manual/pipeline/filters/aws-metadata
[FILTER]
Name aws
Match *
imds_version v2
I was following the AWS documentation when setting up fluent-bit for EKS: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html
Their fluent-bit.yaml, which is linked from that page, contains an older image of the software that doesn't support IMDSv2, and it also has the imds_version filter setting at v1.
Setting the image version to 2.23.4 and the filter to imds_version v2 as you described above solved the issue for me. :)
Thank you!
I concur with @Ahlaee: using EKS with the AWS-supplied docs for setting up Fluent Bit to CloudWatch, setting the image to 2.23.4 and imds_version v2 solved the issue for me as well.
Just setting imds_version v2 fixed this for me. FWIW, it looks like the current stable version is 2.23.3.
For me as well, just changing imds_version to v2 was enough.
In Oct 2022 the container image version in this manifest was new enough for IMDSv2, but the configuration still contained 'imds_version v1' in two places. Updating 'v1' to 'v2' in both places was enough to fix it.
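A quick way to spot the stale setting, sketched here assuming the namespace and ConfigMap names used by the Container Insights manifest (amazon-cloudwatch / fluent-bit-config); substitute your own if they differ:

```shell
# Print the imds_version lines from the Fluent Bit ConfigMap; all occurrences
# should read 'imds_version v2'. Namespace/ConfigMap names are assumptions.
kubectl -n amazon-cloudwatch get configmap fluent-bit-config -o yaml | grep imds_version
```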
FWIW, I just re-deployed fluent-bit public.ecr.aws/aws-observability/aws-for-fluent-bit@sha256:ff702d8e4a0a9c34d933ce41436e570eb340f56a08a2bc57b2d052350bfbc05d and started receiving the error [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS. I changed the value for imds_version to v2 in both spots in the ConfigMap (and restarted the DaemonSet) and am still seeing the error.
https://repost.aws/knowledge-center/ecs-container-ec2-metadata
A hop limit of 2 is required when using Docker/containers. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ec2/modify-instance-metadata-options.html
Bug Report
Describe the bug
Credentials are not retrieved from AWS Instance Metadata Service v2 (IMDSv2) when running on EC2. This causes plugins that require credentials (e.g. cloudwatch) to fail.
To Reproduce
Steps to reproduce the problem:
Create an EC2 instance with metadata version 2 only selected in the Advanced Details section of the Configure Instance step. NB: I have used Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-09f765d333a8ebb4b (64-bit x86) in this example.
As I will be using the cloudwatch output to demonstrate this issue, I have assigned a very loose role to the instance. I created and assigned a fully open security group to remove that as a potential issue.
Install Fluent Bit as per https://docs.fluentbit.io/manual/installation/linux/amazon-linux
Apply the following configuration:
[INPUT]
    Name              systemd
    Path              /var/log/journal
    Buffer_Chunk_Size 32000
    Buffer_Max_Size   64000

[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            ap-southeast-2
    log_group_name    testing
    log_stream_name   bazz
    auto_create_group true