DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

Agent ec2_tags.go appears hardcoded to use instance profile (not task IAM roles) #6171

Closed tobypinder closed 3 years ago

tobypinder commented 4 years ago

We are migrating our ECS infrastructure to exclusively use Task IAM Roles in order to improve our security posture (least privilege, etc.). We noticed the following unexpected behaviour:

Describe what happened:

The datadog-agent used the instance profile of the ECS instance rather than the Task IAM Role, and emitted this warning on startup:

| CORE | WARN | (pkg/util/ec2/ec2_tags.go:90 in GetTags) | unable to get tags from aws and cache is empty: UnauthorizedOperation: You are not authorized to perform this operation.

Describe what you expected: The Datadog Agent would follow the credential priority order established in the official AWS SDKs, which fetch credentials from 169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI before falling back to the instance profile, whenever that endpoint is available to the container/environment. No error would then be thrown, and our CloudTrail logs would show use of the Task IAM Role.
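The expected lookup order can be sketched in Go as follows. This is a simplified illustration, not the agent's code and not the full AWS SDK chain (real SDK chains also consult shared config files, web identity tokens, and other sources). `pickCredentialSource` and `credentialSource` are hypothetical names introduced here; the only facts taken from the report are the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` variable and the `169.254.170.2` endpoint.

```go
package main

import (
	"fmt"
	"os"
)

// ecsCredentialsHost is the link-local address on which ECS serves
// task role credentials to containers.
const ecsCredentialsHost = "http://169.254.170.2"

// credentialSource (hypothetical type) describes where credentials
// would be loaded from.
type credentialSource struct {
	Name     string
	Endpoint string // empty when no HTTP endpoint is involved
}

// pickCredentialSource (hypothetical helper) applies a simplified
// version of the SDK priority order:
//  1. static keys in the environment,
//  2. the ECS task role endpoint, if the relative URI is set,
//  3. the EC2 instance profile as the last resort.
func pickCredentialSource(getenv func(string) string) credentialSource {
	if getenv("AWS_ACCESS_KEY_ID") != "" && getenv("AWS_SECRET_ACCESS_KEY") != "" {
		return credentialSource{Name: "environment"}
	}
	if uri := getenv("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"); uri != "" {
		// The relative URI starts with "/", so plain concatenation is enough.
		return credentialSource{Name: "ecs-task-role", Endpoint: ecsCredentialsHost + uri}
	}
	return credentialSource{
		Name:     "ec2-instance-profile",
		Endpoint: "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
	}
}

func main() {
	src := pickCredentialSource(os.Getenv)
	fmt.Printf("would load credentials from %s %s\n", src.Name, src.Endpoint)
}
```

Passing the environment lookup as a function keeps the sketch easy to exercise outside ECS. In a real fix, delegating to the SDK's default credential chain would likely be preferable to reimplementing this order by hand.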

Steps to reproduce the issue: Deploy an agent to an ECS cluster (in this particular case as an EC2 DAEMON service) with logging on and a Task IAM Role configured for the container. Observe the agent logs and/or API usage via CloudTrail to see that the Task IAM Role remains unused.

Additional details:

From the code in master this appears to be hardcoded and fetched directly, though unfortunately I am unable to suggest changes via a PR due to a lack of Go knowledge.

ogaca-dd commented 4 years ago

@tobypinder,

Thank you for reporting this feature request. I have created a card in our backlog and I am going to update this issue once I have more information.

gregsymons commented 3 years ago

This is also a problem with the Kubernetes integration: however you provide credentials to a pod (kiam, IAM Roles for Service Accounts), it's generally best practice to block access to the instance identity document and IAM credentials to prevent pods from impersonating the node. Looking through the code, it only uses the instance identity document to retrieve the region for the instance. Since this information is also available from the general metadata at http://169.254.169.254/placement/region, it'd be better to pull it from there. Additionally, I think it would be better to use the AWS default credential chain rather than explicitly retrieving the credentials, since the default credential chain properly handles ECS Task IAM Roles as well as all the Kubernetes methods that I am aware of for providing IAM credentials to pods.
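A minimal Go sketch of the region lookup suggested above, using only the standard library. It assumes the full metadata path is `/latest/meta-data/placement/region` (the comment abbreviates it) and that an IMDSv2 session token is required. `fetchRegion`, `doText`, and `newFakeIMDS` are hypothetical names, not functions from the agent, and the fake server exists only so the example runs outside EC2.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

const (
	tokenPath  = "/latest/api/token"
	regionPath = "/latest/meta-data/placement/region"
)

// fetchRegion (hypothetical helper) reads the region from the instance
// metadata service without touching the instance identity document.
// On EC2, baseURL would be "http://169.254.169.254".
func fetchRegion(client *http.Client, baseURL string) (string, error) {
	// IMDSv2: obtain a short-lived session token first.
	tokReq, err := http.NewRequest(http.MethodPut, baseURL+tokenPath, nil)
	if err != nil {
		return "", err
	}
	tokReq.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "60")
	token, err := doText(client, tokReq)
	if err != nil {
		return "", fmt.Errorf("fetching IMDSv2 token: %w", err)
	}

	regReq, err := http.NewRequest(http.MethodGet, baseURL+regionPath, nil)
	if err != nil {
		return "", err
	}
	regReq.Header.Set("X-aws-ec2-metadata-token", token)
	region, err := doText(client, regReq)
	if err != nil {
		return "", fmt.Errorf("fetching region: %w", err)
	}
	return region, nil
}

// doText sends req and returns the trimmed body of a 200 response.
func doText(client *http.Client, req *http.Request) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s %s: unexpected status %d", req.Method, req.URL.Path, resp.StatusCode)
	}
	return strings.TrimSpace(string(body)), nil
}

// newFakeIMDS starts a local stand-in for the metadata service.
func newFakeIMDS(region string) *httptest.Server {
	const fakeToken = "fake-token"
	mux := http.NewServeMux()
	mux.HandleFunc(tokenPath, func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPut {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		fmt.Fprint(w, fakeToken)
	})
	mux.HandleFunc(regionPath, func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-aws-ec2-metadata-token") != fakeToken {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, region)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeIMDS("eu-west-1")
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	region, err := fetchRegion(client, srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("region:", region)
}
```

Taking the base URL as a parameter is a design choice for testability; it also leaves room for a configured region to take precedence when the metadata service is blocked entirely.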

groodt commented 3 years ago

We are seeing this issue as well. Is there any ETA on a fix for this?

groodt commented 3 years ago

Ping! Any news on fixing this?

olivielpeau commented 3 years ago

This limitation will be addressed by https://github.com/DataDog/datadog-agent/pull/8141

groodt commented 3 years ago

I'm still seeing issues with v7.30.0: https://github.com/DataDog/datadog-agent/issues/9011