aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

ec2-metadata-service errors in up to date AWS EKS cluster using Pod Identity #6667

Open shaftoe opened 4 hours ago

shaftoe commented 4 hours ago

Describe the bug

The latest version of https://www.npmjs.com/package/@aws-sdk/ec2-metadata-service does not seem to work out of the box with Node.js v18 in an AWS EKS Kubernetes cluster, in a pod that has an associated service account with a valid policy attached.

SDK version number

@aws-sdk/ec2-metadata-service@3.693.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

node v18.20.4

Reproduction Steps

# testing the Pod with latest awscli
root@nodetest:/tmp# /usr/local/bin/aws --version
aws-cli/2.21.2 Python/3.12.6 Linux/6.1.112-124.190.amzn2023.x86_64 exe/x86_64.debian.12
root@nodetest:/tmp# /usr/local/bin/aws sts get-caller-identity
{
    "UserId": "xxxxx:eks-app-dev-nodetest-23833230-c77c-4398-95a9-c03cc43bf1a7",
    "Account": "xxxxx",
    "Arn": "arn:aws:sts::xxxxx:assumed-role/eks-app-dev-app-multimediaworker/eks-app-dev-nodetest-23833230-c77c-4398-95a9-c03cc43bf1a7"
}
root@nodetest:/tmp# /usr/local/bin/aws secretsmanager get-secret-value --secret-id xxxxx --output text > output # Works too

Trying to get metadata info via the JS module:

root@nodetest:/tmp# npm install @aws-sdk/ec2-metadata-service

added 19 packages, and audited 20 packages in 2s

found 0 vulnerabilities

root@nodetest:/tmp# cat test.js 
const main = async () => {
    const { MetadataService } = require("@aws-sdk/ec2-metadata-service");

    const metadataService = new MetadataService({});
    const metadata = await metadataService.request("/latest/meta-data/", {});

    console.log(metadata);
}

main();

root@nodetest:/tmp# node test.js 
/tmp/node_modules/@aws-sdk/ec2-metadata-service/dist-cjs/index.js:112
      throw new Error(`Error making request to the metadata service: ${error}`);
            ^

Error: Error making request to the metadata service: Error: Request failed with status code 401
    at _MetadataService.request (/tmp/node_modules/@aws-sdk/ec2-metadata-service/dist-cjs/index.js:112:13)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async main (/tmp/test.js:5:22)

Node.js v18.20.4

Observed Behavior

Error when trying to fetch metadata

Expected Behavior

Metadata fetched correctly

Possible Solution

No response

Additional Information/Context

No response

RanVaknin commented 3 hours ago

Hi @shaftoe ,

Your CLI test and your SDK test are not equivalent.

In the CLI you are not specifying any specific method of credentials and letting the CLI's default credential chain resolve your creds for you.

In the SDK you are using a specific client which is EC2 IMDS-specific. I don't think that functionality extends to the container metadata service which is different (IMDS endpoint and container metadata endpoint have different IP addresses).

This begs the question, what are you trying to do? If you are just trying to use the SDK on an EKS pod, you don't need to use any of this. The default credential chain will be able to fetch credentials from the container metadata endpoint automatically if correctly configured.

If your pod gets injected with the relevant env variables at start time, the SDK will hook into those and make that request to the container metadata service on your behalf. See SDK docs for more info.

Thanks, Ran~

shaftoe commented 2 hours ago

Thanks for the detailed explanation @RanVaknin.

This begs the question, what are you trying to do?

I am (or rather, the application I'm trying to fix is) "getting AnnouncedIp from ec2 meta data api" (as the inline comments put it), that is, retrieving the public-ipv4 address associated with the pod via the metadata API. This has worked fine so far using https://www.npmjs.com/package/node-ec2-metadata, but the same application no longer works when run in a new cluster. It also seemed that @aws-sdk/ec2-metadata-service was meant for exactly that.

I suppose at this point the question is: what's the right way to use @aws-sdk/ec2-metadata-service in an EKS environment? Or is there another recommended way to access such metadata?

PS AWS env vars like AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE and AWS_CONTAINER_CREDENTIALS_FULL_URI are populated as expected

RanVaknin commented 2 hours ago

Hi @shaftoe ,

Thanks for the clarification. Can you ssh into your pod and log all the available env variables you have there, and do the same for the previous working cluster's pod to see if there are any discrepancies between the two? If you can share those with us (redact any sensitive info), that might be helpful.

Also, I haven't tested this, but maybe this would work?

const metadataService = new MetadataService({
  endpoint: "http://169.254.170.2",
  disableFetchToken: true 
});

Thanks, Ran~

shaftoe commented 1 hour ago

Of course, thanks a ton for the quick help.

So, env vars (which don't seem to contain anything to be redacted):

root@nodetest:/# export
declare -x AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE="/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token"
declare -x AWS_CONTAINER_CREDENTIALS_FULL_URI="http://169.254.170.23/v1/credentials"
declare -x AWS_DEFAULT_REGION="us-west-2"
declare -x AWS_REGION="us-west-2"
declare -x AWS_STS_REGIONAL_ENDPOINTS="regional"
declare -x HOME="/root"
declare -x HOSTNAME="nodetest"
declare -x KUBERNETES_PORT="tcp://10.31.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.31.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.31.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.31.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x NODE_VERSION="18.20.4"
declare -x OLDPWD="/"
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/tmp"
declare -x SHLVL="1"
declare -x TERM="xterm"
declare -x YARN_VERSION="1.22.19"

I've also tested the suggested code change, but it seems to hang (there is no short timeout; I waited for a minute or so...)

PS out of curiosity I've tried with endpoint: http://169.254.170.23 and it fails with status code 301
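
The failures reported so far fall into two groups: HTTP-level rejections (401, 301), where something answered at that address, and network-level failures (the hang), where nothing did. A small sketch for telling them apart when scripting probes; `classifyMetadataError` is a hypothetical helper, and it assumes the wrapped message format shown in the stack trace above:

```javascript
// Pull an HTTP status code or a Node.js errno name out of a wrapped error message.
// Hypothetical helper for triage; not part of @aws-sdk/ec2-metadata-service.
function classifyMetadataError(message) {
  // e.g. "... Request failed with status code 401"
  const status = /status code (\d{3})/.exec(message);
  if (status) return { kind: "http", code: Number(status[1]) };

  // e.g. "... connect ETIMEDOUT 169.254.170.2:80"
  const errno = /\b(E[A-Z]{2,})\b/.exec(message);
  if (errno) return { kind: "network", code: errno[1] };

  return { kind: "unknown", code: null };
}

console.log(
  classifyMetadataError(
    "Error making request to the metadata service: Error: Request failed with status code 401"
  )
);
```

An "http" result means the endpoint is reachable but rejected or redirected the request; a "network" result points at routing or at the address not being served in this environment.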

shaftoe commented 1 hour ago

The old pod's env vars differ slightly (redacted):

declare -x AWS_DEFAULT_REGION="us-west-2"
declare -x AWS_REGION="us-west-2"
declare -x AWS_ROLE_ARN="arn:aws:iam::xxxxxx:role/eks-main-dev-app-xxxxxxx"
declare -x AWS_STS_REGIONAL_ENDPOINTS="regional"
declare -x AWS_WEB_IDENTITY_TOKEN_FILE="/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
declare -x HOME="/root"
declare -x HOSTNAME="ip-xxx.us-west-2.compute.internal"
declare -x HTTP_LISTEN_PORT="4443"
declare -x INTERACTIVE="0"
declare -x KUBERNETES_PORT="tcp://10.31.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.31.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.31.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.31.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x NODE_VERSION="18.20.4"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/app"
declare -x SHLVL="1"
declare -x TERM="xterm"
declare -x TLS_CERT_SECRET_ID="platocorp.com"
declare -x YARN_VERSION="1.22.19"

shaftoe commented 1 hour ago

The request to http://169.254.170.2 did time out eventually:

Error: Error making request to the metadata service: TimeoutError: connect ETIMEDOUT 169.254.170.2:80

RanVaknin commented 1 hour ago

Hi @shaftoe ,

Thanks for the info. The main difference I see between the two clusters is that your older cluster is using IRSA which is the newer more secure way of authenticating with EKS.
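
The difference between the two dumps comes down to two pairs of variables: the new pod has `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE`, while the old pod has `AWS_WEB_IDENTITY_TOKEN_FILE` and `AWS_ROLE_ARN`. A minimal sketch of that check follows; the function name and the returned labels are made up for illustration. The container variables are not exclusive to EKS, so "pod-identity" is only a reasonable guess inside an EKS pod:

```javascript
// Guess which credential mechanism a pod is configured for, based on its environment.
// The labels "pod-identity", "irsa" and "unknown" are hypothetical names for this sketch.
function detectCredentialSource(env) {
  const hasContainerVars =
    Boolean(env.AWS_CONTAINER_CREDENTIALS_FULL_URI) &&
    Boolean(env.AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE);
  const hasWebIdentityVars =
    Boolean(env.AWS_WEB_IDENTITY_TOKEN_FILE) && Boolean(env.AWS_ROLE_ARN);

  if (hasContainerVars) return "pod-identity";
  if (hasWebIdentityVars) return "irsa";
  return "unknown";
}

console.log(detectCredentialSource(process.env));
```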

Admittedly, I'm light-years away from being an EKS expert, and my knowledge is really based on debugging these types of issues with customers, so please bear with me while I try to understand your setup.

PS out of curiosity I've tried with endpoint: http://169.254.170.23 and it fails with status code 301

That is interesting. Based on this issue, it might be able to resolve if you add a trailing slash - http://169.254.170.23/

If you are just trying to hit the container metadata endpoint, you shouldn't need to use the SDK. In theory you can just ssh into your pod, and make a curl request to the endpoint to get that metadata.

If it were me debugging my own environment, I would just try all of the following and see if one of them sticks:

# Basic endpoint probing
curl http://169.254.170.2/
curl http://169.254.170.23/

# IMDSv1-style requests (no token; these return 401 if the instance requires IMDSv2)
curl http://169.254.169.254/latest/meta-data/
curl http://169.254.169.254/latest/meta-data/public-ipv4
# IMDSv2: fetch a session token first, then send it with each request
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

# Pod Identity endpoints 
curl http://169.254.170.23/v1/credentials
TOKEN=$(cat $AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE)
curl -H "Authorization: $TOKEN" http://169.254.170.23/v1/credentials

Thanks again, Ran~
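
For reference, the token-based IMDS probe from the curl list above can be sketched in Node.js with only the standard library. The helper names (`tokenRequestOptions`, `metadataRequestOptions`, `send`, `getPublicIpv4`) and the 2-second timeout are assumptions of this sketch. Whether IMDS is reachable from inside a pod at all depends on node and cluster configuration, which this thread does not establish:

```javascript
const http = require("node:http");

const IMDS_HOST = "169.254.169.254";

// Options for the session-token request (the PUT in the curl list above).
function tokenRequestOptions(ttlSeconds = 21600) {
  return {
    host: IMDS_HOST,
    path: "/latest/api/token",
    method: "PUT",
    headers: { "X-aws-ec2-metadata-token-ttl-seconds": String(ttlSeconds) },
    timeout: 2000, // fail fast instead of hanging for a minute
  };
}

// Options for a metadata GET that carries the session token.
function metadataRequestOptions(path, token) {
  return {
    host: IMDS_HOST,
    path,
    method: "GET",
    headers: { "X-aws-ec2-metadata-token": token },
    timeout: 2000,
  };
}

// Send one request and resolve with the response body as text.
function send(options) {
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () =>
        res.statusCode === 200
          ? resolve(body)
          : reject(new Error(`status ${res.statusCode} for ${options.path}`))
      );
    });
    req.on("timeout", () => req.destroy(new Error(`timed out: ${options.path}`)));
    req.on("error", reject);
    req.end();
  });
}

// Fetch a token, then ask for the instance's public IPv4 address.
async function getPublicIpv4() {
  const token = await send(tokenRequestOptions());
  return send(metadataRequestOptions("/latest/meta-data/public-ipv4", token));
}
```

Calling `getPublicIpv4().then(console.log, console.error)` from the pod would show either the address or which step failed.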

shaftoe commented 1 hour ago

The main difference I see between the two clusters is that your older cluster is using IRSA which is the newer more secure way of authenticating with EKS.

This is actually funny: we're currently setting up a new EKS cluster following all the recommendations we could find, and EKS Pod Identity seems to be "the new way" (so supposedly "the correct way" too, right?) of interacting with IAM; see the announcement blog post if you're curious.

That is interesting. Based on https://github.com/awslabs/aws-sdk-rust/issues/560, it might be able to resolve if you add a trailing slash - http://169.254.170.23/

Fails with 301, but

TOKEN=$(cat $AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE)
curl -H "Authorization: $TOKEN" http://169.254.170.23/v1/credentials

works and shows the tokens as expected.
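
The same request can be sketched in Node.js using only the standard library and the two environment variables from the dump above. This is an illustration of what the curl command does, not the SDK's own implementation; `buildCredentialsRequest` and `fetchContainerCredentials` are hypothetical names, and the 2-second timeout is an arbitrary choice:

```javascript
const fs = require("node:fs");
const http = require("node:http");

// Build the request the curl command makes: the URL from the environment, and the
// token file contents sent as the Authorization header value.
function buildCredentialsRequest(env, readFile = (p) => fs.readFileSync(p, "utf8")) {
  const url = env.AWS_CONTAINER_CREDENTIALS_FULL_URI;
  const tokenFile = env.AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE;
  if (!url || !tokenFile) {
    throw new Error("container credential env vars are not set");
  }
  // trim() mirrors the shell's $(cat ...) dropping the trailing newline.
  return { url, headers: { Authorization: readFile(tokenFile).trim() } };
}

// Perform the GET and parse the JSON response body.
function fetchContainerCredentials(env = process.env) {
  const { url, headers } = buildCredentialsRequest(env);
  return new Promise((resolve, reject) => {
    const req = http.get(url, { headers, timeout: 2000 }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => {
        if (res.statusCode !== 200) {
          reject(new Error(`unexpected status ${res.statusCode}`));
          return;
        }
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("timeout", () => req.destroy(new Error("request timed out")));
    req.on("error", reject);
  });
}
```

In the pod, `fetchContainerCredentials().then((c) => console.log(Object.keys(c)))` would list the returned field names without printing the secrets themselves.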

I agree that I don't need the SDK if all that's needed is to parse an HTTP response (probably in JSON format). The question is where to find the documentation for the API exposed by http://169.254.170.23/. So far /v1/credentials is the only path I was able to hit without getting a 404; I've been trying various combinations of /meta-data/, /latest/meta-data/, and so on, with no luck. The initial idea, though, was that using the SDK should shield us from possible future changes in the APIs. Frankly, all these different IP addresses look a lot like magic numbers...
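
For what it's worth, the three link-local addresses in this thread belong to three different services, which may explain the different failures: 169.254.170.2 is the Amazon ECS endpoint and is not expected to answer on EKS, and the Pod Identity Agent at 169.254.170.23 appears to serve credentials only, not instance metadata such as public-ipv4 (that last point is an assumption consistent with the 404s above, not something the thread confirms). A small lookup sketch, with `describeEndpoint` as a hypothetical helper:

```javascript
// Link-local addresses that come up in this thread, and the service behind each.
const KNOWN_ENDPOINTS = {
  "169.254.169.254": "EC2 instance metadata service (IMDS)",
  "169.254.170.2": "Amazon ECS task credentials and metadata endpoint",
  "169.254.170.23": "EKS Pod Identity Agent",
};

// Map a URL to the service that owns its host, or "unknown".
function describeEndpoint(url) {
  const host = new URL(url).hostname;
  return KNOWN_ENDPOINTS[host] ?? "unknown";
}

console.log(describeEndpoint("http://169.254.170.23/v1/credentials"));
// prints "EKS Pod Identity Agent"
```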